* [PATCH 0/7] Per client engine busyness
@ 2021-05-13 10:59 ` Tvrtko Ursulin
  0 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-13 10:59 UTC (permalink / raw)
  To: Intel-gfx; +Cc: dri-devel

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Resurrection of the previously merged per-client engine busyness patches. In a
nutshell, they enable intel_gpu_top to be useful in a more top(1)-like way,
showing not only physical GPU engine usage but a per-process view as well.

Example screen capture:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
intel-gpu-top -  906/ 955 MHz;    0% RC6;  5.30 Watts;      933 irqs/s

      IMC reads:     4414 MiB/s
     IMC writes:     3805 MiB/s

          ENGINE      BUSY                                      MI_SEMA MI_WAIT
     Render/3D/0   93.46% |████████████████████████████████▋  |      0%      0%
       Blitter/0    0.00% |                                   |      0%      0%
         Video/0    0.00% |                                   |      0%      0%
  VideoEnhance/0    0.00% |                                   |      0%      0%

  PID            NAME  Render/3D      Blitter        Video      VideoEnhance
 2733       neverball |██████▌     ||            ||            ||            |
 2047            Xorg |███▊        ||            ||            ||            |
 2737        glxgears |█▍          ||            ||            ||            |
 2128           xfwm4 |            ||            ||            ||            |
 2047            Xorg |            ||            ||            ||            |
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Internally we track time spent on engines for each struct intel_context, both
for current and past contexts belonging to each open DRM file.
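As an illustrative model only (the driver samples GPU context timestamps, not a
CPU clock, and the real accounting lives in struct intel_context), the per-context
bookkeeping amounts to accumulating busy time between schedule-in and
schedule-out events:

```python
class ContextRuntime:
    """Toy model of per-context busy-time accounting."""

    def __init__(self):
        self.total_ns = 0      # accumulated busy time
        self._start_ns = None  # timestamp of the current schedule-in, if any

    def sched_in(self, now_ns):
        # Context starts running on an engine.
        self._start_ns = now_ns

    def sched_out(self, now_ns):
        # Context stops running; fold the elapsed time into the total.
        if self._start_ns is not None:
            self.total_ns += now_ns - self._start_ns
            self._start_ns = None
```

Summing these totals over all current and past contexts of a DRM file gives the
per-client figures exposed later in the series.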

This can serve as a building block for several features on the wish list:
smarter scheduler decisions, getrusage(2)-like per-GEM-context functionality
requested by some customers, setrlimit(2)-like controls, a cgroup controller,
dynamic SSEU tuning, ...

To enable userspace access to the tracked data, we expose the time spent on the
GPU per client and per engine class in sysfs, with a hierarchy like the one below:

	# cd /sys/class/drm/card0/clients/
	# tree
	.
	├── 7
	│   ├── busy
	│   │   ├── 0
	│   │   ├── 1
	│   │   ├── 2
	│   │   └── 3
	│   ├── name
	│   └── pid
	├── 8
	│   ├── busy
	│   │   ├── 0
	│   │   ├── 1
	│   │   ├── 2
	│   │   └── 3
	│   ├── name
	│   └── pid
	└── 9
	    ├── busy
	    │   ├── 0
	    │   ├── 1
	    │   ├── 2
	    │   └── 3
	    ├── name
	    └── pid

Files in the 'busy' directories are named after the engine class ABI values and
contain the accumulated nanoseconds each client spent on engines of the
respective class.
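Since the files hold monotonically accumulating nanoseconds, a consumer such as
intel_gpu_top would sample them twice and divide the delta by the wall-clock
window. A minimal sketch of that calculation (the sysfs path is the one
proposed in this series, not a stable ABI):

```python
def busy_percent(busy_ns_before, busy_ns_after, wall_ns):
    """Utilisation over a sampling window, clamped to 0-100.

    busy_ns_before/after are two reads of e.g.
    /sys/class/drm/card0/clients/<id>/busy/<class>, wall_ns apart.
    """
    if wall_ns <= 0:
        return 0.0
    pct = 100.0 * (busy_ns_after - busy_ns_before) / wall_ns
    return max(0.0, min(100.0, pct))

# A client that accumulated 0.5 s of engine time over a 1 s window is 50% busy.
print(busy_percent(2_000_000_000, 2_500_000_000, 1_000_000_000))  # 50.0
```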

Tvrtko Ursulin (7):
  drm/i915: Expose list of clients in sysfs
  drm/i915: Update client name on context create
  drm/i915: Make GEM contexts track DRM clients
  drm/i915: Track runtime spent in closed and unreachable GEM contexts
  drm/i915: Track all user contexts per client
  drm/i915: Track context current active time
  drm/i915: Expose per-engine client busyness

 drivers/gpu/drm/i915/Makefile                 |   5 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  61 ++-
 .../gpu/drm/i915/gem/i915_gem_context_types.h |  16 +-
 drivers/gpu/drm/i915/gt/intel_context.c       |  27 +-
 drivers/gpu/drm/i915/gt/intel_context.h       |  15 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |  24 +-
 .../drm/i915/gt/intel_execlists_submission.c  |  23 +-
 .../gpu/drm/i915/gt/intel_gt_clock_utils.c    |   4 +
 drivers/gpu/drm/i915/gt/intel_lrc.c           |  27 +-
 drivers/gpu/drm/i915/gt/intel_lrc.h           |  24 ++
 drivers/gpu/drm/i915/gt/selftest_lrc.c        |  10 +-
 drivers/gpu/drm/i915/i915_drm_client.c        | 365 ++++++++++++++++++
 drivers/gpu/drm/i915/i915_drm_client.h        | 123 ++++++
 drivers/gpu/drm/i915/i915_drv.c               |   6 +
 drivers/gpu/drm/i915/i915_drv.h               |   5 +
 drivers/gpu/drm/i915/i915_gem.c               |  21 +-
 drivers/gpu/drm/i915/i915_gpu_error.c         |  31 +-
 drivers/gpu/drm/i915/i915_gpu_error.h         |   2 +-
 drivers/gpu/drm/i915/i915_sysfs.c             |   8 +
 19 files changed, 716 insertions(+), 81 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_drm_client.c
 create mode 100644 drivers/gpu/drm/i915/i915_drm_client.h

-- 
2.30.2


^ permalink raw reply	[flat|nested] 103+ messages in thread


* [PATCH 1/7] drm/i915: Expose list of clients in sysfs
  2021-05-13 10:59 ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-13 10:59   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-13 10:59 UTC (permalink / raw)
  To: Intel-gfx; +Cc: dri-devel

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Expose a list of clients with open file handles in sysfs.

This will be a basis for a top-like utility showing per-client and per-
engine GPU load.

Currently we only expose each client's pid and name under opaque numbered
directories in /sys/class/drm/card0/clients/.

For instance:

/sys/class/drm/card0/clients/3/name: Xorg
/sys/class/drm/card0/clients/3/pid: 5664
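A hedged sketch of how a userspace tool might enumerate this layout (note that,
per the sysfs show functions in this patch, closed clients report their name
and pid wrapped in angle brackets, which the parser strips):

```python
import os

def read_clients(root):
    """Parse a clients/ tree laid out as clients/<id>/{name,pid}.

    Returns {client_id: (name, pid)}; entries that are not numbered
    directories are skipped.
    """
    clients = {}
    for entry in os.listdir(root):
        path = os.path.join(root, entry)
        if not entry.isdigit() or not os.path.isdir(path):
            continue
        with open(os.path.join(path, "name")) as f:
            name = f.read().strip().strip("<>")
        with open(os.path.join(path, "pid")) as f:
            pid = int(f.read().strip().strip("<>"))
        clients[int(entry)] = (name, pid)
    return clients
```

On a machine running this series, root would be /sys/class/drm/card0/clients;
that path is an assumption taken from the patch, not a stable ABI.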

v2:
 Chris Wilson:
 * Enclose new members into dedicated structs.
 * Protect against failed sysfs registration.

v3:
 * sysfs_attr_init.

v4:
 * Fix for internal clients.

v5:
 * Use cyclic ida for client id. (Chris)
 * Do not leak pid reference. (Chris)
 * Tidy code with some locals.

v6:
 * Use xa_alloc_cyclic to simplify locking. (Chris)
 * No need to unregister individual sysfs files. (Chris)
 * Rebase on top of fpriv kref.
 * Track client closed status and reflect in sysfs.

v7:
 * Make drm_client more standalone concept.

v8:
 * Simplify sysfs show. (Chris)
 * Always track name and pid.

v9:
 * Fix cyclic id assignment.

v10:
 * No need for a mutex around xa_alloc_cyclic.
 * Refactor sysfs into own function.
 * Unregister sysfs before freeing pid and name.
 * Move clients setup into own function.

v11:
 * Call clients init directly from driver init. (Chris)

v12:
 * Do not fail client add on id wrap. (Maciej)

v13 (Lucas): Rebase on upstream

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> # v11
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20210123153733.18139-2-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20210124153136.19124-2-chris@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/Makefile          |   5 +-
 drivers/gpu/drm/i915/i915_drm_client.c | 200 +++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_drm_client.h |  71 +++++++++
 drivers/gpu/drm/i915/i915_drv.c        |   6 +
 drivers/gpu/drm/i915/i915_drv.h        |   5 +
 drivers/gpu/drm/i915/i915_gem.c        |  21 ++-
 drivers/gpu/drm/i915/i915_sysfs.c      |   8 +
 7 files changed, 311 insertions(+), 5 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_drm_client.c
 create mode 100644 drivers/gpu/drm/i915/i915_drm_client.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index d0d936d9137b..e89ce541fe68 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -33,8 +33,9 @@ subdir-ccflags-y += -I$(srctree)/$(src)
 # Please keep these build lists sorted!
 
 # core driver code
-i915-y += i915_drv.o \
-	  i915_config.o \
+i915-y += i915_config.o \
+	  i915_drm_client.o \
+	  i915_drv.o \
 	  i915_irq.o \
 	  i915_getparam.o \
 	  i915_mitigations.o \
diff --git a/drivers/gpu/drm/i915/i915_drm_client.c b/drivers/gpu/drm/i915/i915_drm_client.c
new file mode 100644
index 000000000000..7c2d36860ac1
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -0,0 +1,200 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#include <linux/kernel.h>
+#include <linux/slab.h>
+#include <linux/types.h>
+
+#include "i915_drm_client.h"
+#include "i915_gem.h"
+#include "i915_utils.h"
+
+void i915_drm_clients_init(struct i915_drm_clients *clients,
+			   struct drm_i915_private *i915)
+{
+	clients->i915 = i915;
+
+	clients->next_id = 0;
+	xa_init_flags(&clients->xarray, XA_FLAGS_ALLOC);
+}
+
+static ssize_t
+show_client_name(struct device *kdev, struct device_attribute *attr, char *buf)
+{
+	struct i915_drm_client *client =
+		container_of(attr, typeof(*client), attr.name);
+
+	return sysfs_emit(buf,
+			  READ_ONCE(client->closed) ? "<%s>\n" : "%s\n",
+			  client->name);
+}
+
+static ssize_t
+show_client_pid(struct device *kdev, struct device_attribute *attr, char *buf)
+{
+	struct i915_drm_client *client =
+		container_of(attr, typeof(*client), attr.pid);
+
+	return sysfs_emit(buf,
+			  READ_ONCE(client->closed) ? "<%u>\n" : "%u\n",
+			  pid_nr(client->pid));
+}
+
+static int __client_register_sysfs(struct i915_drm_client *client)
+{
+	const struct {
+		const char *name;
+		struct device_attribute *attr;
+		ssize_t (*show)(struct device *dev,
+				struct device_attribute *attr,
+				char *buf);
+	} files[] = {
+		{ "name", &client->attr.name, show_client_name },
+		{ "pid", &client->attr.pid, show_client_pid },
+	};
+	unsigned int i;
+	char buf[16];
+	int ret;
+
+	ret = scnprintf(buf, sizeof(buf), "%u", client->id);
+	if (ret == sizeof(buf))
+		return -EINVAL;
+
+	client->root = kobject_create_and_add(buf, client->clients->root);
+	if (!client->root)
+		return -ENOMEM;
+
+	for (i = 0; i < ARRAY_SIZE(files); i++) {
+		struct device_attribute *attr = files[i].attr;
+
+		sysfs_attr_init(&attr->attr);
+
+		attr->attr.name = files[i].name;
+		attr->attr.mode = 0444;
+		attr->show = files[i].show;
+
+		ret = sysfs_create_file(client->root, &attr->attr);
+		if (ret)
+			break;
+	}
+
+	if (ret)
+		kobject_put(client->root);
+
+	return ret;
+}
+
+static void __client_unregister_sysfs(struct i915_drm_client *client)
+{
+	kobject_put(fetch_and_zero(&client->root));
+}
+
+static int
+__i915_drm_client_register(struct i915_drm_client *client,
+			   struct task_struct *task)
+{
+	struct i915_drm_clients *clients = client->clients;
+	char *name;
+	int ret;
+
+	name = kstrdup(task->comm, GFP_KERNEL);
+	if (!name)
+		return -ENOMEM;
+
+	client->pid = get_task_pid(task, PIDTYPE_PID);
+	client->name = name;
+
+	if (!clients->root)
+		return 0; /* intel_fbdev_init registers a client before sysfs */
+
+	ret = __client_register_sysfs(client);
+	if (ret)
+		goto err_sysfs;
+
+	return 0;
+
+err_sysfs:
+	put_pid(client->pid);
+	kfree(client->name);
+
+	return ret;
+}
+
+static void __i915_drm_client_unregister(struct i915_drm_client *client)
+{
+	__client_unregister_sysfs(client);
+
+	put_pid(fetch_and_zero(&client->pid));
+	kfree(fetch_and_zero(&client->name));
+}
+
+static void __rcu_i915_drm_client_free(struct work_struct *wrk)
+{
+	struct i915_drm_client *client =
+		container_of(wrk, typeof(*client), rcu.work);
+
+	__i915_drm_client_unregister(client);
+
+	xa_erase(&client->clients->xarray, client->id);
+	kfree(client);
+}
+
+struct i915_drm_client *
+i915_drm_client_add(struct i915_drm_clients *clients, struct task_struct *task)
+{
+	struct i915_drm_client *client;
+	int ret;
+
+	client = kzalloc(sizeof(*client), GFP_KERNEL);
+	if (!client)
+		return ERR_PTR(-ENOMEM);
+
+	kref_init(&client->kref);
+	client->clients = clients;
+	INIT_RCU_WORK(&client->rcu, __rcu_i915_drm_client_free);
+
+	ret = xa_alloc_cyclic(&clients->xarray, &client->id, client,
+			      xa_limit_32b, &clients->next_id, GFP_KERNEL);
+	if (ret < 0)
+		goto err_id;
+
+	ret = __i915_drm_client_register(client, task);
+	if (ret)
+		goto err_register;
+
+	return client;
+
+err_register:
+	xa_erase(&clients->xarray, client->id);
+err_id:
+	kfree(client);
+
+	return ERR_PTR(ret);
+}
+
+void __i915_drm_client_free(struct kref *kref)
+{
+	struct i915_drm_client *client =
+		container_of(kref, typeof(*client), kref);
+
+	queue_rcu_work(system_wq, &client->rcu);
+}
+
+void i915_drm_client_close(struct i915_drm_client *client)
+{
+	GEM_BUG_ON(READ_ONCE(client->closed));
+	WRITE_ONCE(client->closed, true);
+	i915_drm_client_put(client);
+}
+
+void i915_drm_clients_fini(struct i915_drm_clients *clients)
+{
+	while (!xa_empty(&clients->xarray)) {
+		rcu_barrier();
+		flush_workqueue(system_wq);
+	}
+
+	xa_destroy(&clients->xarray);
+}
diff --git a/drivers/gpu/drm/i915/i915_drm_client.h b/drivers/gpu/drm/i915/i915_drm_client.h
new file mode 100644
index 000000000000..150f8e8d34e6
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_drm_client.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: MIT */
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#ifndef __I915_DRM_CLIENT_H__
+#define __I915_DRM_CLIENT_H__
+
+#include <linux/device.h>
+#include <linux/kobject.h>
+#include <linux/kref.h>
+#include <linux/pid.h>
+#include <linux/rcupdate.h>
+#include <linux/sched.h>
+#include <linux/xarray.h>
+
+struct drm_i915_private;
+
+struct i915_drm_clients {
+	struct drm_i915_private *i915;
+
+	struct xarray xarray;
+	u32 next_id;
+
+	struct kobject *root;
+};
+
+struct i915_drm_client {
+	struct kref kref;
+
+	struct rcu_work rcu;
+
+	unsigned int id;
+	struct pid *pid;
+	char *name;
+	bool closed;
+
+	struct i915_drm_clients *clients;
+
+	struct kobject *root;
+	struct {
+		struct device_attribute pid;
+		struct device_attribute name;
+	} attr;
+};
+
+void i915_drm_clients_init(struct i915_drm_clients *clients,
+			   struct drm_i915_private *i915);
+
+static inline struct i915_drm_client *
+i915_drm_client_get(struct i915_drm_client *client)
+{
+	kref_get(&client->kref);
+	return client;
+}
+
+void __i915_drm_client_free(struct kref *kref);
+
+static inline void i915_drm_client_put(struct i915_drm_client *client)
+{
+	kref_put(&client->kref, __i915_drm_client_free);
+}
+
+void i915_drm_client_close(struct i915_drm_client *client);
+
+struct i915_drm_client *i915_drm_client_add(struct i915_drm_clients *clients,
+					    struct task_struct *task);
+
+void i915_drm_clients_fini(struct i915_drm_clients *clients);
+
+#endif /* !__I915_DRM_CLIENT_H__ */
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 5118dc8386b2..2be26aea035b 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -69,6 +69,7 @@
 #include "gt/intel_rc6.h"
 
 #include "i915_debugfs.h"
+#include "i915_drm_client.h"
 #include "i915_drv.h"
 #include "i915_ioc32.h"
 #include "i915_irq.h"
@@ -339,6 +340,8 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv)
 
 	intel_gt_init_early(&dev_priv->gt, dev_priv);
 
+	i915_drm_clients_init(&dev_priv->clients, dev_priv);
+
 	i915_gem_init_early(dev_priv);
 
 	/* This must be called before any calls to HAS_PCH_* */
@@ -358,6 +361,7 @@ static int i915_driver_early_probe(struct drm_i915_private *dev_priv)
 
 err_gem:
 	i915_gem_cleanup_early(dev_priv);
+	i915_drm_clients_fini(&dev_priv->clients);
 	intel_gt_driver_late_release(&dev_priv->gt);
 	vlv_suspend_cleanup(dev_priv);
 err_workqueues:
@@ -375,6 +379,7 @@ static void i915_driver_late_release(struct drm_i915_private *dev_priv)
 	intel_irq_fini(dev_priv);
 	intel_power_domains_cleanup(dev_priv);
 	i915_gem_cleanup_early(dev_priv);
+	i915_drm_clients_fini(&dev_priv->clients);
 	intel_gt_driver_late_release(&dev_priv->gt);
 	vlv_suspend_cleanup(dev_priv);
 	i915_workqueues_cleanup(dev_priv);
@@ -984,6 +989,7 @@ static void i915_driver_postclose(struct drm_device *dev, struct drm_file *file)
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 
 	i915_gem_context_close(file);
+	i915_drm_client_close(file_priv->client);
 
 	kfree_rcu(file_priv, rcu);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 61308ce19059..6c0f12248156 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -95,6 +95,7 @@
 #include "intel_wakeref.h"
 #include "intel_wopcm.h"
 
+#include "i915_drm_client.h"
 #include "i915_gem.h"
 #include "i915_gem_gtt.h"
 #include "i915_gpu_error.h"
@@ -221,6 +222,8 @@ struct drm_i915_file_private {
 	/** ban_score: Accumulated score of all ctx bans and fast hangs. */
 	atomic_t ban_score;
 	unsigned long hang_timestamp;
+
+	struct i915_drm_client *client;
 };
 
 /* Interface history:
@@ -1160,6 +1163,8 @@ struct drm_i915_private {
 
 	struct i915_pmu pmu;
 
+	struct i915_drm_clients clients;
+
 	struct i915_hdcp_comp_master *hdcp_master;
 	bool hdcp_comp_added;
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d0018c5f88bd..8643b2a67f6d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1182,25 +1182,40 @@ void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
 int i915_gem_open(struct drm_i915_private *i915, struct drm_file *file)
 {
 	struct drm_i915_file_private *file_priv;
-	int ret;
+	struct i915_drm_client *client;
+	int ret = -ENOMEM;
 
 	DRM_DEBUG("\n");
 
 	file_priv = kzalloc(sizeof(*file_priv), GFP_KERNEL);
 	if (!file_priv)
-		return -ENOMEM;
+		goto err_alloc;
+
+	client = i915_drm_client_add(&i915->clients, current);
+	if (IS_ERR(client)) {
+		ret = PTR_ERR(client);
+		goto err_client;
+	}
 
 	file->driver_priv = file_priv;
 	file_priv->dev_priv = i915;
 	file_priv->file = file;
+	file_priv->client = client;
 
 	file_priv->bsd_engine = -1;
 	file_priv->hang_timestamp = jiffies;
 
 	ret = i915_gem_context_open(i915, file);
 	if (ret)
-		kfree(file_priv);
+		goto err_context;
+
+	return 0;
 
+err_context:
+	i915_drm_client_close(client);
+err_client:
+	kfree(file_priv);
+err_alloc:
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index 4c6b5d52b5ca..e7aef29068ac 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -554,6 +554,11 @@ void i915_setup_sysfs(struct drm_i915_private *dev_priv)
 	struct device *kdev = dev_priv->drm.primary->kdev;
 	int ret;
 
+	dev_priv->clients.root =
+		kobject_create_and_add("clients", &kdev->kobj);
+	if (!dev_priv->clients.root)
+		drm_warn(&dev_priv->drm, "Per-client sysfs setup failed\n");
+
 #ifdef CONFIG_PM
 	if (HAS_RC6(dev_priv)) {
 		ret = sysfs_merge_group(&kdev->kobj,
@@ -621,4 +626,7 @@ void i915_teardown_sysfs(struct drm_i915_private *dev_priv)
 	sysfs_unmerge_group(&kdev->kobj, &rc6_attr_group);
 	sysfs_unmerge_group(&kdev->kobj, &rc6p_attr_group);
 #endif
+
+	if (dev_priv->clients.root)
+		kobject_put(dev_priv->clients.root);
 }
-- 
2.30.2


^ permalink raw reply	[flat|nested] 103+ messages in thread

 	vlv_suspend_cleanup(dev_priv);
 	i915_workqueues_cleanup(dev_priv);
@@ -984,6 +989,7 @@ static void i915_driver_postclose(struct drm_device *dev, struct drm_file *file)
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 
 	i915_gem_context_close(file);
+	i915_drm_client_close(file_priv->client);
 
 	kfree_rcu(file_priv, rcu);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 61308ce19059..6c0f12248156 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -95,6 +95,7 @@
 #include "intel_wakeref.h"
 #include "intel_wopcm.h"
 
+#include "i915_drm_client.h"
 #include "i915_gem.h"
 #include "i915_gem_gtt.h"
 #include "i915_gpu_error.h"
@@ -221,6 +222,8 @@ struct drm_i915_file_private {
 	/** ban_score: Accumulated score of all ctx bans and fast hangs. */
 	atomic_t ban_score;
 	unsigned long hang_timestamp;
+
+	struct i915_drm_client *client;
 };
 
 /* Interface history:
@@ -1160,6 +1163,8 @@ struct drm_i915_private {
 
 	struct i915_pmu pmu;
 
+	struct i915_drm_clients clients;
+
 	struct i915_hdcp_comp_master *hdcp_master;
 	bool hdcp_comp_added;
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d0018c5f88bd..8643b2a67f6d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1182,25 +1182,40 @@ void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
 int i915_gem_open(struct drm_i915_private *i915, struct drm_file *file)
 {
 	struct drm_i915_file_private *file_priv;
-	int ret;
+	struct i915_drm_client *client;
+	int ret = -ENOMEM;
 
 	DRM_DEBUG("\n");
 
 	file_priv = kzalloc(sizeof(*file_priv), GFP_KERNEL);
 	if (!file_priv)
-		return -ENOMEM;
+		goto err_alloc;
+
+	client = i915_drm_client_add(&i915->clients, current);
+	if (IS_ERR(client)) {
+		ret = PTR_ERR(client);
+		goto err_client;
+	}
 
 	file->driver_priv = file_priv;
 	file_priv->dev_priv = i915;
 	file_priv->file = file;
+	file_priv->client = client;
 
 	file_priv->bsd_engine = -1;
 	file_priv->hang_timestamp = jiffies;
 
 	ret = i915_gem_context_open(i915, file);
 	if (ret)
-		kfree(file_priv);
+		goto err_context;
+
+	return 0;
 
+err_context:
+	i915_drm_client_close(client);
+err_client:
+	kfree(file_priv);
+err_alloc:
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index 4c6b5d52b5ca..e7aef29068ac 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -554,6 +554,11 @@ void i915_setup_sysfs(struct drm_i915_private *dev_priv)
 	struct device *kdev = dev_priv->drm.primary->kdev;
 	int ret;
 
+	dev_priv->clients.root =
+		kobject_create_and_add("clients", &kdev->kobj);
+	if (!dev_priv->clients.root)
+		drm_warn(&dev_priv->drm, "Per-client sysfs setup failed\n");
+
 #ifdef CONFIG_PM
 	if (HAS_RC6(dev_priv)) {
 		ret = sysfs_merge_group(&kdev->kobj,
@@ -621,4 +626,7 @@ void i915_teardown_sysfs(struct drm_i915_private *dev_priv)
 	sysfs_unmerge_group(&kdev->kobj, &rc6_attr_group);
 	sysfs_unmerge_group(&kdev->kobj, &rc6p_attr_group);
 #endif
+
+	if (dev_priv->clients.root)
+		kobject_put(dev_priv->clients.root);
 }
-- 
2.30.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 103+ messages in thread

* [PATCH 2/7] drm/i915: Update client name on context create
  2021-05-13 10:59 ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-13 10:59   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-13 10:59 UTC (permalink / raw)
  To: Intel-gfx; +Cc: dri-devel

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Some clients have the DRM fd passed to them over a socket by the X server.

Grab the real client and pid when they create their first context and
update the exposed data for more useful enumeration.

To enable lockless access to client name and pid data from the following
patches, we also make these fields RCU protected. This covers asynchronous
code paths where contexts remain in use after the client has exited, as
well as client name and pid updates caused by context creation running in
parallel with name/pid queries.
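The replace-and-defer-free pattern this enables can be illustrated with a
small userspace sketch (plain C11 atomics standing in for RCU; all names
below are illustrative, not driver code). Readers always observe either the
complete old name/pid pair or the complete new one, because the pointer is
swapped atomically and the old object is reclaimed only after the swap:

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Userspace analogue of the name-replacement pattern: publish a new
 * name/pid object by atomically swapping a single pointer, then
 * reclaim the old object. */
struct client_name {
	int pid;
	char name[32];
};

static _Atomic(struct client_name *) current_name;

static struct client_name *make_name(int pid, const char *comm)
{
	struct client_name *n = malloc(sizeof(*n));

	if (!n)
		return NULL;
	n->pid = pid;
	strncpy(n->name, comm, sizeof(n->name) - 1);
	n->name[sizeof(n->name) - 1] = '\0';
	return n;
}

/* Publish a new name/pid pair; returns the object readers now see. */
static struct client_name *update_name(int pid, const char *comm)
{
	struct client_name *n = make_name(pid, comm);
	struct client_name *old;

	if (!n)
		return NULL;
	old = atomic_exchange(&current_name, n);
	free(old); /* the kernel defers this step until readers finish */
	return n;
}
```

In the driver the swap is rcu_replace_pointer() serialized by update_lock
and reclamation is deferred with call_rcu() until all RCU readers are done;
the immediate free() above is only safe because this sketch has no
concurrent readers.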

v2:
 * Do not leak the pid reference and borrow context idr_lock. (Chris)

v3:
 * More avoiding leaks. (Chris)

v4:
 * Move update completely to drm client. (Chris)
 * Do not lose previous client data on failure to re-register and simplify
   update to only touch what it needs.

v5:
 * Reuse ext_data local. (Chris)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20210123153733.18139-3-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20210124153136.19124-3-chris@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c |  5 ++
 drivers/gpu/drm/i915/i915_drm_client.c      | 93 +++++++++++++++++----
 drivers/gpu/drm/i915/i915_drm_client.h      | 34 +++++++-
 3 files changed, 115 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 188dee13e017..e5f8d94666e8 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -76,6 +76,7 @@
 #include "gt/intel_gpu_commands.h"
 #include "gt/intel_ring.h"
 
+#include "i915_drm_client.h"
 #include "i915_gem_context.h"
 #include "i915_globals.h"
 #include "i915_trace.h"
@@ -2321,6 +2322,10 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 		return -EIO;
 	}
 
+	ret = i915_drm_client_update(ext_data.fpriv->client, current);
+	if (ret)
+		return ret;
+
 	ext_data.ctx = i915_gem_create_context(i915, args->flags);
 	if (IS_ERR(ext_data.ctx))
 		return PTR_ERR(ext_data.ctx);
diff --git a/drivers/gpu/drm/i915/i915_drm_client.c b/drivers/gpu/drm/i915/i915_drm_client.c
index 7c2d36860ac1..ad3d36c9dee2 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.c
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -7,7 +7,10 @@
 #include <linux/slab.h>
 #include <linux/types.h>
 
+#include <drm/drm_print.h>
+
 #include "i915_drm_client.h"
+#include "i915_drv.h"
 #include "i915_gem.h"
 #include "i915_utils.h"
 
@@ -25,10 +28,15 @@ show_client_name(struct device *kdev, struct device_attribute *attr, char *buf)
 {
 	struct i915_drm_client *client =
 		container_of(attr, typeof(*client), attr.name);
+	int ret;
 
-	return sysfs_emit(buf,
-			  READ_ONCE(client->closed) ? "<%s>\n" : "%s\n",
-			  client->name);
+	rcu_read_lock();
+	ret = sysfs_emit(buf,
+			 READ_ONCE(client->closed) ? "<%s>\n" : "%s\n",
+			 i915_drm_client_name(client));
+	rcu_read_unlock();
+
+	return ret;
 }
 
 static ssize_t
@@ -36,10 +44,15 @@ show_client_pid(struct device *kdev, struct device_attribute *attr, char *buf)
 {
 	struct i915_drm_client *client =
 		container_of(attr, typeof(*client), attr.pid);
+	int ret;
+
+	rcu_read_lock();
+	ret = sysfs_emit(buf,
+			 READ_ONCE(client->closed) ? "<%u>\n" : "%u\n",
+			 pid_nr(i915_drm_client_pid(client)));
+	rcu_read_unlock();
 
-	return sysfs_emit(buf,
-			  READ_ONCE(client->closed) ? "<%u>\n" : "%u\n",
-			  pid_nr(client->pid));
+	return ret;
 }
 
 static int __client_register_sysfs(struct i915_drm_client *client)
@@ -91,20 +104,46 @@ static void __client_unregister_sysfs(struct i915_drm_client *client)
 	kobject_put(fetch_and_zero(&client->root));
 }
 
+static struct i915_drm_client_name *get_name(struct i915_drm_client *client,
+					     struct task_struct *task)
+{
+	struct i915_drm_client_name *name;
+	int len = strlen(task->comm);
+
+	name = kmalloc(struct_size(name, name, len + 1), GFP_KERNEL);
+	if (!name)
+		return NULL;
+
+	init_rcu_head(&name->rcu);
+	name->client = client;
+	name->pid = get_task_pid(task, PIDTYPE_PID);
+	memcpy(name->name, task->comm, len + 1);
+
+	return name;
+}
+
+static void free_name(struct rcu_head *rcu)
+{
+	struct i915_drm_client_name *name =
+		container_of(rcu, typeof(*name), rcu);
+
+	put_pid(name->pid);
+	kfree(name);
+}
+
 static int
 __i915_drm_client_register(struct i915_drm_client *client,
 			   struct task_struct *task)
 {
 	struct i915_drm_clients *clients = client->clients;
-	char *name;
+	struct i915_drm_client_name *name;
 	int ret;
 
-	name = kstrdup(task->comm, GFP_KERNEL);
+	name = get_name(client, task);
 	if (!name)
 		return -ENOMEM;
 
-	client->pid = get_task_pid(task, PIDTYPE_PID);
-	client->name = name;
+	RCU_INIT_POINTER(client->name, name);
 
 	if (!clients->root)
 		return 0; /* intel_fbdev_init registers a client before sysfs */
@@ -116,18 +155,22 @@ __i915_drm_client_register(struct i915_drm_client *client,
 	return 0;
 
 err_sysfs:
-	put_pid(client->pid);
-	kfree(client->name);
-
+	RCU_INIT_POINTER(client->name, NULL);
+	call_rcu(&name->rcu, free_name);
 	return ret;
 }
 
 static void __i915_drm_client_unregister(struct i915_drm_client *client)
 {
+	struct i915_drm_client_name *name;
+
 	__client_unregister_sysfs(client);
 
-	put_pid(fetch_and_zero(&client->pid));
-	kfree(fetch_and_zero(&client->name));
+	mutex_lock(&client->update_lock);
+	name = rcu_replace_pointer(client->name, NULL, true);
+	mutex_unlock(&client->update_lock);
+
+	call_rcu(&name->rcu, free_name);
 }
 
 static void __rcu_i915_drm_client_free(struct work_struct *wrk)
@@ -152,6 +195,7 @@ i915_drm_client_add(struct i915_drm_clients *clients, struct task_struct *task)
 		return ERR_PTR(-ENOMEM);
 
 	kref_init(&client->kref);
+	mutex_init(&client->update_lock);
 	client->clients = clients;
 	INIT_RCU_WORK(&client->rcu, __rcu_i915_drm_client_free);
 
@@ -189,6 +233,25 @@ void i915_drm_client_close(struct i915_drm_client *client)
 	i915_drm_client_put(client);
 }
 
+int
+i915_drm_client_update(struct i915_drm_client *client,
+		       struct task_struct *task)
+{
+	struct i915_drm_client_name *name;
+
+	name = get_name(client, task);
+	if (!name)
+		return -ENOMEM;
+
+	mutex_lock(&client->update_lock);
+	if (name->pid != rcu_dereference_protected(client->name, true)->pid)
+		name = rcu_replace_pointer(client->name, name, true);
+	mutex_unlock(&client->update_lock);
+
+	call_rcu(&name->rcu, free_name);
+	return 0;
+}
+
 void i915_drm_clients_fini(struct i915_drm_clients *clients)
 {
 	while (!xa_empty(&clients->xarray)) {
diff --git a/drivers/gpu/drm/i915/i915_drm_client.h b/drivers/gpu/drm/i915/i915_drm_client.h
index 150f8e8d34e6..556a59d6b834 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.h
+++ b/drivers/gpu/drm/i915/i915_drm_client.h
@@ -9,6 +9,7 @@
 #include <linux/device.h>
 #include <linux/kobject.h>
 #include <linux/kref.h>
+#include <linux/mutex.h>
 #include <linux/pid.h>
 #include <linux/rcupdate.h>
 #include <linux/sched.h>
@@ -25,14 +26,22 @@ struct i915_drm_clients {
 	struct kobject *root;
 };
 
+struct i915_drm_client_name {
+	struct rcu_head rcu;
+	struct i915_drm_client *client;
+	struct pid *pid;
+	char name[];
+};
+
 struct i915_drm_client {
 	struct kref kref;
 
 	struct rcu_work rcu;
 
+	struct mutex update_lock; /* Serializes name and pid updates. */
+
 	unsigned int id;
-	struct pid *pid;
-	char *name;
+	struct i915_drm_client_name __rcu *name;
 	bool closed;
 
 	struct i915_drm_clients *clients;
@@ -66,6 +75,27 @@ void i915_drm_client_close(struct i915_drm_client *client);
 struct i915_drm_client *i915_drm_client_add(struct i915_drm_clients *clients,
 					    struct task_struct *task);
 
+int i915_drm_client_update(struct i915_drm_client *client,
+			   struct task_struct *task);
+
+static inline const struct i915_drm_client_name *
+__i915_drm_client_name(const struct i915_drm_client *client)
+{
+	return rcu_dereference(client->name);
+}
+
+static inline const char *
+i915_drm_client_name(const struct i915_drm_client *client)
+{
+	return __i915_drm_client_name(client)->name;
+}
+
+static inline struct pid *
+i915_drm_client_pid(const struct i915_drm_client *client)
+{
+	return __i915_drm_client_name(client)->pid;
+}
+
 void i915_drm_clients_fini(struct i915_drm_clients *clients);
 
 #endif /* !__I915_DRM_CLIENT_H__ */
-- 
2.30.2


^ permalink raw reply	[flat|nested] 103+ messages in thread

* [PATCH 3/7] drm/i915: Make GEM contexts track DRM clients
  2021-05-13 10:59 ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-13 10:59   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-13 10:59 UTC (permalink / raw)
  To: Intel-gfx; +Cc: dri-devel

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

If we make GEM contexts keep a reference to i915_drm_client for the whole
of their lifetime, we can consolidate the current task pid and name usage
by getting it from the client.
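The resulting lifetime rule is plain reference counting: the DRM file holds
one reference on its client and each GEM context takes another, so the
client is freed only after the file has closed and every context has been
released. A minimal userspace analogue of the kref pattern (illustrative
only, not driver code):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Minimal userspace analogue of kref: the client stays alive while
 * any holder (the DRM file or a GEM context) has a reference; the
 * last put runs the destructor. */
struct client {
	atomic_int ref;
	bool *freed; /* lets callers observe destruction */
};

static struct client *client_alloc(bool *freed)
{
	struct client *c = malloc(sizeof(*c));

	if (!c)
		return NULL;
	atomic_init(&c->ref, 1); /* reference held by the DRM file */
	c->freed = freed;
	*freed = false;
	return c;
}

static struct client *client_get(struct client *c)
{
	atomic_fetch_add(&c->ref, 1); /* e.g. a new GEM context */
	return c;
}

static void client_put(struct client *c)
{
	if (atomic_fetch_sub(&c->ref, 1) == 1) { /* dropped last ref */
		*c->freed = true;
		free(c);
	}
}
```

In the driver, client_get()/client_put() correspond to
i915_drm_client_get()/i915_drm_client_put(), and the final put queues
__rcu_i915_drm_client_free() rather than freeing directly, so contexts that
outlive their file keep the client data valid until they are released.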

v2: Don't bother supporting selftest contexts from debugfs. (Chris)
v3 (Lucas): Finish constructing ctx before adding it to the list
v4 (Ram): Rebase on upstream

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20210123153733.18139-4-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20210124153136.19124-4-chris@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 20 ++++++++++++-----
 .../gpu/drm/i915/gem/i915_gem_context_types.h | 13 +++--------
 drivers/gpu/drm/i915/i915_gpu_error.c         | 22 +++++++++++--------
 3 files changed, 30 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index e5f8d94666e8..5ea42d5b0b1a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -345,13 +345,14 @@ void i915_gem_context_release(struct kref *ref)
 	trace_i915_context_free(ctx);
 	GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
 
-	mutex_destroy(&ctx->engines_mutex);
-	mutex_destroy(&ctx->lut_mutex);
+	if (ctx->client)
+		i915_drm_client_put(ctx->client);
 
 	if (ctx->timeline)
 		intel_timeline_put(ctx->timeline);
 
-	put_pid(ctx->pid);
+	mutex_destroy(&ctx->engines_mutex);
+	mutex_destroy(&ctx->lut_mutex);
 	mutex_destroy(&ctx->mutex);
 
 	kfree_rcu(ctx, rcu);
@@ -895,6 +896,7 @@ static int gem_context_register(struct i915_gem_context *ctx,
 				u32 *id)
 {
 	struct drm_i915_private *i915 = ctx->i915;
+	struct i915_drm_client *client;
 	struct i915_address_space *vm;
 	int ret;
 
@@ -906,15 +908,21 @@ static int gem_context_register(struct i915_gem_context *ctx,
 		WRITE_ONCE(vm->file, fpriv); /* XXX */
 	mutex_unlock(&ctx->mutex);
 
-	ctx->pid = get_task_pid(current, PIDTYPE_PID);
+	client = i915_drm_client_get(fpriv->client);
+
+	rcu_read_lock();
 	snprintf(ctx->name, sizeof(ctx->name), "%s[%d]",
-		 current->comm, pid_nr(ctx->pid));
+		 i915_drm_client_name(client),
+		 pid_nr(i915_drm_client_pid(client)));
+	rcu_read_unlock();
 
 	/* And finally expose ourselves to userspace via the idr */
 	ret = xa_alloc(&fpriv->context_xa, id, ctx, xa_limit_32b, GFP_KERNEL);
 	if (ret)
 		goto err_pid;
 
+	ctx->client = client;
+
 	spin_lock(&i915->gem.contexts.lock);
 	list_add_tail(&ctx->link, &i915->gem.contexts.list);
 	spin_unlock(&i915->gem.contexts.lock);
@@ -922,7 +930,7 @@ static int gem_context_register(struct i915_gem_context *ctx,
 	return 0;
 
 err_pid:
-	put_pid(fetch_and_zero(&ctx->pid));
+	i915_drm_client_put(client);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 340473aa70de..eb098f2896c5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -96,19 +96,12 @@ struct i915_gem_context {
 	 */
 	struct i915_address_space __rcu *vm;
 
-	/**
-	 * @pid: process id of creator
-	 *
-	 * Note that who created the context may not be the principle user,
-	 * as the context may be shared across a local socket. However,
-	 * that should only affect the default context, all contexts created
-	 * explicitly by the client are expected to be isolated.
-	 */
-	struct pid *pid;
-
 	/** link: place with &drm_i915_private.context_list */
 	struct list_head link;
 
+	/** client: struct i915_drm_client */
+	struct i915_drm_client *client;
+
 	/**
 	 * @ref: reference count
 	 *
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 99ca242ec13b..dc9eb6823270 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1235,7 +1235,9 @@ static void record_request(const struct i915_request *request,
 
 		ctx = rcu_dereference(request->context->gem_context);
 		if (ctx)
-			erq->pid = pid_nr(ctx->pid);
+			erq->pid = I915_SELFTEST_ONLY(!ctx->client) ?
+				   0 :
+				   pid_nr(i915_drm_client_pid(ctx->client));
 	}
 	rcu_read_unlock();
 }
@@ -1256,23 +1258,25 @@ static bool record_context(struct i915_gem_context_coredump *e,
 			   const struct i915_request *rq)
 {
 	struct i915_gem_context *ctx;
-	struct task_struct *task;
 	bool simulated;
 
 	rcu_read_lock();
+
 	ctx = rcu_dereference(rq->context->gem_context);
 	if (ctx && !kref_get_unless_zero(&ctx->ref))
 		ctx = NULL;
-	rcu_read_unlock();
-	if (!ctx)
+	if (!ctx) {
+		rcu_read_unlock();
 		return true;
+	}
 
-	rcu_read_lock();
-	task = pid_task(ctx->pid, PIDTYPE_PID);
-	if (task) {
-		strcpy(e->comm, task->comm);
-		e->pid = task->pid;
+	if (I915_SELFTEST_ONLY(!ctx->client)) {
+		strcpy(e->comm, "[kernel]");
+	} else {
+		strcpy(e->comm, i915_drm_client_name(ctx->client));
+		e->pid = pid_nr(i915_drm_client_pid(ctx->client));
 	}
+
 	rcu_read_unlock();
 
 	e->sched_attr = ctx->sched;
-- 
2.30.2


^ permalink raw reply	[flat|nested] 103+ messages in thread

* [Intel-gfx] [PATCH 3/7] drm/i915: Make GEM contexts track DRM clients
@ 2021-05-13 10:59   ` Tvrtko Ursulin
  0 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-13 10:59 UTC (permalink / raw)
  To: Intel-gfx; +Cc: dri-devel

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

If we make GEM contexts keep a reference to i915_drm_client for the whole
of their lifetime, we can consolidate the current task pid and name usage
by getting it from the client.

v2: Don't bother supporting selftests contexts from debugfs. (Chris)
v3 (Lucas): Finish constructing ctx before adding it to the list
v4 (Ram): Rebase on upstream

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20210123153733.18139-4-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20210124153136.19124-4-chris@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 20 ++++++++++++-----
 .../gpu/drm/i915/gem/i915_gem_context_types.h | 13 +++--------
 drivers/gpu/drm/i915/i915_gpu_error.c         | 22 +++++++++++--------
 3 files changed, 30 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index e5f8d94666e8..5ea42d5b0b1a 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -345,13 +345,14 @@ void i915_gem_context_release(struct kref *ref)
 	trace_i915_context_free(ctx);
 	GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
 
-	mutex_destroy(&ctx->engines_mutex);
-	mutex_destroy(&ctx->lut_mutex);
+	if (ctx->client)
+		i915_drm_client_put(ctx->client);
 
 	if (ctx->timeline)
 		intel_timeline_put(ctx->timeline);
 
-	put_pid(ctx->pid);
+	mutex_destroy(&ctx->engines_mutex);
+	mutex_destroy(&ctx->lut_mutex);
 	mutex_destroy(&ctx->mutex);
 
 	kfree_rcu(ctx, rcu);
@@ -895,6 +896,7 @@ static int gem_context_register(struct i915_gem_context *ctx,
 				u32 *id)
 {
 	struct drm_i915_private *i915 = ctx->i915;
+	struct i915_drm_client *client;
 	struct i915_address_space *vm;
 	int ret;
 
@@ -906,15 +908,21 @@ static int gem_context_register(struct i915_gem_context *ctx,
 		WRITE_ONCE(vm->file, fpriv); /* XXX */
 	mutex_unlock(&ctx->mutex);
 
-	ctx->pid = get_task_pid(current, PIDTYPE_PID);
+	client = i915_drm_client_get(fpriv->client);
+
+	rcu_read_lock();
 	snprintf(ctx->name, sizeof(ctx->name), "%s[%d]",
-		 current->comm, pid_nr(ctx->pid));
+		 i915_drm_client_name(client),
+		 pid_nr(i915_drm_client_pid(client)));
+	rcu_read_unlock();
 
 	/* And finally expose ourselves to userspace via the idr */
 	ret = xa_alloc(&fpriv->context_xa, id, ctx, xa_limit_32b, GFP_KERNEL);
 	if (ret)
 		goto err_pid;
 
+	ctx->client = client;
+
 	spin_lock(&i915->gem.contexts.lock);
 	list_add_tail(&ctx->link, &i915->gem.contexts.list);
 	spin_unlock(&i915->gem.contexts.lock);
@@ -922,7 +930,7 @@ static int gem_context_register(struct i915_gem_context *ctx,
 	return 0;
 
 err_pid:
-	put_pid(fetch_and_zero(&ctx->pid));
+	i915_drm_client_put(client);
 	return ret;
 }
 
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index 340473aa70de..eb098f2896c5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -96,19 +96,12 @@ struct i915_gem_context {
 	 */
 	struct i915_address_space __rcu *vm;
 
-	/**
-	 * @pid: process id of creator
-	 *
-	 * Note that who created the context may not be the principle user,
-	 * as the context may be shared across a local socket. However,
-	 * that should only affect the default context, all contexts created
-	 * explicitly by the client are expected to be isolated.
-	 */
-	struct pid *pid;
-
 	/** link: place with &drm_i915_private.context_list */
 	struct list_head link;
 
+	/** client: struct i915_drm_client */
+	struct i915_drm_client *client;
+
 	/**
 	 * @ref: reference count
 	 *
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 99ca242ec13b..dc9eb6823270 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1235,7 +1235,9 @@ static void record_request(const struct i915_request *request,
 
 		ctx = rcu_dereference(request->context->gem_context);
 		if (ctx)
-			erq->pid = pid_nr(ctx->pid);
+			erq->pid = I915_SELFTEST_ONLY(!ctx->client) ?
+				   0 :
+				   pid_nr(i915_drm_client_pid(ctx->client));
 	}
 	rcu_read_unlock();
 }
@@ -1256,23 +1258,25 @@ static bool record_context(struct i915_gem_context_coredump *e,
 			   const struct i915_request *rq)
 {
 	struct i915_gem_context *ctx;
-	struct task_struct *task;
 	bool simulated;
 
 	rcu_read_lock();
+
 	ctx = rcu_dereference(rq->context->gem_context);
 	if (ctx && !kref_get_unless_zero(&ctx->ref))
 		ctx = NULL;
-	rcu_read_unlock();
-	if (!ctx)
+	if (!ctx) {
+		rcu_read_unlock();
 		return true;
+	}
 
-	rcu_read_lock();
-	task = pid_task(ctx->pid, PIDTYPE_PID);
-	if (task) {
-		strcpy(e->comm, task->comm);
-		e->pid = task->pid;
+	if (I915_SELFTEST_ONLY(!ctx->client)) {
+		strcpy(e->comm, "[kernel]");
+	} else {
+		strcpy(e->comm, i915_drm_client_name(ctx->client));
+		e->pid = pid_nr(i915_drm_client_pid(ctx->client));
 	}
+
 	rcu_read_unlock();
 
 	e->sched_attr = ctx->sched;
-- 
2.30.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* [PATCH 4/7] drm/i915: Track runtime spent in closed and unreachable GEM contexts
  2021-05-13 10:59 ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-13 10:59   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-13 10:59 UTC (permalink / raw)
  To: Intel-gfx; +Cc: dri-devel

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

As contexts are abandoned we want to remember how much GPU time they used
(per class) so we can later use it for smarter purposes.

As GEM contexts are closed we want to have the DRM client remember how
much GPU time they used (per class) so we can later use it for smarter
purposes.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20210123153733.18139-5-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20210124153136.19124-5-chris@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c | 24 +++++++++++++++++++--
 drivers/gpu/drm/i915/i915_drm_client.h      |  7 ++++++
 2 files changed, 29 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 5ea42d5b0b1a..b8d8366a2cce 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -262,23 +262,43 @@ static void free_engines_rcu(struct rcu_head *rcu)
 	free_engines(engines);
 }
 
+static void accumulate_runtime(struct i915_drm_client *client,
+			       struct i915_gem_engines *engines)
+{
+	struct i915_gem_engines_iter it;
+	struct intel_context *ce;
+
+	if (!client)
+		return;
+
+	/* Transfer accumulated runtime to the parent GEM context. */
+	for_each_gem_engine(ce, engines, it) {
+		unsigned int class = ce->engine->uabi_class;
+
+		GEM_BUG_ON(class >= ARRAY_SIZE(client->past_runtime));
+		atomic64_add(intel_context_get_total_runtime_ns(ce),
+			     &client->past_runtime[class]);
+	}
+}
+
 static int __i915_sw_fence_call
 engines_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
 {
 	struct i915_gem_engines *engines =
 		container_of(fence, typeof(*engines), fence);
+	struct i915_gem_context *ctx = engines->ctx;
 
 	switch (state) {
 	case FENCE_COMPLETE:
 		if (!list_empty(&engines->link)) {
-			struct i915_gem_context *ctx = engines->ctx;
 			unsigned long flags;
 
 			spin_lock_irqsave(&ctx->stale.lock, flags);
 			list_del(&engines->link);
 			spin_unlock_irqrestore(&ctx->stale.lock, flags);
 		}
-		i915_gem_context_put(engines->ctx);
+		accumulate_runtime(ctx->client, engines);
+		i915_gem_context_put(ctx);
 		break;
 
 	case FENCE_FREE:
diff --git a/drivers/gpu/drm/i915/i915_drm_client.h b/drivers/gpu/drm/i915/i915_drm_client.h
index 556a59d6b834..6f25e754e978 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.h
+++ b/drivers/gpu/drm/i915/i915_drm_client.h
@@ -15,6 +15,8 @@
 #include <linux/sched.h>
 #include <linux/xarray.h>
 
+#include "gt/intel_engine_types.h"
+
 struct drm_i915_private;
 
 struct i915_drm_clients {
@@ -51,6 +53,11 @@ struct i915_drm_client {
 		struct device_attribute pid;
 		struct device_attribute name;
 	} attr;
+
+	/**
+	 * @past_runtime: Accumulation of pphwsp runtimes from closed contexts.
+	 */
+	atomic64_t past_runtime[MAX_ENGINE_CLASS + 1];
 };
 
 void i915_drm_clients_init(struct i915_drm_clients *clients,
-- 
2.30.2



* [PATCH 5/7] drm/i915: Track all user contexts per client
  2021-05-13 10:59 ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-13 11:00   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-13 11:00 UTC (permalink / raw)
  To: Intel-gfx; +Cc: dri-devel

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We soon want to start answering questions like how much GPU time contexts
belonging to a client which has exited are still using.

To enable this we start tracking all contexts belonging to a client on a
separate list.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20210123153733.18139-6-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20210124153136.19124-6-chris@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c       | 12 ++++++++++++
 drivers/gpu/drm/i915/gem/i915_gem_context_types.h |  3 +++
 drivers/gpu/drm/i915/i915_drm_client.c            |  3 +++
 drivers/gpu/drm/i915/i915_drm_client.h            |  5 +++++
 4 files changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index b8d8366a2cce..1595a608de92 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -573,6 +573,7 @@ static void set_closed_name(struct i915_gem_context *ctx)
 static void context_close(struct i915_gem_context *ctx)
 {
 	struct i915_address_space *vm;
+	struct i915_drm_client *client;
 
 	/* Flush any concurrent set_engines() */
 	mutex_lock(&ctx->engines_mutex);
@@ -601,6 +602,13 @@ static void context_close(struct i915_gem_context *ctx)
 	list_del(&ctx->link);
 	spin_unlock(&ctx->i915->gem.contexts.lock);
 
+	client = ctx->client;
+	if (client) {
+		spin_lock(&client->ctx_lock);
+		list_del_rcu(&ctx->client_link);
+		spin_unlock(&client->ctx_lock);
+	}
+
 	mutex_unlock(&ctx->mutex);
 
 	/*
@@ -943,6 +951,10 @@ static int gem_context_register(struct i915_gem_context *ctx,
 
 	ctx->client = client;
 
+	spin_lock(&client->ctx_lock);
+	list_add_tail_rcu(&ctx->client_link, &client->ctx_list);
+	spin_unlock(&client->ctx_lock);
+
 	spin_lock(&i915->gem.contexts.lock);
 	list_add_tail(&ctx->link, &i915->gem.contexts.list);
 	spin_unlock(&i915->gem.contexts.lock);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index eb098f2896c5..8ea3fe3e7414 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -102,6 +102,9 @@ struct i915_gem_context {
 	/** client: struct i915_drm_client */
 	struct i915_drm_client *client;
 
+	/** link: &drm_client.context_list */
+	struct list_head client_link;
+
 	/**
 	 * @ref: reference count
 	 *
diff --git a/drivers/gpu/drm/i915/i915_drm_client.c b/drivers/gpu/drm/i915/i915_drm_client.c
index ad3d36c9dee2..0ca81a750895 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.c
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -196,6 +196,9 @@ i915_drm_client_add(struct i915_drm_clients *clients, struct task_struct *task)
 
 	kref_init(&client->kref);
 	mutex_init(&client->update_lock);
+	spin_lock_init(&client->ctx_lock);
+	INIT_LIST_HEAD(&client->ctx_list);
+
 	client->clients = clients;
 	INIT_RCU_WORK(&client->rcu, __rcu_i915_drm_client_free);
 
diff --git a/drivers/gpu/drm/i915/i915_drm_client.h b/drivers/gpu/drm/i915/i915_drm_client.h
index 6f25e754e978..13f92142e474 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.h
+++ b/drivers/gpu/drm/i915/i915_drm_client.h
@@ -9,10 +9,12 @@
 #include <linux/device.h>
 #include <linux/kobject.h>
 #include <linux/kref.h>
+#include <linux/list.h>
 #include <linux/mutex.h>
 #include <linux/pid.h>
 #include <linux/rcupdate.h>
 #include <linux/sched.h>
+#include <linux/spinlock.h>
 #include <linux/xarray.h>
 
 #include "gt/intel_engine_types.h"
@@ -46,6 +48,9 @@ struct i915_drm_client {
 	struct i915_drm_client_name __rcu *name;
 	bool closed;
 
+	spinlock_t ctx_lock; /* For add/remove from ctx_list. */
+	struct list_head ctx_list; /* List of contexts belonging to client. */
+
 	struct i915_drm_clients *clients;
 
 	struct kobject *root;
-- 
2.30.2



* [PATCH 6/7] drm/i915: Track context current active time
  2021-05-13 10:59 ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-13 11:00   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-13 11:00 UTC (permalink / raw)
  To: Intel-gfx; +Cc: dri-devel

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Track context active (on hardware) status together with the start
timestamp.

This will be used to provide better granularity of context
runtime reporting in conjunction with already tracked pphwsp accumulated
runtime.

The latter is only updated on context save so it does not give us
visibility into any currently executing work.

As part of the patch the existing runtime tracking data is moved under the
new ce->stats member and updated under the seqlock. This provides the
ability to atomically read out accumulated plus active runtime.

v2:
 * Rename and make __intel_context_get_active_time unlocked.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> #  v1
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20210123153733.18139-7-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20210124153136.19124-7-chris@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/gt/intel_context.c       | 27 ++++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_context.h       | 15 ++++-------
 drivers/gpu/drm/i915/gt/intel_context_types.h | 24 +++++++++++------
 .../drm/i915/gt/intel_execlists_submission.c  | 23 ++++++++++++----
 .../gpu/drm/i915/gt/intel_gt_clock_utils.c    |  4 +++
 drivers/gpu/drm/i915/gt/intel_lrc.c           | 27 ++++++++++---------
 drivers/gpu/drm/i915/gt/intel_lrc.h           | 24 +++++++++++++++++
 drivers/gpu/drm/i915/gt/selftest_lrc.c        | 10 +++----
 drivers/gpu/drm/i915/i915_gpu_error.c         |  9 +++----
 drivers/gpu/drm/i915/i915_gpu_error.h         |  2 +-
 10 files changed, 116 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 4033184f13b9..bc021244c3b2 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -373,7 +373,7 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	ce->sseu = engine->sseu;
 	ce->ring = __intel_context_ring_size(SZ_4K);
 
-	ewma_runtime_init(&ce->runtime.avg);
+	ewma_runtime_init(&ce->stats.runtime.avg);
 
 	ce->vm = i915_vm_get(engine->gt->vm);
 
@@ -499,6 +499,31 @@ struct i915_request *intel_context_create_request(struct intel_context *ce)
 	return rq;
 }
 
+u64 intel_context_get_total_runtime_ns(const struct intel_context *ce)
+{
+	u64 total, active;
+
+	total = ce->stats.runtime.total;
+	if (ce->ops->flags & COPS_RUNTIME_CYCLES)
+		total *= ce->engine->gt->clock_period_ns;
+
+	active = READ_ONCE(ce->stats.active);
+	if (active)
+		active = intel_context_clock() - active;
+
+	return total + active;
+}
+
+u64 intel_context_get_avg_runtime_ns(struct intel_context *ce)
+{
+	u64 avg = ewma_runtime_read(&ce->stats.runtime.avg);
+
+	if (ce->ops->flags & COPS_RUNTIME_CYCLES)
+		avg *= ce->engine->gt->clock_period_ns;
+
+	return avg;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftest_context.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index f83a73a2b39f..a9125768b1b4 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -250,18 +250,13 @@ intel_context_clear_nopreempt(struct intel_context *ce)
 	clear_bit(CONTEXT_NOPREEMPT, &ce->flags);
 }
 
-static inline u64 intel_context_get_total_runtime_ns(struct intel_context *ce)
-{
-	const u32 period = ce->engine->gt->clock_period_ns;
-
-	return READ_ONCE(ce->runtime.total) * period;
-}
+u64 intel_context_get_total_runtime_ns(const struct intel_context *ce);
+u64 intel_context_get_avg_runtime_ns(struct intel_context *ce);
 
-static inline u64 intel_context_get_avg_runtime_ns(struct intel_context *ce)
+static inline u64 intel_context_clock(void)
 {
-	const u32 period = ce->engine->gt->clock_period_ns;
-
-	return mul_u32_u32(ewma_runtime_read(&ce->runtime.avg), period);
+	/* As we mix CS cycles with CPU clocks, use the raw monotonic clock. */
+	return ktime_get_raw_fast_ns();
 }
 
 #endif /* __INTEL_CONTEXT_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index ed8c447a7346..65a5730a4f5b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -33,6 +33,9 @@ struct intel_context_ops {
 #define COPS_HAS_INFLIGHT_BIT 0
 #define COPS_HAS_INFLIGHT BIT(COPS_HAS_INFLIGHT_BIT)
 
+#define COPS_RUNTIME_CYCLES_BIT 1
+#define COPS_RUNTIME_CYCLES BIT(COPS_RUNTIME_CYCLES_BIT)
+
 	int (*alloc)(struct intel_context *ce);
 
 	int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr);
@@ -110,14 +113,19 @@ struct intel_context {
 	} lrc;
 	u32 tag; /* cookie passed to HW to track this context on submission */
 
-	/* Time on GPU as tracked by the hw. */
-	struct {
-		struct ewma_runtime avg;
-		u64 total;
-		u32 last;
-		I915_SELFTEST_DECLARE(u32 num_underflow);
-		I915_SELFTEST_DECLARE(u32 max_underflow);
-	} runtime;
+	/** stats: Context GPU engine busyness tracking. */
+	struct intel_context_stats {
+		u64 active;
+
+		/* Time on GPU as tracked by the hw. */
+		struct {
+			struct ewma_runtime avg;
+			u64 total;
+			u32 last;
+			I915_SELFTEST_DECLARE(u32 num_underflow);
+			I915_SELFTEST_DECLARE(u32 max_underflow);
+		} runtime;
+	} stats;
 
 	unsigned int active_count; /* protected by timeline->mutex */
 
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index de124870af44..18d9c1d96d36 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -605,8 +605,6 @@ static void __execlists_schedule_out(struct i915_request * const rq,
 		GEM_BUG_ON(test_bit(ccid - 1, &engine->context_tag));
 		__set_bit(ccid - 1, &engine->context_tag);
 	}
-
-	lrc_update_runtime(ce);
 	intel_engine_context_out(engine);
 	execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
 	if (engine->fw_domain && !--engine->fw_active)
@@ -1955,8 +1953,23 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
 	 * and merits a fresh timeslice. We reinstall the timer after
 	 * inspecting the queue to see if we need to resumbit.
 	 */
-	if (*prev != *execlists->active) /* elide lite-restores */
+	if (*prev != *execlists->active) { /* elide lite-restores */
+		/*
+		 * Note the inherent discrepancy between the HW runtime,
+		 * recorded as part of the context switch, and the CPU
+		 * adjustment for active contexts. We have to hope that
+		 * the delay in processing the CS event is very small
+		 * and consistent. It works to our advantage to have
+		 * the CPU adjustment _undershoot_ (i.e. start later than)
+		 * the CS timestamp so we never overreport the runtime
+		 * and correct overselves later when updating from HW.
+		 */
+		if (*prev)
+			lrc_runtime_stop((*prev)->context);
+		if (*execlists->active)
+			lrc_runtime_start((*execlists->active)->context);
 		new_timeslice(execlists);
+	}
 
 	return inactive;
 }
@@ -2495,7 +2508,7 @@ static int execlists_context_alloc(struct intel_context *ce)
 }
 
 static const struct intel_context_ops execlists_context_ops = {
-	.flags = COPS_HAS_INFLIGHT,
+	.flags = COPS_HAS_INFLIGHT | COPS_RUNTIME_CYCLES,
 
 	.alloc = execlists_context_alloc,
 
@@ -3401,7 +3414,7 @@ static void virtual_context_exit(struct intel_context *ce)
 }
 
 static const struct intel_context_ops virtual_context_ops = {
-	.flags = COPS_HAS_INFLIGHT,
+	.flags = COPS_HAS_INFLIGHT | COPS_RUNTIME_CYCLES,
 
 	.alloc = virtual_context_alloc,
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c b/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
index 582fcaee11aa..f8c79efb1a87 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
@@ -159,6 +159,10 @@ void intel_gt_init_clock_frequency(struct intel_gt *gt)
 	if (gt->clock_frequency)
 		gt->clock_period_ns = intel_gt_clock_interval_to_ns(gt, 1);
 
+	/* Icelake appears to use another fixed frequency for CTX_TIMESTAMP */
+	if (IS_GEN(gt->i915, 11))
+		gt->clock_period_ns = NSEC_PER_SEC / 13750000;
+
 	GT_TRACE(gt,
 		 "Using clock frequency: %dkHz, period: %dns, wrap: %lldms\n",
 		 gt->clock_frequency / 1000,
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index aafe2a4df496..c145e4723279 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -642,7 +642,7 @@ static void init_common_regs(u32 * const regs,
 					   CTX_CTRL_RS_CTX_ENABLE);
 	regs[CTX_CONTEXT_CONTROL] = ctl;
 
-	regs[CTX_TIMESTAMP] = ce->runtime.last;
+	regs[CTX_TIMESTAMP] = ce->stats.runtime.last;
 }
 
 static void init_wa_bb_regs(u32 * const regs,
@@ -1565,35 +1565,36 @@ void lrc_init_wa_ctx(struct intel_engine_cs *engine)
 	}
 }
 
-static void st_update_runtime_underflow(struct intel_context *ce, s32 dt)
+static void st_runtime_underflow(struct intel_context_stats *stats, s32 dt)
 {
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
-	ce->runtime.num_underflow++;
-	ce->runtime.max_underflow = max_t(u32, ce->runtime.max_underflow, -dt);
+	stats->runtime.num_underflow++;
+	stats->runtime.max_underflow =
+		max_t(u32, stats->runtime.max_underflow, -dt);
 #endif
 }
 
 void lrc_update_runtime(struct intel_context *ce)
 {
+	struct intel_context_stats *stats = &ce->stats;
 	u32 old;
 	s32 dt;
 
-	if (intel_context_is_barrier(ce))
+	old = stats->runtime.last;
+	stats->runtime.last = lrc_get_runtime(ce);
+	dt = stats->runtime.last - old;
+	if (!dt)
 		return;
 
-	old = ce->runtime.last;
-	ce->runtime.last = lrc_get_runtime(ce);
-	dt = ce->runtime.last - old;
-
 	if (unlikely(dt < 0)) {
 		CE_TRACE(ce, "runtime underflow: last=%u, new=%u, delta=%d\n",
-			 old, ce->runtime.last, dt);
-		st_update_runtime_underflow(ce, dt);
+			 old, stats->runtime.last, dt);
+		st_runtime_underflow(stats, dt);
 		return;
 	}
 
-	ewma_runtime_add(&ce->runtime.avg, dt);
-	ce->runtime.total += dt;
+	ewma_runtime_add(&stats->runtime.avg, dt);
+	stats->runtime.total += dt;
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h
index 7f697845c4cf..8073674538d7 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
@@ -79,4 +79,28 @@ static inline u32 lrc_get_runtime(const struct intel_context *ce)
 	return READ_ONCE(ce->lrc_reg_state[CTX_TIMESTAMP]);
 }
 
+static inline void lrc_runtime_start(struct intel_context *ce)
+{
+	struct intel_context_stats *stats = &ce->stats;
+
+	if (intel_context_is_barrier(ce))
+		return;
+
+	if (stats->active)
+		return;
+
+	WRITE_ONCE(stats->active, intel_context_clock());
+}
+
+static inline void lrc_runtime_stop(struct intel_context *ce)
+{
+	struct intel_context_stats *stats = &ce->stats;
+
+	if (!stats->active)
+		return;
+
+	lrc_update_runtime(ce);
+	WRITE_ONCE(stats->active, 0);
+}
+
 #endif /* __INTEL_LRC_H__ */
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index d8f6623524e8..d2f950a5d9b5 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -1751,8 +1751,8 @@ static int __live_pphwsp_runtime(struct intel_engine_cs *engine)
 	if (IS_ERR(ce))
 		return PTR_ERR(ce);
 
-	ce->runtime.num_underflow = 0;
-	ce->runtime.max_underflow = 0;
+	ce->stats.runtime.num_underflow = 0;
+	ce->stats.runtime.max_underflow = 0;
 
 	do {
 		unsigned int loop = 1024;
@@ -1790,11 +1790,11 @@ static int __live_pphwsp_runtime(struct intel_engine_cs *engine)
 		intel_context_get_avg_runtime_ns(ce));
 
 	err = 0;
-	if (ce->runtime.num_underflow) {
+	if (ce->stats.runtime.num_underflow) {
 		pr_err("%s: pphwsp underflow %u time(s), max %u cycles!\n",
 		       engine->name,
-		       ce->runtime.num_underflow,
-		       ce->runtime.max_underflow);
+		       ce->stats.runtime.num_underflow,
+		       ce->stats.runtime.max_underflow);
 		GEM_TRACE_DUMP();
 		err = -EOVERFLOW;
 	}
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index dc9eb6823270..a4bfcac0f5df 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -484,13 +484,10 @@ static void error_print_context(struct drm_i915_error_state_buf *m,
 				const char *header,
 				const struct i915_gem_context_coredump *ctx)
 {
-	const u32 period = m->i915->gt.clock_period_ns;
-
 	err_printf(m, "%s%s[%d] prio %d, guilty %d active %d, runtime total %lluns, avg %lluns\n",
 		   header, ctx->comm, ctx->pid, ctx->sched_attr.priority,
 		   ctx->guilty, ctx->active,
-		   ctx->total_runtime * period,
-		   mul_u32_u32(ctx->avg_runtime, period));
+		   ctx->total_runtime, ctx->avg_runtime);
 }
 
 static struct i915_vma_coredump *
@@ -1283,8 +1280,8 @@ static bool record_context(struct i915_gem_context_coredump *e,
 	e->guilty = atomic_read(&ctx->guilty_count);
 	e->active = atomic_read(&ctx->active_count);
 
-	e->total_runtime = rq->context->runtime.total;
-	e->avg_runtime = ewma_runtime_read(&rq->context->runtime.avg);
+	e->total_runtime = intel_context_get_total_runtime_ns(rq->context);
+	e->avg_runtime = intel_context_get_avg_runtime_ns(rq->context);
 
 	simulated = i915_gem_context_no_error_capture(ctx);
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index eb435f9e0220..4b4af93217d4 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -90,7 +90,7 @@ struct intel_engine_coredump {
 		char comm[TASK_COMM_LEN];
 
 		u64 total_runtime;
-		u32 avg_runtime;
+		u64 avg_runtime;
 
 		pid_t pid;
 		int active;
-- 
2.30.2


^ permalink raw reply	[flat|nested] 103+ messages in thread

* [Intel-gfx] [PATCH 6/7] drm/i915: Track context current active time
@ 2021-05-13 11:00   ` Tvrtko Ursulin
  0 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-13 11:00 UTC (permalink / raw)
  To: Intel-gfx; +Cc: dri-devel

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Track context active (on hardware) status together with the start
timestamp.

This will be used to provide better granularity of context runtime
reporting in conjunction with the already tracked pphwsp accumulated
runtime.

The latter is only updated on context save, so it does not give us
visibility into any currently executing work.

As part of the patch, the existing runtime tracking data is moved under the
new ce->stats member and updated under the seqlock. This provides the
ability to atomically read out the accumulated plus active runtime.

v2:
 * Rename and make __intel_context_get_active_time unlocked.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com> #  v1
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20210123153733.18139-7-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20210124153136.19124-7-chris@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/gt/intel_context.c       | 27 ++++++++++++++++++-
 drivers/gpu/drm/i915/gt/intel_context.h       | 15 ++++-------
 drivers/gpu/drm/i915/gt/intel_context_types.h | 24 +++++++++++------
 .../drm/i915/gt/intel_execlists_submission.c  | 23 ++++++++++++----
 .../gpu/drm/i915/gt/intel_gt_clock_utils.c    |  4 +++
 drivers/gpu/drm/i915/gt/intel_lrc.c           | 27 ++++++++++---------
 drivers/gpu/drm/i915/gt/intel_lrc.h           | 24 +++++++++++++++++
 drivers/gpu/drm/i915/gt/selftest_lrc.c        | 10 +++----
 drivers/gpu/drm/i915/i915_gpu_error.c         |  9 +++----
 drivers/gpu/drm/i915/i915_gpu_error.h         |  2 +-
 10 files changed, 116 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 4033184f13b9..bc021244c3b2 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -373,7 +373,7 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 	ce->sseu = engine->sseu;
 	ce->ring = __intel_context_ring_size(SZ_4K);
 
-	ewma_runtime_init(&ce->runtime.avg);
+	ewma_runtime_init(&ce->stats.runtime.avg);
 
 	ce->vm = i915_vm_get(engine->gt->vm);
 
@@ -499,6 +499,31 @@ struct i915_request *intel_context_create_request(struct intel_context *ce)
 	return rq;
 }
 
+u64 intel_context_get_total_runtime_ns(const struct intel_context *ce)
+{
+	u64 total, active;
+
+	total = ce->stats.runtime.total;
+	if (ce->ops->flags & COPS_RUNTIME_CYCLES)
+		total *= ce->engine->gt->clock_period_ns;
+
+	active = READ_ONCE(ce->stats.active);
+	if (active)
+		active = intel_context_clock() - active;
+
+	return total + active;
+}
+
+u64 intel_context_get_avg_runtime_ns(struct intel_context *ce)
+{
+	u64 avg = ewma_runtime_read(&ce->stats.runtime.avg);
+
+	if (ce->ops->flags & COPS_RUNTIME_CYCLES)
+		avg *= ce->engine->gt->clock_period_ns;
+
+	return avg;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftest_context.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_context.h b/drivers/gpu/drm/i915/gt/intel_context.h
index f83a73a2b39f..a9125768b1b4 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.h
+++ b/drivers/gpu/drm/i915/gt/intel_context.h
@@ -250,18 +250,13 @@ intel_context_clear_nopreempt(struct intel_context *ce)
 	clear_bit(CONTEXT_NOPREEMPT, &ce->flags);
 }
 
-static inline u64 intel_context_get_total_runtime_ns(struct intel_context *ce)
-{
-	const u32 period = ce->engine->gt->clock_period_ns;
-
-	return READ_ONCE(ce->runtime.total) * period;
-}
+u64 intel_context_get_total_runtime_ns(const struct intel_context *ce);
+u64 intel_context_get_avg_runtime_ns(struct intel_context *ce);
 
-static inline u64 intel_context_get_avg_runtime_ns(struct intel_context *ce)
+static inline u64 intel_context_clock(void)
 {
-	const u32 period = ce->engine->gt->clock_period_ns;
-
-	return mul_u32_u32(ewma_runtime_read(&ce->runtime.avg), period);
+	/* As we mix CS cycles with CPU clocks, use the raw monotonic clock. */
+	return ktime_get_raw_fast_ns();
 }
 
 #endif /* __INTEL_CONTEXT_H__ */
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index ed8c447a7346..65a5730a4f5b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -33,6 +33,9 @@ struct intel_context_ops {
 #define COPS_HAS_INFLIGHT_BIT 0
 #define COPS_HAS_INFLIGHT BIT(COPS_HAS_INFLIGHT_BIT)
 
+#define COPS_RUNTIME_CYCLES_BIT 1
+#define COPS_RUNTIME_CYCLES BIT(COPS_RUNTIME_CYCLES_BIT)
+
 	int (*alloc)(struct intel_context *ce);
 
 	int (*pre_pin)(struct intel_context *ce, struct i915_gem_ww_ctx *ww, void **vaddr);
@@ -110,14 +113,19 @@ struct intel_context {
 	} lrc;
 	u32 tag; /* cookie passed to HW to track this context on submission */
 
-	/* Time on GPU as tracked by the hw. */
-	struct {
-		struct ewma_runtime avg;
-		u64 total;
-		u32 last;
-		I915_SELFTEST_DECLARE(u32 num_underflow);
-		I915_SELFTEST_DECLARE(u32 max_underflow);
-	} runtime;
+	/** stats: Context GPU engine busyness tracking. */
+	struct intel_context_stats {
+		u64 active;
+
+		/* Time on GPU as tracked by the hw. */
+		struct {
+			struct ewma_runtime avg;
+			u64 total;
+			u32 last;
+			I915_SELFTEST_DECLARE(u32 num_underflow);
+			I915_SELFTEST_DECLARE(u32 max_underflow);
+		} runtime;
+	} stats;
 
 	unsigned int active_count; /* protected by timeline->mutex */
 
diff --git a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
index de124870af44..18d9c1d96d36 100644
--- a/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
+++ b/drivers/gpu/drm/i915/gt/intel_execlists_submission.c
@@ -605,8 +605,6 @@ static void __execlists_schedule_out(struct i915_request * const rq,
 		GEM_BUG_ON(test_bit(ccid - 1, &engine->context_tag));
 		__set_bit(ccid - 1, &engine->context_tag);
 	}
-
-	lrc_update_runtime(ce);
 	intel_engine_context_out(engine);
 	execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_OUT);
 	if (engine->fw_domain && !--engine->fw_active)
@@ -1955,8 +1953,23 @@ process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
 	 * and merits a fresh timeslice. We reinstall the timer after
 	 * inspecting the queue to see if we need to resubmit.
 	 */
-	if (*prev != *execlists->active) /* elide lite-restores */
+	if (*prev != *execlists->active) { /* elide lite-restores */
+		/*
+		 * Note the inherent discrepancy between the HW runtime,
+		 * recorded as part of the context switch, and the CPU
+		 * adjustment for active contexts. We have to hope that
+		 * the delay in processing the CS event is very small
+		 * and consistent. It works to our advantage to have
+		 * the CPU adjustment _undershoot_ (i.e. start later than)
+		 * the CS timestamp so we never overreport the runtime
+		 * and correct ourselves later when updating from HW.
+		 */
+		if (*prev)
+			lrc_runtime_stop((*prev)->context);
+		if (*execlists->active)
+			lrc_runtime_start((*execlists->active)->context);
 		new_timeslice(execlists);
+	}
 
 	return inactive;
 }
@@ -2495,7 +2508,7 @@ static int execlists_context_alloc(struct intel_context *ce)
 }
 
 static const struct intel_context_ops execlists_context_ops = {
-	.flags = COPS_HAS_INFLIGHT,
+	.flags = COPS_HAS_INFLIGHT | COPS_RUNTIME_CYCLES,
 
 	.alloc = execlists_context_alloc,
 
@@ -3401,7 +3414,7 @@ static void virtual_context_exit(struct intel_context *ce)
 }
 
 static const struct intel_context_ops virtual_context_ops = {
-	.flags = COPS_HAS_INFLIGHT,
+	.flags = COPS_HAS_INFLIGHT | COPS_RUNTIME_CYCLES,
 
 	.alloc = virtual_context_alloc,
 
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c b/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
index 582fcaee11aa..f8c79efb1a87 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
@@ -159,6 +159,10 @@ void intel_gt_init_clock_frequency(struct intel_gt *gt)
 	if (gt->clock_frequency)
 		gt->clock_period_ns = intel_gt_clock_interval_to_ns(gt, 1);
 
+	/* Icelake appears to use another fixed frequency for CTX_TIMESTAMP */
+	if (IS_GEN(gt->i915, 11))
+		gt->clock_period_ns = NSEC_PER_SEC / 13750000;
+
 	GT_TRACE(gt,
 		 "Using clock frequency: %dkHz, period: %dns, wrap: %lldms\n",
 		 gt->clock_frequency / 1000,
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index aafe2a4df496..c145e4723279 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -642,7 +642,7 @@ static void init_common_regs(u32 * const regs,
 					   CTX_CTRL_RS_CTX_ENABLE);
 	regs[CTX_CONTEXT_CONTROL] = ctl;
 
-	regs[CTX_TIMESTAMP] = ce->runtime.last;
+	regs[CTX_TIMESTAMP] = ce->stats.runtime.last;
 }
 
 static void init_wa_bb_regs(u32 * const regs,
@@ -1565,35 +1565,36 @@ void lrc_init_wa_ctx(struct intel_engine_cs *engine)
 	}
 }
 
-static void st_update_runtime_underflow(struct intel_context *ce, s32 dt)
+static void st_runtime_underflow(struct intel_context_stats *stats, s32 dt)
 {
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
-	ce->runtime.num_underflow++;
-	ce->runtime.max_underflow = max_t(u32, ce->runtime.max_underflow, -dt);
+	stats->runtime.num_underflow++;
+	stats->runtime.max_underflow =
+		max_t(u32, stats->runtime.max_underflow, -dt);
 #endif
 }
 
 void lrc_update_runtime(struct intel_context *ce)
 {
+	struct intel_context_stats *stats = &ce->stats;
 	u32 old;
 	s32 dt;
 
-	if (intel_context_is_barrier(ce))
+	old = stats->runtime.last;
+	stats->runtime.last = lrc_get_runtime(ce);
+	dt = stats->runtime.last - old;
+	if (!dt)
 		return;
 
-	old = ce->runtime.last;
-	ce->runtime.last = lrc_get_runtime(ce);
-	dt = ce->runtime.last - old;
-
 	if (unlikely(dt < 0)) {
 		CE_TRACE(ce, "runtime underflow: last=%u, new=%u, delta=%d\n",
-			 old, ce->runtime.last, dt);
-		st_update_runtime_underflow(ce, dt);
+			 old, stats->runtime.last, dt);
+		st_runtime_underflow(stats, dt);
 		return;
 	}
 
-	ewma_runtime_add(&ce->runtime.avg, dt);
-	ce->runtime.total += dt;
+	ewma_runtime_add(&stats->runtime.avg, dt);
+	stats->runtime.total += dt;
 }
 
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h
index 7f697845c4cf..8073674538d7 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
@@ -79,4 +79,28 @@ static inline u32 lrc_get_runtime(const struct intel_context *ce)
 	return READ_ONCE(ce->lrc_reg_state[CTX_TIMESTAMP]);
 }
 
+static inline void lrc_runtime_start(struct intel_context *ce)
+{
+	struct intel_context_stats *stats = &ce->stats;
+
+	if (intel_context_is_barrier(ce))
+		return;
+
+	if (stats->active)
+		return;
+
+	WRITE_ONCE(stats->active, intel_context_clock());
+}
+
+static inline void lrc_runtime_stop(struct intel_context *ce)
+{
+	struct intel_context_stats *stats = &ce->stats;
+
+	if (!stats->active)
+		return;
+
+	lrc_update_runtime(ce);
+	WRITE_ONCE(stats->active, 0);
+}
+
 #endif /* __INTEL_LRC_H__ */
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index d8f6623524e8..d2f950a5d9b5 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -1751,8 +1751,8 @@ static int __live_pphwsp_runtime(struct intel_engine_cs *engine)
 	if (IS_ERR(ce))
 		return PTR_ERR(ce);
 
-	ce->runtime.num_underflow = 0;
-	ce->runtime.max_underflow = 0;
+	ce->stats.runtime.num_underflow = 0;
+	ce->stats.runtime.max_underflow = 0;
 
 	do {
 		unsigned int loop = 1024;
@@ -1790,11 +1790,11 @@ static int __live_pphwsp_runtime(struct intel_engine_cs *engine)
 		intel_context_get_avg_runtime_ns(ce));
 
 	err = 0;
-	if (ce->runtime.num_underflow) {
+	if (ce->stats.runtime.num_underflow) {
 		pr_err("%s: pphwsp underflow %u time(s), max %u cycles!\n",
 		       engine->name,
-		       ce->runtime.num_underflow,
-		       ce->runtime.max_underflow);
+		       ce->stats.runtime.num_underflow,
+		       ce->stats.runtime.max_underflow);
 		GEM_TRACE_DUMP();
 		err = -EOVERFLOW;
 	}
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index dc9eb6823270..a4bfcac0f5df 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -484,13 +484,10 @@ static void error_print_context(struct drm_i915_error_state_buf *m,
 				const char *header,
 				const struct i915_gem_context_coredump *ctx)
 {
-	const u32 period = m->i915->gt.clock_period_ns;
-
 	err_printf(m, "%s%s[%d] prio %d, guilty %d active %d, runtime total %lluns, avg %lluns\n",
 		   header, ctx->comm, ctx->pid, ctx->sched_attr.priority,
 		   ctx->guilty, ctx->active,
-		   ctx->total_runtime * period,
-		   mul_u32_u32(ctx->avg_runtime, period));
+		   ctx->total_runtime, ctx->avg_runtime);
 }
 
 static struct i915_vma_coredump *
@@ -1283,8 +1280,8 @@ static bool record_context(struct i915_gem_context_coredump *e,
 	e->guilty = atomic_read(&ctx->guilty_count);
 	e->active = atomic_read(&ctx->active_count);
 
-	e->total_runtime = rq->context->runtime.total;
-	e->avg_runtime = ewma_runtime_read(&rq->context->runtime.avg);
+	e->total_runtime = intel_context_get_total_runtime_ns(rq->context);
+	e->avg_runtime = intel_context_get_avg_runtime_ns(rq->context);
 
 	simulated = i915_gem_context_no_error_capture(ctx);
 
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index eb435f9e0220..4b4af93217d4 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -90,7 +90,7 @@ struct intel_engine_coredump {
 		char comm[TASK_COMM_LEN];
 
 		u64 total_runtime;
-		u32 avg_runtime;
+		u64 avg_runtime;
 
 		pid_t pid;
 		int active;
-- 
2.30.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 103+ messages in thread

* [PATCH 7/7] drm/i915: Expose per-engine client busyness
  2021-05-13 10:59 ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-13 11:00   ` Tvrtko Ursulin
  -1 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-13 11:00 UTC (permalink / raw)
  To: Intel-gfx; +Cc: dri-devel

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Expose per-client and per-engine busyness under the previously added sysfs
client root.

The new files, one per engine class, are located under the 'busy'
directory. Each contains a monotonically increasing, nanosecond-resolution
total of the time that client's jobs were executing on the GPU.

This enables userspace to create a top-like tool for GPU utilization:

==========================================================================
intel-gpu-top -  935/ 935 MHz;    0% RC6; 14.73 Watts;     1097 irqs/s

      IMC reads:     1401 MiB/s
     IMC writes:        4 MiB/s

          ENGINE      BUSY                                 MI_SEMA MI_WAIT
     Render/3D/0   63.73% |███████████████████           |      3%      0%
       Blitter/0    9.53% |██▊                           |      6%      0%
         Video/0   39.32% |███████████▊                  |     16%      0%
         Video/1   15.62% |████▋                         |      0%      0%
  VideoEnhance/0    0.00% |                              |      0%      0%

  PID            NAME     RCS          BCS          VCS         VECS
 4084        gem_wsim |█████▌     ||█          ||           ||           |
 4086        gem_wsim |█▌         ||           ||███        ||           |
==========================================================================

v2: Use intel_context_engine_get_busy_time.
v3: New directory structure.
v4: Rebase.
v5: sysfs_attr_init.
v6: Small tidy in i915_gem_add_client.
v7: Rebase to be engine class based.
v8:
 * Always enable stats.
 * Walk all client contexts.
v9:
 * Skip unsupported engine classes. (Chris)
 * Use scheduler caps. (Chris)
v10:
 * Use pphwsp runtime only.

Link: https://patchwork.freedesktop.org/series/71182/ # intel_gpu_top
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Aravind Iddamsetty <aravind.iddamsetty@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20210123153733.18139-8-chris@chris-wilson.co.uk
Link: https://patchwork.freedesktop.org/patch/msgid/20210124153136.19124-8-chris@chris-wilson.co.uk
---
 drivers/gpu/drm/i915/i915_drm_client.c | 101 ++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_drm_client.h |  10 +++
 2 files changed, 110 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drm_client.c b/drivers/gpu/drm/i915/i915_drm_client.c
index 0ca81a750895..1f8b08a413d4 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.c
+++ b/drivers/gpu/drm/i915/i915_drm_client.c
@@ -9,6 +9,11 @@
 
 #include <drm/drm_print.h>
 
+#include <uapi/drm/i915_drm.h>
+
+#include "gem/i915_gem_context.h"
+#include "gt/intel_engine_user.h"
+
 #include "i915_drm_client.h"
 #include "i915_drv.h"
 #include "i915_gem.h"
@@ -55,6 +60,95 @@ show_client_pid(struct device *kdev, struct device_attribute *attr, char *buf)
 	return ret;
 }
 
+static u64 busy_add(struct i915_gem_context *ctx, unsigned int class)
+{
+	struct i915_gem_engines_iter it;
+	struct intel_context *ce;
+	u64 total = 0;
+
+	for_each_gem_engine(ce, rcu_dereference(ctx->engines), it) {
+		if (ce->engine->uabi_class != class)
+			continue;
+
+		total += intel_context_get_total_runtime_ns(ce);
+	}
+
+	return total;
+}
+
+static ssize_t
+show_busy(struct device *kdev, struct device_attribute *attr, char *buf)
+{
+	struct i915_engine_busy_attribute *i915_attr =
+		container_of(attr, typeof(*i915_attr), attr);
+	unsigned int class = i915_attr->engine_class;
+	const struct i915_drm_client *client = i915_attr->client;
+	const struct list_head *list = &client->ctx_list;
+	u64 total = atomic64_read(&client->past_runtime[class]);
+	struct i915_gem_context *ctx;
+
+	rcu_read_lock();
+	list_for_each_entry_rcu(ctx, list, client_link)
+		total += busy_add(ctx, class);
+	rcu_read_unlock();
+
+	return sysfs_emit(buf, "%llu\n", total);
+}
+
+static const char * const uabi_class_names[] = {
+	[I915_ENGINE_CLASS_RENDER] = "0",
+	[I915_ENGINE_CLASS_COPY] = "1",
+	[I915_ENGINE_CLASS_VIDEO] = "2",
+	[I915_ENGINE_CLASS_VIDEO_ENHANCE] = "3",
+};
+
+static int __client_register_sysfs_busy(struct i915_drm_client *client)
+{
+	struct i915_drm_clients *clients = client->clients;
+	unsigned int i;
+	int ret = 0;
+
+	if (!(clients->i915->caps.scheduler & I915_SCHEDULER_CAP_ENGINE_BUSY_STATS))
+		return 0;
+
+	client->busy_root = kobject_create_and_add("busy", client->root);
+	if (!client->busy_root)
+		return -ENOMEM;
+
+	for (i = 0; i < ARRAY_SIZE(uabi_class_names); i++) {
+		struct i915_engine_busy_attribute *i915_attr =
+			&client->attr.busy[i];
+		struct device_attribute *attr = &i915_attr->attr;
+
+		if (!intel_engine_lookup_user(clients->i915, i, 0))
+			continue;
+
+		i915_attr->client = client;
+		i915_attr->engine_class = i;
+
+		sysfs_attr_init(&attr->attr);
+
+		attr->attr.name = uabi_class_names[i];
+		attr->attr.mode = 0444;
+		attr->show = show_busy;
+
+		ret = sysfs_create_file(client->busy_root, &attr->attr);
+		if (ret)
+			goto out;
+	}
+
+out:
+	if (ret)
+		kobject_put(client->busy_root);
+
+	return ret;
+}
+
+static void __client_unregister_sysfs_busy(struct i915_drm_client *client)
+{
+	kobject_put(fetch_and_zero(&client->busy_root));
+}
+
 static int __client_register_sysfs(struct i915_drm_client *client)
 {
 	const struct {
@@ -90,9 +184,12 @@ static int __client_register_sysfs(struct i915_drm_client *client)
 
 		ret = sysfs_create_file(client->root, &attr->attr);
 		if (ret)
-			break;
+			goto out;
 	}
 
+	ret = __client_register_sysfs_busy(client);
+
+out:
 	if (ret)
 		kobject_put(client->root);
 
@@ -101,6 +198,8 @@ static int __client_register_sysfs(struct i915_drm_client *client)
 
 static void __client_unregister_sysfs(struct i915_drm_client *client)
 {
+	__client_unregister_sysfs_busy(client);
+
 	kobject_put(fetch_and_zero(&client->root));
 }
 
diff --git a/drivers/gpu/drm/i915/i915_drm_client.h b/drivers/gpu/drm/i915/i915_drm_client.h
index 13f92142e474..83660fa9d2d7 100644
--- a/drivers/gpu/drm/i915/i915_drm_client.h
+++ b/drivers/gpu/drm/i915/i915_drm_client.h
@@ -30,6 +30,14 @@ struct i915_drm_clients {
 	struct kobject *root;
 };
 
+struct i915_drm_client;
+
+struct i915_engine_busy_attribute {
+	struct device_attribute attr;
+	struct i915_drm_client *client;
+	unsigned int engine_class;
+};
+
 struct i915_drm_client_name {
 	struct rcu_head rcu;
 	struct i915_drm_client *client;
@@ -54,9 +62,11 @@ struct i915_drm_client {
 	struct i915_drm_clients *clients;
 
 	struct kobject *root;
+	struct kobject *busy_root;
 	struct {
 		struct device_attribute pid;
 		struct device_attribute name;
+		struct i915_engine_busy_attribute busy[MAX_ENGINE_CLASS + 1];
 	} attr;
 
 	/**
-- 
2.30.2


^ permalink raw reply	[flat|nested] 103+ messages in thread


* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Per client engine busyness
  2021-05-13 10:59 ` [Intel-gfx] " Tvrtko Ursulin
                   ` (7 preceding siblings ...)
  (?)
@ 2021-05-13 11:28 ` Patchwork
  -1 siblings, 0 replies; 103+ messages in thread
From: Patchwork @ 2021-05-13 11:28 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Per client engine busyness
URL   : https://patchwork.freedesktop.org/series/90128/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
289c10899f79 drm/i915: Expose list of clients in sysfs
-:89: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#89: 
new file mode 100644

total: 0 errors, 1 warnings, 0 checks, 402 lines checked
d83c2db8842f drm/i915: Update client name on context create
71fdfda22feb drm/i915: Make GEM contexts track DRM clients
c8194e95eb88 drm/i915: Track runtime spent in closed and unreachable GEM contexts
fc8d2f8f24bf drm/i915: Track all user contexts per client
9ac58b591817 drm/i915: Track context current active time
-:138: WARNING:LINE_SPACING: Missing a blank line after declarations
#138: FILE: drivers/gpu/drm/i915/gt/intel_context_types.h:125:
+			u32 last;
+			I915_SELFTEST_DECLARE(u32 num_underflow);

total: 0 errors, 1 warnings, 0 checks, 296 lines checked
b83b94d7ebd7 drm/i915: Expose per-engine client busyness
-:25: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#25: 
     Render/3D/0   63.73% |███████████████████           |      3%      0%

total: 0 errors, 1 warnings, 0 checks, 152 lines checked



^ permalink raw reply	[flat|nested] 103+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for Per client engine busyness
  2021-05-13 10:59 ` [Intel-gfx] " Tvrtko Ursulin
                   ` (8 preceding siblings ...)
  (?)
@ 2021-05-13 11:30 ` Patchwork
  -1 siblings, 0 replies; 103+ messages in thread
From: Patchwork @ 2021-05-13 11:30 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Per client engine busyness
URL   : https://patchwork.freedesktop.org/series/90128/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.
-
+drivers/gpu/drm/i915/display/intel_display.c:1887:21:    expected struct i915_vma *[assigned] vma
+drivers/gpu/drm/i915/display/intel_display.c:1887:21:    got void [noderef] __iomem *[assigned] iomem
+drivers/gpu/drm/i915/display/intel_display.c:1887:21: warning: incorrect type in assignment (different address spaces)
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:27:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:27:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:27:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:32:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:32:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:49:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:49:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:49:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:56:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_engine_stats.h:56:9: warning: trying to copy expression type 31
+drivers/gpu/drm/i915/gt/intel_reset.c:1329:5: warning: context imbalance in 'intel_gt_reset_trylock' - different lock contexts for basic block
+drivers/gpu/drm/i915/gt/intel_ring_submission.c:1203:24: warning: Using plain integer as NULL pointer
+drivers/gpu/drm/i915/i915_perf.c:1434:15: warning: memset with byte count of 16777216
+drivers/gpu/drm/i915/i915_perf.c:1488:15: warning: memset with byte count of 16777216
+./include/asm-generic/bitops/find.h:112:45: warning: shift count is negative (-262080)
+./include/asm-generic/bitops/find.h:32:31: warning: shift count is negative (-262080)
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen12_fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen12_fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen12_fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen12_fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen12_fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen12_fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen12_fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen6_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen6_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen6_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen6_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen6_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen6_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen6_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen8_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen8_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen8_write8' - different lock contexts for basic block




* [Intel-gfx] ✓ Fi.CI.BAT: success for Per client engine busyness
  2021-05-13 10:59 ` [Intel-gfx] " Tvrtko Ursulin
                   ` (9 preceding siblings ...)
  (?)
@ 2021-05-13 11:59 ` Patchwork
  -1 siblings, 0 replies; 103+ messages in thread
From: Patchwork @ 2021-05-13 11:59 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx



== Series Details ==

Series: Per client engine busyness
URL   : https://patchwork.freedesktop.org/series/90128/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_10074 -> Patchwork_20118
====================================================

Summary
-------

  **WARNING**

  Minor unknown changes coming with Patchwork_20118 need to be verified
  manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_20118, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/index.html

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_20118:

### IGT changes ###

#### Warnings ####

  * igt@i915_selftest@live@execlists:
    - fi-bsw-kefka:       [INCOMPLETE][1] ([i915#2782] / [i915#2940]) -> [DMESG-FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-bsw-kefka/igt@i915_selftest@live@execlists.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-bsw-kefka/igt@i915_selftest@live@execlists.html

  
Known issues
------------

  Here are the changes found in Patchwork_20118 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_fence@basic-await@bcs0:
    - fi-bsw-n3050:       [PASS][3] -> [FAIL][4] ([i915#3457])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-bsw-n3050/igt@gem_exec_fence@basic-await@bcs0.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-bsw-n3050/igt@gem_exec_fence@basic-await@bcs0.html

  * igt@gem_exec_fence@basic-await@rcs0:
    - fi-bsw-kefka:       [PASS][5] -> [FAIL][6] ([i915#3457])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-bsw-kefka/igt@gem_exec_fence@basic-await@rcs0.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-bsw-kefka/igt@gem_exec_fence@basic-await@rcs0.html

  * igt@gem_exec_fence@basic-busy@bcs0:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][7] ([fdo#109271]) +6 similar issues
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-kbl-soraka/igt@gem_exec_fence@basic-busy@bcs0.html

  * igt@gem_exec_suspend@basic-s3:
    - fi-tgl-u2:          [PASS][8] -> [FAIL][9] ([i915#1888])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-tgl-u2/igt@gem_exec_suspend@basic-s3.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-tgl-u2/igt@gem_exec_suspend@basic-s3.html

  * igt@gem_huc_copy@huc-copy:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][10] ([fdo#109271] / [i915#2190])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-kbl-soraka/igt@gem_huc_copy@huc-copy.html

  * igt@gem_wait@busy@all:
    - fi-bsw-nick:        [PASS][11] -> [FAIL][12] ([i915#3177] / [i915#3457])
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-bsw-nick/igt@gem_wait@busy@all.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-bsw-nick/igt@gem_wait@busy@all.html

  * igt@gem_wait@wait@all:
    - fi-bwr-2160:        [PASS][13] -> [FAIL][14] ([i915#3457]) +1 similar issue
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-bwr-2160/igt@gem_wait@wait@all.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-bwr-2160/igt@gem_wait@wait@all.html
    - fi-bsw-nick:        [PASS][15] -> [FAIL][16] ([i915#3457]) +1 similar issue
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-bsw-nick/igt@gem_wait@wait@all.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-bsw-nick/igt@gem_wait@wait@all.html

  * igt@i915_module_load@reload:
    - fi-kbl-soraka:      NOTRUN -> [DMESG-WARN][17] ([i915#1982] / [i915#3457])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-kbl-soraka/igt@i915_module_load@reload.html

  * igt@i915_selftest@live@execlists:
    - fi-kbl-soraka:      NOTRUN -> [INCOMPLETE][18] ([i915#2782] / [i915#3462] / [i915#794])
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-kbl-soraka/igt@i915_selftest@live@execlists.html

  * igt@i915_selftest@live@gt_pm:
    - fi-kbl-soraka:      NOTRUN -> [DMESG-FAIL][19] ([i915#1886] / [i915#2291])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-kbl-soraka/igt@i915_selftest@live@gt_pm.html

  * igt@i915_selftest@live@mman:
    - fi-kbl-soraka:      NOTRUN -> [DMESG-WARN][20] ([i915#3457])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-kbl-soraka/igt@i915_selftest@live@mman.html

  * igt@kms_busy@basic@modeset:
    - fi-ilk-650:         [PASS][21] -> [INCOMPLETE][22] ([i915#3457])
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-ilk-650/igt@kms_busy@basic@modeset.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-ilk-650/igt@kms_busy@basic@modeset.html

  * igt@kms_chamelium@common-hpd-after-suspend:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][23] ([fdo#109271] / [fdo#111827]) +8 similar issues
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-kbl-soraka/igt@kms_chamelium@common-hpd-after-suspend.html

  * igt@kms_frontbuffer_tracking@basic:
    - fi-tgl-u2:          [PASS][24] -> [FAIL][25] ([i915#2416])
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-tgl-u2/igt@kms_frontbuffer_tracking@basic.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-tgl-u2/igt@kms_frontbuffer_tracking@basic.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d:
    - fi-kbl-soraka:      NOTRUN -> [SKIP][26] ([fdo#109271] / [i915#533])
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-kbl-soraka/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-d.html

  * igt@kms_pipe_crc_basic@hang-read-crc-pipe-a:
    - fi-bwr-2160:        [PASS][27] -> [FAIL][28] ([i915#53]) +1 similar issue
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-bwr-2160/igt@kms_pipe_crc_basic@hang-read-crc-pipe-a.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-bwr-2160/igt@kms_pipe_crc_basic@hang-read-crc-pipe-a.html

  * igt@kms_pipe_crc_basic@read-crc-pipe-a:
    - fi-elk-e7500:       [PASS][29] -> [FAIL][30] ([i915#53]) +2 similar issues
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-elk-e7500/igt@kms_pipe_crc_basic@read-crc-pipe-a.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-elk-e7500/igt@kms_pipe_crc_basic@read-crc-pipe-a.html

  * igt@runner@aborted:
    - fi-kbl-soraka:      NOTRUN -> [FAIL][31] ([i915#1436] / [i915#3363])
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-kbl-soraka/igt@runner@aborted.html

  
#### Possible fixes ####

  * igt@gem_busy@busy@all:
    - fi-elk-e7500:       [FAIL][32] ([i915#3457]) -> [PASS][33]
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-elk-e7500/igt@gem_busy@busy@all.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-elk-e7500/igt@gem_busy@busy@all.html

  * igt@gem_exec_fence@basic-await@vcs0:
    - fi-bsw-kefka:       [FAIL][34] ([i915#3457]) -> [PASS][35]
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-bsw-kefka/igt@gem_exec_fence@basic-await@vcs0.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-bsw-kefka/igt@gem_exec_fence@basic-await@vcs0.html

  * igt@gem_exec_fence@nb-await@vcs0:
    - fi-bsw-nick:        [FAIL][36] ([i915#3457]) -> [PASS][37]
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-bsw-nick/igt@gem_exec_fence@nb-await@vcs0.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-bsw-nick/igt@gem_exec_fence@nb-await@vcs0.html

  * igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a-frame-sequence:
    - fi-ilk-650:         [FAIL][38] ([i915#53]) -> [PASS][39] +8 similar issues
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-ilk-650/igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a-frame-sequence.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-ilk-650/igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a-frame-sequence.html
    - fi-elk-e7500:       [FAIL][40] ([i915#53]) -> [PASS][41] +1 similar issue
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-elk-e7500/igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a-frame-sequence.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-elk-e7500/igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a-frame-sequence.html

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a:
    - fi-bwr-2160:        [FAIL][42] ([i915#53]) -> [PASS][43]
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-bwr-2160/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-bwr-2160/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a.html

  
#### Warnings ####

  * igt@gem_exec_gttfill@basic:
    - fi-pnv-d510:        [FAIL][44] ([i915#3457]) -> [FAIL][45] ([i915#3457] / [i915#3472])
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-pnv-d510/igt@gem_exec_gttfill@basic.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-pnv-d510/igt@gem_exec_gttfill@basic.html
    - fi-ilk-650:         [FAIL][46] ([i915#3457]) -> [FAIL][47] ([i915#3457] / [i915#3472])
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-ilk-650/igt@gem_exec_gttfill@basic.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-ilk-650/igt@gem_exec_gttfill@basic.html

  * igt@i915_module_load@reload:
    - fi-elk-e7500:       [DMESG-WARN][48] ([i915#3457]) -> [DMESG-FAIL][49] ([i915#3457])
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-elk-e7500/igt@i915_module_load@reload.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-elk-e7500/igt@i915_module_load@reload.html
    - fi-bsw-nick:        [DMESG-FAIL][50] ([i915#3457]) -> [DMESG-WARN][51] ([i915#3457])
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-bsw-nick/igt@i915_module_load@reload.html
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-bsw-nick/igt@i915_module_load@reload.html

  * igt@i915_selftest@live@execlists:
    - fi-cml-s:           [DMESG-FAIL][52] ([i915#3462]) -> [INCOMPLETE][53] ([i915#3462])
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-cml-s/igt@i915_selftest@live@execlists.html
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-cml-s/igt@i915_selftest@live@execlists.html

  * igt@runner@aborted:
    - fi-skl-6600u:       [FAIL][54] ([i915#1436] / [i915#2426] / [i915#3363]) -> [FAIL][55] ([i915#1436] / [i915#3363])
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-skl-6600u/igt@runner@aborted.html
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-skl-6600u/igt@runner@aborted.html
    - fi-kbl-guc:         [FAIL][56] ([i915#1436] / [i915#2426] / [i915#3363]) -> [FAIL][57] ([i915#1436] / [i915#3363])
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/fi-kbl-guc/igt@runner@aborted.html
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/fi-kbl-guc/igt@runner@aborted.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#111827]: https://bugs.freedesktop.org/show_bug.cgi?id=111827
  [i915#1436]: https://gitlab.freedesktop.org/drm/intel/issues/1436
  [i915#1886]: https://gitlab.freedesktop.org/drm/intel/issues/1886
  [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2291]: https://gitlab.freedesktop.org/drm/intel/issues/2291
  [i915#2416]: https://gitlab.freedesktop.org/drm/intel/issues/2416
  [i915#2426]: https://gitlab.freedesktop.org/drm/intel/issues/2426
  [i915#2782]: https://gitlab.freedesktop.org/drm/intel/issues/2782
  [i915#2932]: https://gitlab.freedesktop.org/drm/intel/issues/2932
  [i915#2940]: https://gitlab.freedesktop.org/drm/intel/issues/2940
  [i915#2966]: https://gitlab.freedesktop.org/drm/intel/issues/2966
  [i915#3177]: https://gitlab.freedesktop.org/drm/intel/issues/3177
  [i915#3303]: https://gitlab.freedesktop.org/drm/intel/issues/3303
  [i915#3363]: https://gitlab.freedesktop.org/drm/intel/issues/3363
  [i915#3457]: https://gitlab.freedesktop.org/drm/intel/issues/3457
  [i915#3462]: https://gitlab.freedesktop.org/drm/intel/issues/3462
  [i915#3472]: https://gitlab.freedesktop.org/drm/intel/issues/3472
  [i915#53]: https://gitlab.freedesktop.org/drm/intel/issues/53
  [i915#533]: https://gitlab.freedesktop.org/drm/intel/issues/533
  [i915#794]: https://gitlab.freedesktop.org/drm/intel/issues/794


Participating hosts (43 -> 27)
------------------------------

  Additional (1): fi-kbl-soraka 
  Missing    (17): fi-kbl-7567u fi-cml-u2 fi-ilk-m540 fi-hsw-4200u fi-glk-dsi fi-icl-u2 fi-cfl-8700k fi-tgl-1115g4 fi-kbl-7500u fi-bsw-cyan fi-hsw-4770 fi-cfl-guc fi-dg1-1 fi-kbl-x1275 fi-cfl-8109u fi-bdw-samus fi-kbl-r 


Build changes
-------------

  * Linux: CI_DRM_10074 -> Patchwork_20118

  CI-20190529: 20190529
  CI_DRM_10074: 5aefdc1f23734b6a3d545c8497b098ba4d704a0c @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_6083: d28aee5c5f528aa6c352c3339f20aaed4d698ffa @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_20118: b83b94d7ebd7d552e693fa26ecd476ca951a552a @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

b83b94d7ebd7 drm/i915: Expose per-engine client busyness
9ac58b591817 drm/i915: Track context current active time
fc8d2f8f24bf drm/i915: Track all user contexts per client
c8194e95eb88 drm/i915: Track runtime spent in closed and unreachable GEM contexts
71fdfda22feb drm/i915: Make GEM contexts track DRM clients
d83c2db8842f drm/i915: Update client name on context create
289c10899f79 drm/i915: Expose list of clients in sysfs

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/index.html



* Re: [PATCH 0/7] Per client engine busyness
  2021-05-13 10:59 ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-13 15:48   ` Alex Deucher
  -1 siblings, 0 replies; 103+ messages in thread
From: Alex Deucher @ 2021-05-13 15:48 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel Graphics Development, Mailing list - DRI developers

On Thu, May 13, 2021 at 7:00 AM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> Resurrection of the previously merged per-client engine busyness patches. In a
> nutshell, they enable intel_gpu_top to be usefully top(1)-like, showing not
> only physical GPU engine usage but a per-process view as well.
>
> Example screen capture:
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> intel-gpu-top -  906/ 955 MHz;    0% RC6;  5.30 Watts;      933 irqs/s
>
>       IMC reads:     4414 MiB/s
>      IMC writes:     3805 MiB/s
>
>           ENGINE      BUSY                                      MI_SEMA MI_WAIT
>      Render/3D/0   93.46% |████████████████████████████████▋  |      0%      0%
>        Blitter/0    0.00% |                                   |      0%      0%
>          Video/0    0.00% |                                   |      0%      0%
>   VideoEnhance/0    0.00% |                                   |      0%      0%
>
>   PID            NAME  Render/3D      Blitter        Video      VideoEnhance
>  2733       neverball |██████▌     ||            ||            ||            |
>  2047            Xorg |███▊        ||            ||            ||            |
>  2737        glxgears |█▍          ||            ||            ||            |
>  2128           xfwm4 |            ||            ||            ||            |
>  2047            Xorg |            ||            ||            ||            |
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> Internally we track time spent on engines for each struct intel_context, both
> for current and past contexts belonging to each open DRM file.
>
> This can serve as a building block for several features from the wanted list:
> smarter scheduler decisions, getrusage(2)-like per-GEM-context functionality
> wanted by some customers, setrlimit(2) like controls, cgroups controller,
> dynamic SSEU tuning, ...
>
> To enable userspace access to the tracked data, we expose time spent on GPU per
> client and per engine class in sysfs with a hierarchy like the below:
>
>         # cd /sys/class/drm/card0/clients/
>         # tree
>         .
>         ├── 7
>         │   ├── busy
>         │   │   ├── 0
>         │   │   ├── 1
>         │   │   ├── 2
>         │   │   └── 3
>         │   ├── name
>         │   └── pid
>         ├── 8
>         │   ├── busy
>         │   │   ├── 0
>         │   │   ├── 1
>         │   │   ├── 2
>         │   │   └── 3
>         │   ├── name
>         │   └── pid
>         └── 9
>             ├── busy
>             │   ├── 0
>             │   ├── 1
>             │   ├── 2
>             │   └── 3
>             ├── name
>             └── pid
>
> Files in 'busy' directories are numbered using the engine class ABI values and
> they contain accumulated nanoseconds each client spent on engines of a
> respective class.
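[Editorial note: the quoted sysfs layout can be sampled from userspace by reading each busy file twice over a wall-clock interval. A minimal sketch, assuming (per the cover letter) that `clients/<id>/busy/<class>` holds a monotonically increasing nanosecond count; the paths and helper names are illustrative, not taken from the series:]

```python
def busy_percent(ns_t0, ns_t1, wall_dt_s):
    """Turn two accumulated busy-nanosecond samples, taken wall_dt_s
    seconds apart, into an engine utilisation percentage."""
    return 100.0 * (ns_t1 - ns_t0) / (wall_dt_s * 1e9)

def read_busy_ns(path):
    """Read one hypothetical per-client busy counter, e.g.
    /sys/class/drm/card0/clients/7/busy/0 (engine class 0)."""
    with open(path) as f:
        return int(f.read())
```

For example, a client that accumulated 500 ms of engine time over a 1 s sampling window reads as 50% busy.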

We did something similar in amdgpu using the gpu scheduler.  We then
expose the data via fdinfo.  See
https://cgit.freedesktop.org/drm/drm-misc/commit/?id=1774baa64f9395fa884ea9ed494bcb043f3b83f5
https://cgit.freedesktop.org/drm/drm-misc/commit/?id=874442541133f78c78b6880b8cc495bab5c61704

Alex
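[Editorial note: the fdinfo approach referenced above emits plain "key: value" text lines per open DRM file descriptor in /proc/<pid>/fdinfo/<fd>. A rough parser sketch; the sample key names (drm-driver, drm-engine-gfx) are assumptions for illustration, not quoted from the linked commits:]

```python
def parse_drm_fdinfo(text):
    """Parse 'key:<tab>value' lines as emitted by a driver's fdinfo hook."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a key/value separator
            info[key.strip()] = value.strip()
    return info

# Illustrative sample only -- not real amdgpu output.
SAMPLE = "drm-driver:\tamdgpu\ndrm-engine-gfx:\t123456789 ns"
```

In practice a tool would read the text from /proc/<pid>/fdinfo/<fd> for each fd that resolves to a DRM device node.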


>
> Tvrtko Ursulin (7):
>   drm/i915: Expose list of clients in sysfs
>   drm/i915: Update client name on context create
>   drm/i915: Make GEM contexts track DRM clients
>   drm/i915: Track runtime spent in closed and unreachable GEM contexts
>   drm/i915: Track all user contexts per client
>   drm/i915: Track context current active time
>   drm/i915: Expose per-engine client busyness
>
>  drivers/gpu/drm/i915/Makefile                 |   5 +-
>  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  61 ++-
>  .../gpu/drm/i915/gem/i915_gem_context_types.h |  16 +-
>  drivers/gpu/drm/i915/gt/intel_context.c       |  27 +-
>  drivers/gpu/drm/i915/gt/intel_context.h       |  15 +-
>  drivers/gpu/drm/i915/gt/intel_context_types.h |  24 +-
>  .../drm/i915/gt/intel_execlists_submission.c  |  23 +-
>  .../gpu/drm/i915/gt/intel_gt_clock_utils.c    |   4 +
>  drivers/gpu/drm/i915/gt/intel_lrc.c           |  27 +-
>  drivers/gpu/drm/i915/gt/intel_lrc.h           |  24 ++
>  drivers/gpu/drm/i915/gt/selftest_lrc.c        |  10 +-
>  drivers/gpu/drm/i915/i915_drm_client.c        | 365 ++++++++++++++++++
>  drivers/gpu/drm/i915/i915_drm_client.h        | 123 ++++++
>  drivers/gpu/drm/i915/i915_drv.c               |   6 +
>  drivers/gpu/drm/i915/i915_drv.h               |   5 +
>  drivers/gpu/drm/i915/i915_gem.c               |  21 +-
>  drivers/gpu/drm/i915/i915_gpu_error.c         |  31 +-
>  drivers/gpu/drm/i915/i915_gpu_error.h         |   2 +-
>  drivers/gpu/drm/i915/i915_sysfs.c             |   8 +
>  19 files changed, 716 insertions(+), 81 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_drm_client.c
>  create mode 100644 drivers/gpu/drm/i915/i915_drm_client.h
>
> --
> 2.30.2
>



* [Intel-gfx] ✗ Fi.CI.IGT: failure for Per client engine busyness
  2021-05-13 10:59 ` [Intel-gfx] " Tvrtko Ursulin
                   ` (11 preceding siblings ...)
  (?)
@ 2021-05-13 16:38 ` Patchwork
  -1 siblings, 0 replies; 103+ messages in thread
From: Patchwork @ 2021-05-13 16:38 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 30249 bytes --]

== Series Details ==

Series: Per client engine busyness
URL   : https://patchwork.freedesktop.org/series/90128/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_10074_full -> Patchwork_20118_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_20118_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_20118_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_20118_full:

### IGT changes ###

#### Possible regressions ####

  * igt@kms_flip_tiling@flip-changes-tiling@hdmi-a-1-pipe-c:
    - shard-glk:          [PASS][1] -> [FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-glk8/igt@kms_flip_tiling@flip-changes-tiling@hdmi-a-1-pipe-c.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-glk8/igt@kms_flip_tiling@flip-changes-tiling@hdmi-a-1-pipe-c.html

  * igt@kms_plane_cursor@pipe-b-overlay-size-128:
    - shard-snb:          NOTRUN -> [FAIL][3]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-snb7/igt@kms_plane_cursor@pipe-b-overlay-size-128.html

  
#### Warnings ####

  * igt@kms_plane_cursor@pipe-c-viewport-size-64:
    - shard-tglb:         [FAIL][4] ([i915#3457]) -> [FAIL][5]
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-tglb8/igt@kms_plane_cursor@pipe-c-viewport-size-64.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-tglb8/igt@kms_plane_cursor@pipe-c-viewport-size-64.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * {igt@kms_plane@plane-position-covered@pipe-b-planes}:
    - shard-glk:          [FAIL][6] ([i915#3457]) -> [FAIL][7]
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-glk8/igt@kms_plane@plane-position-covered@pipe-b-planes.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-glk5/igt@kms_plane@plane-position-covered@pipe-b-planes.html

  
Known issues
------------

  Here are the changes found in Patchwork_20118_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@api_intel_bb@intel-bb-blit-x:
    - shard-glk:          [PASS][8] -> [FAIL][9] ([i915#3471])
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-glk3/igt@api_intel_bb@intel-bb-blit-x.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-glk6/igt@api_intel_bb@intel-bb-blit-x.html

  * igt@api_intel_bb@offset-control:
    - shard-skl:          NOTRUN -> [DMESG-WARN][10] ([i915#3457])
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-skl4/igt@api_intel_bb@offset-control.html

  * igt@gem_ctx_persistence@legacy-engines-queued:
    - shard-snb:          NOTRUN -> [SKIP][11] ([fdo#109271] / [i915#1099]) +2 similar issues
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-snb2/igt@gem_ctx_persistence@legacy-engines-queued.html

  * igt@gem_exec_fair@basic-none-share@rcs0:
    - shard-iclb:         [PASS][12] -> [FAIL][13] ([i915#2842] / [i915#3457])
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-iclb4/igt@gem_exec_fair@basic-none-share@rcs0.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-iclb4/igt@gem_exec_fair@basic-none-share@rcs0.html

  * igt@gem_exec_fair@basic-none@bcs0:
    - shard-glk:          NOTRUN -> [SKIP][14] ([fdo#109271] / [i915#3457])
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-glk8/igt@gem_exec_fair@basic-none@bcs0.html

  * igt@gem_exec_fair@basic-none@vcs1:
    - shard-iclb:         NOTRUN -> [FAIL][15] ([i915#2842] / [i915#3457])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-iclb2/igt@gem_exec_fair@basic-none@vcs1.html

  * igt@gem_exec_fair@basic-pace@vcs0:
    - shard-glk:          [PASS][16] -> [FAIL][17] ([i915#3209] / [i915#3457])
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-glk3/igt@gem_exec_fair@basic-pace@vcs0.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-glk6/igt@gem_exec_fair@basic-pace@vcs0.html

  * igt@gem_exec_fair@basic-pace@vecs0:
    - shard-kbl:          [PASS][18] -> [FAIL][19] ([i915#2842] / [i915#3457]) +2 similar issues
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-kbl7/igt@gem_exec_fair@basic-pace@vecs0.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-kbl6/igt@gem_exec_fair@basic-pace@vecs0.html
    - shard-apl:          [PASS][20] -> [INCOMPLETE][21] ([i915#3457])
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-apl6/igt@gem_exec_fair@basic-pace@vecs0.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-apl1/igt@gem_exec_fair@basic-pace@vecs0.html

  * igt@gem_exec_reloc@basic-wide-active@rcs0:
    - shard-kbl:          NOTRUN -> [FAIL][22] ([i915#2389] / [i915#3457]) +4 similar issues
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-kbl7/igt@gem_exec_reloc@basic-wide-active@rcs0.html

  * igt@gem_exec_reloc@basic-wide-active@vcs1:
    - shard-iclb:         NOTRUN -> [FAIL][23] ([i915#2389] / [i915#3457])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-iclb2/igt@gem_exec_reloc@basic-wide-active@vcs1.html

  * igt@gem_exec_schedule@u-fairslice@vecs0:
    - shard-glk:          NOTRUN -> [FAIL][24] ([i915#3457]) +8 similar issues
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-glk7/igt@gem_exec_schedule@u-fairslice@vecs0.html

  * igt@gem_exec_whisper@basic-normal-all:
    - shard-glk:          [PASS][25] -> [DMESG-WARN][26] ([i915#118] / [i915#95])
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-glk6/igt@gem_exec_whisper@basic-normal-all.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-glk2/igt@gem_exec_whisper@basic-normal-all.html

  * igt@gem_mmap_gtt@cpuset-basic-small-copy:
    - shard-apl:          [PASS][27] -> [INCOMPLETE][28] ([i915#3468])
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-apl3/igt@gem_mmap_gtt@cpuset-basic-small-copy.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-apl6/igt@gem_mmap_gtt@cpuset-basic-small-copy.html

  * igt@gem_mmap_gtt@cpuset-medium-copy:
    - shard-iclb:         [PASS][29] -> [FAIL][30] ([i915#307])
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-iclb7/igt@gem_mmap_gtt@cpuset-medium-copy.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-iclb7/igt@gem_mmap_gtt@cpuset-medium-copy.html

  * igt@gem_mmap_gtt@cpuset-medium-copy-xy:
    - shard-iclb:         [PASS][31] -> [INCOMPLETE][32] ([i915#3468])
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-iclb6/igt@gem_mmap_gtt@cpuset-medium-copy-xy.html
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-iclb8/igt@gem_mmap_gtt@cpuset-medium-copy-xy.html
    - shard-glk:          [PASS][33] -> [FAIL][34] ([i915#307]) +1 similar issue
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-glk2/igt@gem_mmap_gtt@cpuset-medium-copy-xy.html
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-glk8/igt@gem_mmap_gtt@cpuset-medium-copy-xy.html

  * igt@gem_mmap_gtt@fault-concurrent-y:
    - shard-skl:          NOTRUN -> [INCOMPLETE][35] ([i915#3468])
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-skl9/igt@gem_mmap_gtt@fault-concurrent-y.html

  * igt@gem_render_copy@yf-tiled-ccs-to-linear:
    - shard-skl:          NOTRUN -> [INCOMPLETE][36] ([i915#198] / [i915#3468])
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-skl1/igt@gem_render_copy@yf-tiled-ccs-to-linear.html

  * igt@gem_render_copy@yf-tiled-ccs-to-x-tiled:
    - shard-apl:          NOTRUN -> [INCOMPLETE][37] ([i915#3468]) +3 similar issues
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-apl8/igt@gem_render_copy@yf-tiled-ccs-to-x-tiled.html

  * igt@gem_userptr_blits@vma-merge:
    - shard-snb:          NOTRUN -> [FAIL][38] ([i915#2724] / [i915#3457])
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-snb7/igt@gem_userptr_blits@vma-merge.html

  * igt@i915_module_load@reload:
    - shard-kbl:          NOTRUN -> [DMESG-WARN][39] ([i915#3457]) +1 similar issue
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-kbl2/igt@i915_module_load@reload.html

  * igt@i915_pm_dc@dc6-psr:
    - shard-iclb:         [PASS][40] -> [FAIL][41] ([i915#454])
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-iclb5/igt@i915_pm_dc@dc6-psr.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-iclb2/igt@i915_pm_dc@dc6-psr.html

  * igt@i915_pm_rpm@cursor-dpms:
    - shard-apl:          [PASS][42] -> [DMESG-WARN][43] ([i915#3457])
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-apl6/igt@i915_pm_rpm@cursor-dpms.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-apl2/igt@i915_pm_rpm@cursor-dpms.html

  * igt@i915_pm_rpm@gem-mmap-type@uc:
    - shard-kbl:          [PASS][44] -> [DMESG-WARN][45] ([i915#3475])
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-kbl2/igt@i915_pm_rpm@gem-mmap-type@uc.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-kbl4/igt@i915_pm_rpm@gem-mmap-type@uc.html

  * igt@i915_pm_rpm@gem-mmap-type@wc:
    - shard-iclb:         [PASS][46] -> [DMESG-WARN][47] ([i915#3457])
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-iclb4/igt@i915_pm_rpm@gem-mmap-type@wc.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-iclb5/igt@i915_pm_rpm@gem-mmap-type@wc.html
    - shard-kbl:          [PASS][48] -> [DMESG-WARN][49] ([i915#3457])
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-kbl2/igt@i915_pm_rpm@gem-mmap-type@wc.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-kbl4/igt@i915_pm_rpm@gem-mmap-type@wc.html

  * igt@i915_pm_rps@min-max-config-loaded:
    - shard-apl:          NOTRUN -> [DMESG-WARN][50] ([i915#3457])
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-apl7/igt@i915_pm_rps@min-max-config-loaded.html

  * igt@i915_pm_rps@reset:
    - shard-tglb:         NOTRUN -> [DMESG-WARN][51] ([i915#3457])
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-tglb3/igt@i915_pm_rps@reset.html
    - shard-iclb:         NOTRUN -> [DMESG-WARN][52] ([i915#3457])
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-iclb6/igt@i915_pm_rps@reset.html

  * igt@i915_selftest@live@hangcheck:
    - shard-snb:          NOTRUN -> [INCOMPLETE][53] ([i915#2782])
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-snb2/igt@i915_selftest@live@hangcheck.html

  * igt@i915_selftest@live@mman:
    - shard-snb:          NOTRUN -> [DMESG-WARN][54] ([i915#3457])
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-snb2/igt@i915_selftest@live@mman.html

  * igt@kms_async_flips@alternate-sync-async-flip:
    - shard-skl:          [PASS][55] -> [FAIL][56] ([i915#2521])
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-skl7/igt@kms_async_flips@alternate-sync-async-flip.html
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-skl9/igt@kms_async_flips@alternate-sync-async-flip.html

  * igt@kms_big_fb@yf-tiled-addfb:
    - shard-tglb:         NOTRUN -> [SKIP][57] ([fdo#111615])
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-tglb3/igt@kms_big_fb@yf-tiled-addfb.html

  * igt@kms_ccs@pipe-d-missing-ccs-buffer:
    - shard-iclb:         NOTRUN -> [SKIP][58] ([fdo#109278])
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-iclb6/igt@kms_ccs@pipe-d-missing-ccs-buffer.html

  * igt@kms_chamelium@dp-frame-dump:
    - shard-iclb:         NOTRUN -> [SKIP][59] ([fdo#109284] / [fdo#111827]) +1 similar issue
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-iclb6/igt@kms_chamelium@dp-frame-dump.html
    - shard-tglb:         NOTRUN -> [SKIP][60] ([fdo#109284] / [fdo#111827]) +1 similar issue
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-tglb3/igt@kms_chamelium@dp-frame-dump.html

  * igt@kms_chamelium@hdmi-crc-multiple:
    - shard-kbl:          NOTRUN -> [SKIP][61] ([fdo#109271] / [fdo#111827]) +9 similar issues
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-kbl2/igt@kms_chamelium@hdmi-crc-multiple.html

  * igt@kms_chamelium@hdmi-mode-timings:
    - shard-snb:          NOTRUN -> [SKIP][62] ([fdo#109271] / [fdo#111827]) +19 similar issues
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-snb2/igt@kms_chamelium@hdmi-mode-timings.html

  * igt@kms_chamelium@vga-hpd:
    - shard-apl:          NOTRUN -> [SKIP][63] ([fdo#109271] / [fdo#111827]) +7 similar issues
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-apl8/igt@kms_chamelium@vga-hpd.html

  * igt@kms_color@pipe-a-ctm-0-75:
    - shard-skl:          [PASS][64] -> [DMESG-WARN][65] ([i915#1982])
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-skl9/igt@kms_color@pipe-a-ctm-0-75.html
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-skl7/igt@kms_color@pipe-a-ctm-0-75.html

  * igt@kms_color@pipe-b-degamma:
    - shard-tglb:         NOTRUN -> [FAIL][66] ([i915#1149])
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-tglb3/igt@kms_color@pipe-b-degamma.html
    - shard-iclb:         NOTRUN -> [FAIL][67] ([i915#1149])
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-iclb6/igt@kms_color@pipe-b-degamma.html

  * igt@kms_color_chamelium@pipe-d-ctm-green-to-red:
    - shard-skl:          NOTRUN -> [SKIP][68] ([fdo#109271] / [fdo#111827]) +4 similar issues
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-skl4/igt@kms_color_chamelium@pipe-d-ctm-green-to-red.html

  * igt@kms_content_protection@atomic:
    - shard-kbl:          NOTRUN -> [TIMEOUT][69] ([i915#1319]) +2 similar issues
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-kbl2/igt@kms_content_protection@atomic.html

  * igt@kms_content_protection@dp-mst-lic-type-0:
    - shard-iclb:         NOTRUN -> [SKIP][70] ([i915#3116])
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-iclb6/igt@kms_content_protection@dp-mst-lic-type-0.html
    - shard-tglb:         NOTRUN -> [SKIP][71] ([i915#3116])
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-tglb3/igt@kms_content_protection@dp-mst-lic-type-0.html

  * igt@kms_cursor_crc@pipe-a-cursor-128x42-onscreen:
    - shard-skl:          NOTRUN -> [FAIL][72] ([i915#3444] / [i915#3457]) +5 similar issues
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-skl4/igt@kms_cursor_crc@pipe-a-cursor-128x42-onscreen.html

  * igt@kms_cursor_crc@pipe-a-cursor-64x64-offscreen:
    - shard-kbl:          NOTRUN -> [FAIL][73] ([i915#3444] / [i915#3457]) +3 similar issues
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-kbl6/igt@kms_cursor_crc@pipe-a-cursor-64x64-offscreen.html

  * igt@kms_cursor_crc@pipe-a-cursor-max-size-random:
    - shard-skl:          NOTRUN -> [SKIP][74] ([fdo#109271] / [i915#3457]) +8 similar issues
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-skl9/igt@kms_cursor_crc@pipe-a-cursor-max-size-random.html

  * igt@kms_cursor_crc@pipe-a-cursor-size-change:
    - shard-snb:          NOTRUN -> [FAIL][75] ([i915#3457]) +7 similar issues
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-snb2/igt@kms_cursor_crc@pipe-a-cursor-size-change.html

  * igt@kms_cursor_crc@pipe-b-cursor-128x42-offscreen:
    - shard-tglb:         [PASS][76] -> [FAIL][77] ([i915#2124] / [i915#3457])
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-tglb6/igt@kms_cursor_crc@pipe-b-cursor-128x42-offscreen.html
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-tglb3/igt@kms_cursor_crc@pipe-b-cursor-128x42-offscreen.html

  * igt@kms_cursor_crc@pipe-b-cursor-256x85-offscreen:
    - shard-kbl:          [PASS][78] -> [FAIL][79] ([i915#3444] / [i915#3457])
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-kbl2/igt@kms_cursor_crc@pipe-b-cursor-256x85-offscreen.html
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-kbl2/igt@kms_cursor_crc@pipe-b-cursor-256x85-offscreen.html

  * igt@kms_cursor_crc@pipe-b-cursor-256x85-random:
    - shard-iclb:         NOTRUN -> [FAIL][80] ([i915#3457])
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-iclb6/igt@kms_cursor_crc@pipe-b-cursor-256x85-random.html
    - shard-tglb:         NOTRUN -> [FAIL][81] ([i915#2124] / [i915#3457]) +1 similar issue
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-tglb3/igt@kms_cursor_crc@pipe-b-cursor-256x85-random.html

  * igt@kms_cursor_crc@pipe-b-cursor-32x10-sliding:
    - shard-tglb:         NOTRUN -> [SKIP][82] ([i915#3359] / [i915#3457])
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-tglb3/igt@kms_cursor_crc@pipe-b-cursor-32x10-sliding.html
    - shard-iclb:         NOTRUN -> [SKIP][83] ([fdo#109278] / [i915#3457]) +2 similar issues
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-iclb6/igt@kms_cursor_crc@pipe-b-cursor-32x10-sliding.html

  * igt@kms_cursor_crc@pipe-c-cursor-128x128-sliding:
    - shard-apl:          NOTRUN -> [FAIL][84] ([i915#3444] / [i915#3457]) +2 similar issues
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-apl7/igt@kms_cursor_crc@pipe-c-cursor-128x128-sliding.html

  * igt@kms_cursor_crc@pipe-c-cursor-32x32-onscreen:
    - shard-apl:          NOTRUN -> [SKIP][85] ([fdo#109271] / [i915#3457]) +16 similar issues
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-apl7/igt@kms_cursor_crc@pipe-c-cursor-32x32-onscreen.html

  * igt@kms_cursor_crc@pipe-c-cursor-32x32-sliding:
    - shard-tglb:         NOTRUN -> [SKIP][86] ([i915#3319] / [i915#3457])
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-tglb3/igt@kms_cursor_crc@pipe-c-cursor-32x32-sliding.html

  * igt@kms_cursor_crc@pipe-c-cursor-512x170-offscreen:
    - shard-kbl:          NOTRUN -> [SKIP][87] ([fdo#109271] / [i915#3457]) +13 similar issues
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-kbl3/igt@kms_cursor_crc@pipe-c-cursor-512x170-offscreen.html

  * igt@kms_cursor_crc@pipe-c-cursor-512x512-onscreen:
    - shard-snb:          NOTRUN -> [SKIP][88] ([fdo#109271] / [i915#3457]) +37 similar issues
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-snb7/igt@kms_cursor_crc@pipe-c-cursor-512x512-onscreen.html

  * igt@kms_cursor_crc@pipe-c-cursor-64x64-random:
    - shard-glk:          [PASS][89] -> [FAIL][90] ([i915#3444] / [i915#3457]) +7 similar issues
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-glk9/igt@kms_cursor_crc@pipe-c-cursor-64x64-random.html
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-glk8/igt@kms_cursor_crc@pipe-c-cursor-64x64-random.html

  * igt@kms_cursor_edge_walk@pipe-c-256x256-left-edge:
    - shard-glk:          [PASS][91] -> [FAIL][92] ([i915#70]) +2 similar issues
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-glk2/igt@kms_cursor_edge_walk@pipe-c-256x256-left-edge.html
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-glk3/igt@kms_cursor_edge_walk@pipe-c-256x256-left-edge.html

  * igt@kms_cursor_edge_walk@pipe-d-128x128-right-edge:
    - shard-skl:          NOTRUN -> [SKIP][93] ([fdo#109271]) +55 similar issues
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-skl4/igt@kms_cursor_edge_walk@pipe-d-128x128-right-edge.html

  * igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions:
    - shard-iclb:         NOTRUN -> [SKIP][94] ([fdo#109274] / [fdo#109278])
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-iclb6/igt@kms_cursor_legacy@cursorb-vs-flipb-atomic-transitions.html

  * igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size:
    - shard-glk:          [PASS][95] -> [FAIL][96] ([i915#2346] / [i915#3457] / [i915#533])
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-glk9/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size.html
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-glk6/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size.html

  * igt@kms_fbcon_fbt@fbc-suspend:
    - shard-kbl:          NOTRUN -> [INCOMPLETE][97] ([i915#155] / [i915#180] / [i915#636])
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-kbl7/igt@kms_fbcon_fbt@fbc-suspend.html

  * igt@kms_flip@2x-absolute-wf_vblank-interruptible:
    - shard-iclb:         NOTRUN -> [SKIP][98] ([fdo#109274]) +2 similar issues
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-iclb6/igt@kms_flip@2x-absolute-wf_vblank-interruptible.html

  * igt@kms_flip@2x-blocking-wf_vblank@ab-hdmi-a1-hdmi-a2:
    - shard-glk:          [PASS][99] -> [FAIL][100] ([i915#2122])
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-glk4/igt@kms_flip@2x-blocking-wf_vblank@ab-hdmi-a1-hdmi-a2.html
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-glk4/igt@kms_flip@2x-blocking-wf_vblank@ab-hdmi-a1-hdmi-a2.html

  * igt@kms_flip@2x-flip-vs-absolute-wf_vblank-interruptible:
    - shard-apl:          NOTRUN -> [SKIP][101] ([fdo#109271]) +68 similar issues
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-apl1/igt@kms_flip@2x-flip-vs-absolute-wf_vblank-interruptible.html

  * igt@kms_flip@plain-flip-fb-recreate-interruptible@a-edp1:
    - shard-skl:          [PASS][102] -> [FAIL][103] ([i915#2122])
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-skl8/igt@kms_flip@plain-flip-fb-recreate-interruptible@a-edp1.html
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-skl4/igt@kms_flip@plain-flip-fb-recreate-interruptible@a-edp1.html

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-msflip-blt:
    - shard-snb:          NOTRUN -> [SKIP][104] ([fdo#109271]) +272 similar issues
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-snb7/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-msflip-blt.html

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-plflip-blt:
    - shard-tglb:         NOTRUN -> [SKIP][105] ([fdo#111825]) +10 similar issues
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-tglb3/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-shrfb-plflip-blt.html

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-move:
    - shard-glk:          [PASS][106] -> [FAIL][107] ([i915#49])
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-glk3/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-move.html
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-glk6/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-move.html

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-shrfb-pgflip-blt:
    - shard-iclb:         NOTRUN -> [SKIP][108] ([fdo#109280]) +6 similar issues
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-iclb6/igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-shrfb-pgflip-blt.html

  * igt@kms_hdr@bpc-switch:
    - shard-skl:          NOTRUN -> [FAIL][109] ([i915#1188])
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-skl9/igt@kms_hdr@bpc-switch.html

  * igt@kms_pipe_b_c_ivb@enable-pipe-c-while-b-has-3-lanes:
    - shard-tglb:         NOTRUN -> [SKIP][110] ([fdo#109289])
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-tglb3/igt@kms_pipe_b_c_ivb@enable-pipe-c-while-b-has-3-lanes.html
    - shard-iclb:         NOTRUN -> [SKIP][111] ([fdo#109289])
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-iclb6/igt@kms_pipe_b_c_ivb@enable-pipe-c-while-b-has-3-lanes.html

  * igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-c:
    - shard-glk:          [PASS][112] -> [FAIL][113] ([i915#53])
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-glk5/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-c.html
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-glk4/igt@kms_pipe_crc_basic@compare-crc-sanitycheck-pipe-c.html

  * igt@kms_pipe_crc_basic@read-crc-pipe-d-frame-sequence:
    - shard-kbl:          NOTRUN -> [SKIP][114] ([fdo#109271] / [i915#533]) +1 similar issue
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-kbl2/igt@kms_pipe_crc_basic@read-crc-pipe-d-frame-sequence.html

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c:
    - shard-kbl:          [PASS][115] -> [DMESG-WARN][116] ([i915#180])
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-kbl7/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c.html
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-kbl3/igt@kms_pipe_crc_basic@suspend-read-crc-pipe-c.html

  * igt@kms_plane_alpha_blend@pipe-a-alpha-7efc:
    - shard-skl:          NOTRUN -> [FAIL][117] ([fdo#108145] / [i915#265]) +1 similar issue
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-skl4/igt@kms_plane_alpha_blend@pipe-a-alpha-7efc.html

  * igt@kms_plane_alpha_blend@pipe-b-constant-alpha-max:
    - shard-kbl:          NOTRUN -> [FAIL][118] ([fdo#108145] / [i915#265])
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-kbl2/igt@kms_plane_alpha_blend@pipe-b-constant-alpha-max.html

  * igt@kms_plane_alpha_blend@pipe-c-alpha-transparent-fb:
    - shard-kbl:          NOTRUN -> [FAIL][119] ([i915#265])
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-kbl6/igt@kms_plane_alpha_blend@pipe-c-alpha-transparent-fb.html

  * igt@kms_plane_cursor@pipe-a-primary-size-128:
    - shard-tglb:         NOTRUN -> [FAIL][120] ([i915#3461])
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-tglb3/igt@kms_plane_cursor@pipe-a-primary-size-128.html
    - shard-iclb:         NOTRUN -> [FAIL][121] ([i915#2657] / [i915#3461])
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-iclb6/igt@kms_plane_cursor@pipe-a-primary-size-128.html

  * igt@kms_plane_cursor@pipe-a-primary-size-64:
    - shard-skl:          NOTRUN -> [FAIL][122] ([i915#2657] / [i915#3457] / [i915#3461])
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-skl9/igt@kms_plane_cursor@pipe-a-primary-size-64.html

  * igt@kms_plane_cursor@pipe-a-viewport-size-128:
    - shard-skl:          NOTRUN -> [FAIL][123] ([i915#2657])
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-skl4/igt@kms_plane_cursor@pipe-a-viewport-size-128.html

  * igt@kms_plane_cursor@pipe-b-overlay-size-128:
    - shard-kbl:          NOTRUN -> [FAIL][124] ([i915#2657]) +1 similar issue
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-kbl7/igt@kms_plane_cursor@pipe-b-overlay-size-128.html

  * igt@kms_plane_cursor@pipe-b-primary-size-64:
    - shard-skl:          [PASS][125] -> [FAIL][126] ([i915#2657] / [i915#3457])
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-skl7/igt@kms_plane_cursor@pipe-b-primary-size-64.html
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-skl8/igt@kms_plane_cursor@pipe-b-primary-size-64.html

  * igt@kms_plane_cursor@pipe-b-viewport-size-256:
    - shard-snb:          NOTRUN -> [FAIL][127] ([i915#3461])
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-snb2/igt@kms_plane_cursor@pipe-b-viewport-size-256.html

  * igt@kms_plane_cursor@pipe-b-viewport-size-64:
    - shard-glk:          [PASS][128] -> [FAIL][129] ([i915#2657] / [i915#3457])
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-glk3/igt@kms_plane_cursor@pipe-b-viewport-size-64.html
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-glk6/igt@kms_plane_cursor@pipe-b-viewport-size-64.html
    - shard-apl:          [PASS][130] -> [FAIL][131] ([i915#2657] / [i915#3457])
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-apl6/igt@kms_plane_cursor@pipe-b-viewport-size-64.html
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-apl1/igt@kms_plane_cursor@pipe-b-viewport-size-64.html

  * igt@kms_plane_cursor@pipe-c-overlay-size-256:
    - shard-apl:          [PASS][132] -> [FAIL][133] ([i915#2657])
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-apl3/igt@kms_plane_cursor@pipe-c-overlay-size-256.html
   [133]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-apl7/igt@kms_plane_cursor@pipe-c-overlay-size-256.html

  * igt@kms_plane_cursor@pipe-c-primary-size-128:
    - shard-iclb:         [PASS][134] -> [FAIL][135] ([i915#2657] / [i915#3461])
   [134]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_10074/shard-iclb2/igt@kms_plane_cursor@pipe-c-primary-size-128.html
   [135]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-iclb7/igt@kms_plane_cursor@pipe-c-primary-size-128.html

  * igt@kms_plane_cursor@pipe-c-primary-size-64:
    - shard-iclb:         NOTRUN -> [FAIL][136] ([i915#2657] / [i915#3457])
   [136]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/shard-

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_20118/index.html

[-- Attachment #1.2: Type: text/html, Size: 33578 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/7] Per client engine busyness
  2021-05-13 15:48   ` [Intel-gfx] " Alex Deucher
@ 2021-05-13 16:40     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-13 16:40 UTC (permalink / raw)
  To: Alex Deucher; +Cc: Intel Graphics Development, Maling list - DRI developers


Hi,

On 13/05/2021 16:48, Alex Deucher wrote:
> On Thu, May 13, 2021 at 7:00 AM Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
>>
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Resurrection of the previously merged per-client engine busyness patches. In a
>> nutshell it enables intel_gpu_top to be more top(1)-like, showing not
>> only physical GPU engine usage but a per-process view as well.
>>
>> Example screen capture:
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> intel-gpu-top -  906/ 955 MHz;    0% RC6;  5.30 Watts;      933 irqs/s
>>
>>        IMC reads:     4414 MiB/s
>>       IMC writes:     3805 MiB/s
>>
>>            ENGINE      BUSY                                      MI_SEMA MI_WAIT
>>       Render/3D/0   93.46% |████████████████████████████████▋  |      0%      0%
>>         Blitter/0    0.00% |                                   |      0%      0%
>>           Video/0    0.00% |                                   |      0%      0%
>>    VideoEnhance/0    0.00% |                                   |      0%      0%
>>
>>    PID            NAME  Render/3D      Blitter        Video      VideoEnhance
>>   2733       neverball |██████▌     ||            ||            ||            |
>>   2047            Xorg |███▊        ||            ||            ||            |
>>   2737        glxgears |█▍          ||            ||            ||            |
>>   2128           xfwm4 |            ||            ||            ||            |
>>   2047            Xorg |            ||            ||            ||            |
>> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>
>> Internally we track time spent on engines for each struct intel_context, both
>> for current and past contexts belonging to each open DRM file.
>>
>> This can serve as a building block for several features from the wanted list:
>> smarter scheduler decisions, getrusage(2)-like per-GEM-context functionality
>> wanted by some customers, setrlimit(2) like controls, cgroups controller,
>> dynamic SSEU tuning, ...
>>
>> To enable userspace access to the tracked data, we expose time spent on GPU per
>> client and per engine class in sysfs with a hierarchy like the below:
>>
>>          # cd /sys/class/drm/card0/clients/
>>          # tree
>>          .
>>          ├── 7
>>          │   ├── busy
>>          │   │   ├── 0
>>          │   │   ├── 1
>>          │   │   ├── 2
>>          │   │   └── 3
>>          │   ├── name
>>          │   └── pid
>>          ├── 8
>>          │   ├── busy
>>          │   │   ├── 0
>>          │   │   ├── 1
>>          │   │   ├── 2
>>          │   │   └── 3
>>          │   ├── name
>>          │   └── pid
>>          └── 9
>>              ├── busy
>>              │   ├── 0
>>              │   ├── 1
>>              │   ├── 2
>>              │   └── 3
>>              ├── name
>>              └── pid
>>
>> Files in 'busy' directories are numbered using the engine class ABI values and
>> they contain accumulated nanoseconds each client spent on engines of a
>> respective class.
> 
> We did something similar in amdgpu using the gpu scheduler.  We then
> expose the data via fdinfo.  See
> https://cgit.freedesktop.org/drm/drm-misc/commit/?id=1774baa64f9395fa884ea9ed494bcb043f3b83f5
> https://cgit.freedesktop.org/drm/drm-misc/commit/?id=874442541133f78c78b6880b8cc495bab5c61704

Interesting!

Is yours wall time or actual GPU time, taking preemption and such into 
account? Do you have some userspace tools parsing this data, and how do 
you do client discovery? Presumably there has to be a better way than 
going through all open file descriptors?
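
The "going through all open file descriptors" approach in question can be sketched in a few lines of Python: walk /proc, follow each process's fd symlinks, and keep the ones pointing at DRM device nodes, parsing the matching fdinfo file. This is an illustrative sketch only; the helper names are invented and the "drm-*" fdinfo keys shown in the test are not any driver's actual format.

```python
import os


def parse_fdinfo(text):
    """Turn the colon-separated lines of /proc/<pid>/fdinfo/<fd> into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def drm_clients():
    """Yield (pid, fd, fdinfo dict) for every visible open DRM file."""
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = f"/proc/{pid}/fd"
        try:
            fds = os.listdir(fd_dir)
        except OSError:  # permission denied, or the process exited
            continue
        for fd in fds:
            try:
                if not os.readlink(f"{fd_dir}/{fd}").startswith("/dev/dri/"):
                    continue
                with open(f"/proc/{pid}/fdinfo/{fd}") as f:
                    yield int(pid), int(fd), parse_fdinfo(f.read())
            except OSError:
                continue
```

It works, but it is O(all file descriptors in the system) per sample, which is exactly the scalability concern raised above.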

Our implementation was merged in January, but Daniel took it out recently 
because he wanted to have a discussion about a common vendor framework for 
this whole story on dri-devel, I think. +Daniel to comment.

I couldn't find the patch you pasted on the mailing list to see if there 
was any such discussion around your version.

Regards,

Tvrtko

> 
> Alex
> 
> 
>>
>> Tvrtko Ursulin (7):
>>    drm/i915: Expose list of clients in sysfs
>>    drm/i915: Update client name on context create
>>    drm/i915: Make GEM contexts track DRM clients
>>    drm/i915: Track runtime spent in closed and unreachable GEM contexts
>>    drm/i915: Track all user contexts per client
>>    drm/i915: Track context current active time
>>    drm/i915: Expose per-engine client busyness
>>
>>   drivers/gpu/drm/i915/Makefile                 |   5 +-
>>   drivers/gpu/drm/i915/gem/i915_gem_context.c   |  61 ++-
>>   .../gpu/drm/i915/gem/i915_gem_context_types.h |  16 +-
>>   drivers/gpu/drm/i915/gt/intel_context.c       |  27 +-
>>   drivers/gpu/drm/i915/gt/intel_context.h       |  15 +-
>>   drivers/gpu/drm/i915/gt/intel_context_types.h |  24 +-
>>   .../drm/i915/gt/intel_execlists_submission.c  |  23 +-
>>   .../gpu/drm/i915/gt/intel_gt_clock_utils.c    |   4 +
>>   drivers/gpu/drm/i915/gt/intel_lrc.c           |  27 +-
>>   drivers/gpu/drm/i915/gt/intel_lrc.h           |  24 ++
>>   drivers/gpu/drm/i915/gt/selftest_lrc.c        |  10 +-
>>   drivers/gpu/drm/i915/i915_drm_client.c        | 365 ++++++++++++++++++
>>   drivers/gpu/drm/i915/i915_drm_client.h        | 123 ++++++
>>   drivers/gpu/drm/i915/i915_drv.c               |   6 +
>>   drivers/gpu/drm/i915/i915_drv.h               |   5 +
>>   drivers/gpu/drm/i915/i915_gem.c               |  21 +-
>>   drivers/gpu/drm/i915/i915_gpu_error.c         |  31 +-
>>   drivers/gpu/drm/i915/i915_gpu_error.h         |   2 +-
>>   drivers/gpu/drm/i915/i915_sysfs.c             |   8 +
>>   19 files changed, 716 insertions(+), 81 deletions(-)
>>   create mode 100644 drivers/gpu/drm/i915/i915_drm_client.c
>>   create mode 100644 drivers/gpu/drm/i915/i915_drm_client.h
>>
>> --
>> 2.30.2
>>

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/7] Per client engine busyness
  2021-05-13 16:40     ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-14  5:58       ` Alex Deucher
  -1 siblings, 0 replies; 103+ messages in thread
From: Alex Deucher @ 2021-05-14  5:58 UTC (permalink / raw)
  To: Tvrtko Ursulin, Nieto, David M, Christian Koenig
  Cc: Intel Graphics Development, Maling list - DRI developers

+ David, Christian

On Thu, May 13, 2021 at 12:41 PM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> Hi,
>
> On 13/05/2021 16:48, Alex Deucher wrote:
> > On Thu, May 13, 2021 at 7:00 AM Tvrtko Ursulin
> > <tvrtko.ursulin@linux.intel.com> wrote:
> >>
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> Resurrection of the previously merged per-client engine busyness patches. In a
> >> nutshell it enables intel_gpu_top to be more top(1)-like, showing not
> >> only physical GPU engine usage but a per-process view as well.
> >>
> >> Example screen capture:
> >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> intel-gpu-top -  906/ 955 MHz;    0% RC6;  5.30 Watts;      933 irqs/s
> >>
> >>        IMC reads:     4414 MiB/s
> >>       IMC writes:     3805 MiB/s
> >>
> >>            ENGINE      BUSY                                      MI_SEMA MI_WAIT
> >>       Render/3D/0   93.46% |████████████████████████████████▋  |      0%      0%
> >>         Blitter/0    0.00% |                                   |      0%      0%
> >>           Video/0    0.00% |                                   |      0%      0%
> >>    VideoEnhance/0    0.00% |                                   |      0%      0%
> >>
> >>    PID            NAME  Render/3D      Blitter        Video      VideoEnhance
> >>   2733       neverball |██████▌     ||            ||            ||            |
> >>   2047            Xorg |███▊        ||            ||            ||            |
> >>   2737        glxgears |█▍          ||            ||            ||            |
> >>   2128           xfwm4 |            ||            ||            ||            |
> >>   2047            Xorg |            ||            ||            ||            |
> >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>
> >> Internally we track time spent on engines for each struct intel_context, both
> >> for current and past contexts belonging to each open DRM file.
> >>
> >> This can serve as a building block for several features from the wanted list:
> >> smarter scheduler decisions, getrusage(2)-like per-GEM-context functionality
> >> wanted by some customers, setrlimit(2) like controls, cgroups controller,
> >> dynamic SSEU tuning, ...
> >>
> >> To enable userspace access to the tracked data, we expose time spent on GPU per
> >> client and per engine class in sysfs with a hierarchy like the below:
> >>
> >>          # cd /sys/class/drm/card0/clients/
> >>          # tree
> >>          .
> >>          ├── 7
> >>          │   ├── busy
> >>          │   │   ├── 0
> >>          │   │   ├── 1
> >>          │   │   ├── 2
> >>          │   │   └── 3
> >>          │   ├── name
> >>          │   └── pid
> >>          ├── 8
> >>          │   ├── busy
> >>          │   │   ├── 0
> >>          │   │   ├── 1
> >>          │   │   ├── 2
> >>          │   │   └── 3
> >>          │   ├── name
> >>          │   └── pid
> >>          └── 9
> >>              ├── busy
> >>              │   ├── 0
> >>              │   ├── 1
> >>              │   ├── 2
> >>              │   └── 3
> >>              ├── name
> >>              └── pid
> >>
> >> Files in 'busy' directories are numbered using the engine class ABI values and
> >> they contain accumulated nanoseconds each client spent on engines of a
> >> respective class.
> >
> > We did something similar in amdgpu using the gpu scheduler.  We then
> > expose the data via fdinfo.  See
> > https://cgit.freedesktop.org/drm/drm-misc/commit/?id=1774baa64f9395fa884ea9ed494bcb043f3b83f5
> > https://cgit.freedesktop.org/drm/drm-misc/commit/?id=874442541133f78c78b6880b8cc495bab5c61704
>
> Interesting!
>
> Is yours wall time or actual GPU time, taking preemption and such into
> account? Do you have some userspace tools parsing this data, and how do
> you do client discovery? Presumably there has to be a better way than
> going through all open file descriptors?

Wall time.  It uses the fences in the scheduler to calculate engine
time.  We have some python scripts to make it look pretty, but mainly
just reading the files directly.  If you know the process, you can
look it up in procfs.
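
The fence-based scheme described here can be modeled as: record a timestamp when a job is handed to the hardware, and credit the elapsed time to its client when the job's fence signals. Because that interval spans any preemption or waits in between, it naturally yields wall time rather than preemption-aware GPU time. A toy model with invented names, not the amdgpu scheduler code:

```python
import time
from collections import defaultdict


class EngineTimeTracker:
    """Accumulate per-(client, engine) busy time from job start/finish events."""

    def __init__(self):
        self.busy_ns = defaultdict(int)  # (client, engine) -> accumulated ns
        self.running = {}                # job id -> (client, engine, start ns)

    def job_started(self, job_id, client, engine, now_ns=None):
        """Called when the scheduler hands the job to the hardware."""
        start = time.monotonic_ns() if now_ns is None else now_ns
        self.running[job_id] = (client, engine, start)

    def job_finished(self, job_id, now_ns=None):
        """Called when the job's fence signals; credits the elapsed wall time."""
        client, engine, start = self.running.pop(job_id)
        end = time.monotonic_ns() if now_ns is None else now_ns
        self.busy_ns[(client, engine)] += end - start
```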

>
> Our implementation was merged in January, but Daniel took it out recently
> because he wanted to have a discussion about a common vendor framework for
> this whole story on dri-devel, I think. +Daniel to comment.
>
> I couldn't find the patch you pasted on the mailing list to see if there
> was any such discussion around your version.

It was on the amd-gfx mailing list.

Alex

>
> Regards,
>
> Tvrtko
>
> >
> > Alex
> >
> >
> >>
> >> Tvrtko Ursulin (7):
> >>    drm/i915: Expose list of clients in sysfs
> >>    drm/i915: Update client name on context create
> >>    drm/i915: Make GEM contexts track DRM clients
> >>    drm/i915: Track runtime spent in closed and unreachable GEM contexts
> >>    drm/i915: Track all user contexts per client
> >>    drm/i915: Track context current active time
> >>    drm/i915: Expose per-engine client busyness
> >>
> >>   drivers/gpu/drm/i915/Makefile                 |   5 +-
> >>   drivers/gpu/drm/i915/gem/i915_gem_context.c   |  61 ++-
> >>   .../gpu/drm/i915/gem/i915_gem_context_types.h |  16 +-
> >>   drivers/gpu/drm/i915/gt/intel_context.c       |  27 +-
> >>   drivers/gpu/drm/i915/gt/intel_context.h       |  15 +-
> >>   drivers/gpu/drm/i915/gt/intel_context_types.h |  24 +-
> >>   .../drm/i915/gt/intel_execlists_submission.c  |  23 +-
> >>   .../gpu/drm/i915/gt/intel_gt_clock_utils.c    |   4 +
> >>   drivers/gpu/drm/i915/gt/intel_lrc.c           |  27 +-
> >>   drivers/gpu/drm/i915/gt/intel_lrc.h           |  24 ++
> >>   drivers/gpu/drm/i915/gt/selftest_lrc.c        |  10 +-
> >>   drivers/gpu/drm/i915/i915_drm_client.c        | 365 ++++++++++++++++++
> >>   drivers/gpu/drm/i915/i915_drm_client.h        | 123 ++++++
> >>   drivers/gpu/drm/i915/i915_drv.c               |   6 +
> >>   drivers/gpu/drm/i915/i915_drv.h               |   5 +
> >>   drivers/gpu/drm/i915/i915_gem.c               |  21 +-
> >>   drivers/gpu/drm/i915/i915_gpu_error.c         |  31 +-
> >>   drivers/gpu/drm/i915/i915_gpu_error.h         |   2 +-
> >>   drivers/gpu/drm/i915/i915_sysfs.c             |   8 +
> >>   19 files changed, 716 insertions(+), 81 deletions(-)
> >>   create mode 100644 drivers/gpu/drm/i915/i915_drm_client.c
> >>   create mode 100644 drivers/gpu/drm/i915/i915_drm_client.h
> >>
> >> --
> >> 2.30.2
> >>

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/7] Per client engine busyness
  2021-05-14  5:58       ` [Intel-gfx] " Alex Deucher
@ 2021-05-14  7:22         ` Nieto, David M
  -1 siblings, 0 replies; 103+ messages in thread
From: Nieto, David M @ 2021-05-14  7:22 UTC (permalink / raw)
  To: Alex Deucher, Tvrtko Ursulin, Koenig, Christian
  Cc: Intel Graphics Development, Maling list - DRI developers

We had entertained the idea of exposing the processes as sysfs nodes as you proposed, but we had concerns about exposing process info in there, especially since /proc already exists for that purpose.

I think if you were to follow that approach, we could have tools like top that support exposing GPU engine usage.
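
A top(1)-style tool over the sysfs layout quoted below reduces to sampling each accumulated-nanoseconds busy file twice and dividing the delta by the sampling period. The paths and the 0-3 engine-class numbering follow the quoted hierarchy; everything else (helper names, the one-second interval) is assumed for illustration:

```python
import glob
import os
import time

CLIENTS_DIR = "/sys/class/drm/card0/clients"


def snapshot():
    """Read every <client>/busy/<engine class> file into {(client, class): ns}."""
    data = {}
    for path in glob.glob(os.path.join(CLIENTS_DIR, "*", "busy", "*")):
        parts = path.split(os.sep)
        with open(path) as f:
            data[(parts[-3], int(parts[-1]))] = int(f.read())
    return data


def busyness(before, after, period_ns):
    """Percentage of the sampling period each (client, class) pair was busy."""
    return {key: 100.0 * (after[key] - before.get(key, 0)) / period_ns
            for key in after}


if __name__ == "__main__":
    first = snapshot()
    time.sleep(1)
    second = snapshot()
    for (client, klass), pct in sorted(busyness(first, second, 1e9).items()):
        print(f"client {client} class {klass}: {pct:5.2f}%")
```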
________________________________
From: Alex Deucher <alexdeucher@gmail.com>
Sent: Thursday, May 13, 2021 10:58 PM
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>; Nieto, David M <David.Nieto@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>
Cc: Intel Graphics Development <Intel-gfx@lists.freedesktop.org>; Maling list - DRI developers <dri-devel@lists.freedesktop.org>; Daniel Vetter <daniel@ffwll.ch>
Subject: Re: [PATCH 0/7] Per client engine busyness

+ David, Christian

On Thu, May 13, 2021 at 12:41 PM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> Hi,
>
> On 13/05/2021 16:48, Alex Deucher wrote:
> > On Thu, May 13, 2021 at 7:00 AM Tvrtko Ursulin
> > <tvrtko.ursulin@linux.intel.com> wrote:
> >>
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> Resurrection of the previously merged per-client engine busyness patches. In a
> >> nutshell it enables intel_gpu_top to be more top(1)-like, showing not
> >> only physical GPU engine usage but a per-process view as well.
> >>
> >> Example screen capture:
> >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> intel-gpu-top -  906/ 955 MHz;    0% RC6;  5.30 Watts;      933 irqs/s
> >>
> >>        IMC reads:     4414 MiB/s
> >>       IMC writes:     3805 MiB/s
> >>
> >>            ENGINE      BUSY                                      MI_SEMA MI_WAIT
> >>       Render/3D/0   93.46% |████████████████████████████████▋  |      0%      0%
> >>         Blitter/0    0.00% |                                   |      0%      0%
> >>           Video/0    0.00% |                                   |      0%      0%
> >>    VideoEnhance/0    0.00% |                                   |      0%      0%
> >>
> >>    PID            NAME  Render/3D      Blitter        Video      VideoEnhance
> >>   2733       neverball |██████▌     ||            ||            ||            |
> >>   2047            Xorg |███▊        ||            ||            ||            |
> >>   2737        glxgears |█▍          ||            ||            ||            |
> >>   2128           xfwm4 |            ||            ||            ||            |
> >>   2047            Xorg |            ||            ||            ||            |
> >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>
> >> Internally we track time spent on engines for each struct intel_context, both
> >> for current and past contexts belonging to each open DRM file.
> >>
> >> This can serve as a building block for several features from the wanted list:
> >> smarter scheduler decisions, getrusage(2)-like per-GEM-context functionality
> >> wanted by some customers, setrlimit(2) like controls, cgroups controller,
> >> dynamic SSEU tuning, ...
> >>
> >> To enable userspace access to the tracked data, we expose time spent on GPU per
> >> client and per engine class in sysfs with a hierarchy like the below:
> >>
> >>          # cd /sys/class/drm/card0/clients/
> >>          # tree
> >>          .
> >>          ├── 7
> >>          │   ├── busy
> >>          │   │   ├── 0
> >>          │   │   ├── 1
> >>          │   │   ├── 2
> >>          │   │   └── 3
> >>          │   ├── name
> >>          │   └── pid
> >>          ├── 8
> >>          │   ├── busy
> >>          │   │   ├── 0
> >>          │   │   ├── 1
> >>          │   │   ├── 2
> >>          │   │   └── 3
> >>          │   ├── name
> >>          │   └── pid
> >>          └── 9
> >>              ├── busy
> >>              │   ├── 0
> >>              │   ├── 1
> >>              │   ├── 2
> >>              │   └── 3
> >>              ├── name
> >>              └── pid
> >>
> >> Files in 'busy' directories are numbered using the engine class ABI values and
> >> they contain accumulated nanoseconds each client spent on engines of a
> >> respective class.
> >
> > We did something similar in amdgpu using the gpu scheduler.  We then
> > expose the data via fdinfo.  See
> > https://cgit.freedesktop.org/drm/drm-misc/commit/?id=1774baa64f9395fa884ea9ed494bcb043f3b83f5
> > https://cgit.freedesktop.org/drm/drm-misc/commit/?id=874442541133f78c78b6880b8cc495bab5c61704
>
> Interesting!
>
> Is yours wall time or actual GPU time, taking preemption and such into
> account? Do you have some userspace tools parsing this data, and how do
> you do client discovery? Presumably there has to be a better way than
> going through all open file descriptors?

Wall time.  It uses the fences in the scheduler to calculate engine
time.  We have some python scripts to make it look pretty, but mainly
just reading the files directly.  If you know the process, you can
look it up in procfs.
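[For illustration, a consumer of the fdinfo approach boils down to parsing
"key: value" lines out of /proc/<pid>/fdinfo/<fd>. The drm-* field names in
the sample below are hypothetical; the exact keys are driver-specific and
not settled in this thread.]

```python
def parse_fdinfo(text):
    """Parse the 'key: value' lines of a /proc/<pid>/fdinfo/<fd> blob.

    The drm-* keys in the sample below are illustrative only; real
    field names depend on the driver exposing them.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            info[key.strip()] = value.strip()
    return info


# Synthetic sample, standing in for a real fdinfo read.
sample = """\
pos:\t0
flags:\t02100002
drm-driver:\tamdgpu
drm-engine-gfx:\t1234567890 ns
"""
info = parse_fdinfo(sample)
print(info["drm-driver"])      # amdgpu
print(info["drm-engine-gfx"])  # 1234567890 ns
```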

>
> Our implementation was merged in January but Daniel took it out recently
> because he wanted to have discussion about a common vendor framework for
> this whole story on dri-devel. I think. +Daniel to comment.
>
> I couldn't find the patch you pasted on the mailing list to see if there
> was any such discussion around your version.

It was on the amd-gfx mailing list.

Alex

>
> Regards,
>
> Tvrtko
>
> >
> > Alex
> >
> >
> >>
> >> Tvrtko Ursulin (7):
> >>    drm/i915: Expose list of clients in sysfs
> >>    drm/i915: Update client name on context create
> >>    drm/i915: Make GEM contexts track DRM clients
> >>    drm/i915: Track runtime spent in closed and unreachable GEM contexts
> >>    drm/i915: Track all user contexts per client
> >>    drm/i915: Track context current active time
> >>    drm/i915: Expose per-engine client busyness
> >>
> >>   drivers/gpu/drm/i915/Makefile                 |   5 +-
> >>   drivers/gpu/drm/i915/gem/i915_gem_context.c   |  61 ++-
> >>   .../gpu/drm/i915/gem/i915_gem_context_types.h |  16 +-
> >>   drivers/gpu/drm/i915/gt/intel_context.c       |  27 +-
> >>   drivers/gpu/drm/i915/gt/intel_context.h       |  15 +-
> >>   drivers/gpu/drm/i915/gt/intel_context_types.h |  24 +-
> >>   .../drm/i915/gt/intel_execlists_submission.c  |  23 +-
> >>   .../gpu/drm/i915/gt/intel_gt_clock_utils.c    |   4 +
> >>   drivers/gpu/drm/i915/gt/intel_lrc.c           |  27 +-
> >>   drivers/gpu/drm/i915/gt/intel_lrc.h           |  24 ++
> >>   drivers/gpu/drm/i915/gt/selftest_lrc.c        |  10 +-
> >>   drivers/gpu/drm/i915/i915_drm_client.c        | 365 ++++++++++++++++++
> >>   drivers/gpu/drm/i915/i915_drm_client.h        | 123 ++++++
> >>   drivers/gpu/drm/i915/i915_drv.c               |   6 +
> >>   drivers/gpu/drm/i915/i915_drv.h               |   5 +
> >>   drivers/gpu/drm/i915/i915_gem.c               |  21 +-
> >>   drivers/gpu/drm/i915/i915_gpu_error.c         |  31 +-
> >>   drivers/gpu/drm/i915/i915_gpu_error.h         |   2 +-
> >>   drivers/gpu/drm/i915/i915_sysfs.c             |   8 +
> >>   19 files changed, 716 insertions(+), 81 deletions(-)
> >>   create mode 100644 drivers/gpu/drm/i915/i915_drm_client.c
> >>   create mode 100644 drivers/gpu/drm/i915/i915_drm_client.h
> >>
> >> --
> >> 2.30.2
> >>

[-- Attachment #2: Type: text/html, Size: 17957 bytes --]

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
@ 2021-05-14  7:22         ` Nieto, David M
  0 siblings, 0 replies; 103+ messages in thread
From: Nieto, David M @ 2021-05-14  7:22 UTC (permalink / raw)
  To: Alex Deucher, Tvrtko Ursulin, Koenig, Christian
  Cc: Intel Graphics Development, Maling list - DRI developers


[-- Attachment #1.1: Type: text/plain, Size: 8384 bytes --]

[AMD Official Use Only - Internal Distribution Only]

We had entertained the idea of exposing the processes as sysfs nodes as you proposed, but we had concerns about exposing process info in there, especially since /proc already exists for that purpose.

I think if you were to follow that approach, we could have tools like top that support exposing GPU engine usage.
________________________________
From: Alex Deucher <alexdeucher@gmail.com>
Sent: Thursday, May 13, 2021 10:58 PM
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>; Nieto, David M <David.Nieto@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>
Cc: Intel Graphics Development <Intel-gfx@lists.freedesktop.org>; Maling list - DRI developers <dri-devel@lists.freedesktop.org>; Daniel Vetter <daniel@ffwll.ch>
Subject: Re: [PATCH 0/7] Per client engine busyness

+ David, Christian

On Thu, May 13, 2021 at 12:41 PM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> Hi,
>
> On 13/05/2021 16:48, Alex Deucher wrote:
> > On Thu, May 13, 2021 at 7:00 AM Tvrtko Ursulin
> > <tvrtko.ursulin@linux.intel.com> wrote:
> >>
> >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >> Resurrection of the previously merged per client engine busyness patches. In a
> >> nutshell it enables intel_gpu_top to be more top(1)-like, showing not
> >> only physical GPU engine usage but a per-process view as well.
> >>
> >> Example screen capture:
> >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >> intel-gpu-top -  906/ 955 MHz;    0% RC6;  5.30 Watts;      933 irqs/s
> >>
> >>        IMC reads:     4414 MiB/s
> >>       IMC writes:     3805 MiB/s
> >>
> >>            ENGINE      BUSY                                      MI_SEMA MI_WAIT
> >>       Render/3D/0   93.46% |████████████████████████████████▋  |      0%      0%
> >>         Blitter/0    0.00% |                                   |      0%      0%
> >>           Video/0    0.00% |                                   |      0%      0%
> >>    VideoEnhance/0    0.00% |                                   |      0%      0%
> >>
> >>    PID            NAME  Render/3D      Blitter        Video      VideoEnhance
> >>   2733       neverball |██████▌     ||            ||            ||            |
> >>   2047            Xorg |███▊        ||            ||            ||            |
> >>   2737        glxgears |█▍          ||            ||            ||            |
> >>   2128           xfwm4 |            ||            ||            ||            |
> >>   2047            Xorg |            ||            ||            ||            |
> >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >>
> >> Internally we track time spent on engines for each struct intel_context, both
> >> for current and past contexts belonging to each open DRM file.
> >>
> >> This can serve as a building block for several features from the wanted list:
> >> smarter scheduler decisions, getrusage(2)-like per-GEM-context functionality
> >> wanted by some customers, setrlimit(2) like controls, cgroups controller,
> >> dynamic SSEU tuning, ...
> >>
> >> To enable userspace access to the tracked data, we expose time spent on GPU per
> >> client and per engine class in sysfs with a hierarchy like the below:
> >>
> >>          # cd /sys/class/drm/card0/clients/
> >>          # tree
> >>          .
> >>          ├── 7
> >>          │   ├── busy
> >>          │   │   ├── 0
> >>          │   │   ├── 1
> >>          │   │   ├── 2
> >>          │   │   └── 3
> >>          │   ├── name
> >>          │   └── pid
> >>          ├── 8
> >>          │   ├── busy
> >>          │   │   ├── 0
> >>          │   │   ├── 1
> >>          │   │   ├── 2
> >>          │   │   └── 3
> >>          │   ├── name
> >>          │   └── pid
> >>          └── 9
> >>              ├── busy
> >>              │   ├── 0
> >>              │   ├── 1
> >>              │   ├── 2
> >>              │   └── 3
> >>              ├── name
> >>              └── pid
> >>
> >> Files in 'busy' directories are numbered using the engine class ABI values and
> >> they contain accumulated nanoseconds each client spent on engines of a
> >> respective class.
> >
> > We did something similar in amdgpu using the gpu scheduler.  We then
> > expose the data via fdinfo.  See
> > https://cgit.freedesktop.org/drm/drm-misc/commit/?id=1774baa64f9395fa884ea9ed494bcb043f3b83f5
> > https://cgit.freedesktop.org/drm/drm-misc/commit/?id=874442541133f78c78b6880b8cc495bab5c61704
>
> Interesting!
>
> Is yours wall time or actual GPU time taking preemption and such into
> account? Do you have some userspace tools parsing this data, and how do
> you do client discovery? Presumably there has to be a better way than
> going through all open file descriptors?

Wall time.  It uses the fences in the scheduler to calculate engine
time.  We have some python scripts to make it look pretty, but mainly
just reading the files directly.  If you know the process, you can
look it up in procfs.

>
> Our implementation was merged in January but Daniel took it out recently
> because he wanted to have discussion about a common vendor framework for
> this whole story on dri-devel. I think. +Daniel to comment.
>
> I couldn't find the patch you pasted on the mailing list to see if there
> was any such discussion around your version.

It was on the amd-gfx mailing list.

Alex

>
> Regards,
>
> Tvrtko
>
> >
> > Alex
> >
> >
> >>
> >> Tvrtko Ursulin (7):
> >>    drm/i915: Expose list of clients in sysfs
> >>    drm/i915: Update client name on context create
> >>    drm/i915: Make GEM contexts track DRM clients
> >>    drm/i915: Track runtime spent in closed and unreachable GEM contexts
> >>    drm/i915: Track all user contexts per client
> >>    drm/i915: Track context current active time
> >>    drm/i915: Expose per-engine client busyness
> >>
> >>   drivers/gpu/drm/i915/Makefile                 |   5 +-
> >>   drivers/gpu/drm/i915/gem/i915_gem_context.c   |  61 ++-
> >>   .../gpu/drm/i915/gem/i915_gem_context_types.h |  16 +-
> >>   drivers/gpu/drm/i915/gt/intel_context.c       |  27 +-
> >>   drivers/gpu/drm/i915/gt/intel_context.h       |  15 +-
> >>   drivers/gpu/drm/i915/gt/intel_context_types.h |  24 +-
> >>   .../drm/i915/gt/intel_execlists_submission.c  |  23 +-
> >>   .../gpu/drm/i915/gt/intel_gt_clock_utils.c    |   4 +
> >>   drivers/gpu/drm/i915/gt/intel_lrc.c           |  27 +-
> >>   drivers/gpu/drm/i915/gt/intel_lrc.h           |  24 ++
> >>   drivers/gpu/drm/i915/gt/selftest_lrc.c        |  10 +-
> >>   drivers/gpu/drm/i915/i915_drm_client.c        | 365 ++++++++++++++++++
> >>   drivers/gpu/drm/i915/i915_drm_client.h        | 123 ++++++
> >>   drivers/gpu/drm/i915/i915_drv.c               |   6 +
> >>   drivers/gpu/drm/i915/i915_drv.h               |   5 +
> >>   drivers/gpu/drm/i915/i915_gem.c               |  21 +-
> >>   drivers/gpu/drm/i915/i915_gpu_error.c         |  31 +-
> >>   drivers/gpu/drm/i915/i915_gpu_error.h         |   2 +-
> >>   drivers/gpu/drm/i915/i915_sysfs.c             |   8 +
> >>   19 files changed, 716 insertions(+), 81 deletions(-)
> >>   create mode 100644 drivers/gpu/drm/i915/i915_drm_client.c
> >>   create mode 100644 drivers/gpu/drm/i915/i915_drm_client.h
> >>
> >> --
> >> 2.30.2
> >>

[-- Attachment #1.2: Type: text/html, Size: 17957 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/7] Per client engine busyness
  2021-05-14  7:22         ` [Intel-gfx] " Nieto, David M
@ 2021-05-14  8:04           ` Christian König
  -1 siblings, 0 replies; 103+ messages in thread
From: Christian König @ 2021-05-14  8:04 UTC (permalink / raw)
  To: Nieto, David M, Alex Deucher, Tvrtko Ursulin
  Cc: Intel Graphics Development, Maling list - DRI developers

[-- Attachment #1: Type: text/plain, Size: 11031 bytes --]

Well in my opinion exposing it through fdinfo turned out to be a really 
clean approach.

It describes exactly the per file descriptor information we need.

Making that device driver independent is potentially useful as well.

Regards,
Christian.

Am 14.05.21 um 09:22 schrieb Nieto, David M:
>
> [AMD Official Use Only - Internal Distribution Only]
>
>
> We had entertained the idea of exposing the processes as sysfs nodes 
> as you proposed, but we had concerns about exposing process info in 
> there, especially since /proc already exists for that purpose.
>
> I think if you were to follow that approach, we could have tools like 
> top that support exposing GPU engine usage.
> ------------------------------------------------------------------------
> *From:* Alex Deucher <alexdeucher@gmail.com>
> *Sent:* Thursday, May 13, 2021 10:58 PM
> *To:* Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>; Nieto, David M 
> <David.Nieto@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>
> *Cc:* Intel Graphics Development <Intel-gfx@lists.freedesktop.org>; 
> Maling list - DRI developers <dri-devel@lists.freedesktop.org>; Daniel 
> Vetter <daniel@ffwll.ch>
> *Subject:* Re: [PATCH 0/7] Per client engine busyness
> + David, Christian
>
> On Thu, May 13, 2021 at 12:41 PM Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
> >
> >
> > Hi,
> >
> > On 13/05/2021 16:48, Alex Deucher wrote:
> > > On Thu, May 13, 2021 at 7:00 AM Tvrtko Ursulin
> > > <tvrtko.ursulin@linux.intel.com> wrote:
> > >>
> > >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > >>
> > >> Resurrection of the previously merged per client engine busyness patches.
> > >> In a nutshell it enables intel_gpu_top to be more top(1)-like, showing not
> > >> only physical GPU engine usage but a per-process view as well.
> > >>
> > >> Example screen capture:
> > >> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > >> intel-gpu-top -  906/ 955 MHz;    0% RC6; 5.30 Watts;      933 irqs/s
> > >>
> > >>        IMC reads:     4414 MiB/s
> > >>       IMC writes:     3805 MiB/s
> > >>
> > >>            ENGINE      BUSY                                      MI_SEMA MI_WAIT
> > >>       Render/3D/0   93.46% |████████████████████████████████▋  |      0%      0%
> > >>         Blitter/0    0.00% |                                   |      0%      0%
> > >>           Video/0    0.00% |                                   |      0%      0%
> > >>    VideoEnhance/0    0.00% |                                   |      0%      0%
> > >>
> > >>    PID            NAME  Render/3D      Blitter        Video      VideoEnhance
> > >>   2733       neverball |██████▌     ||            ||            ||            |
> > >>   2047            Xorg |███▊        ||            ||            ||            |
> > >>   2737        glxgears |█▍          ||            ||            ||            |
> > >>   2128           xfwm4 |            ||            ||            ||            |
> > >>   2047            Xorg |            ||            ||            ||            |
> > >> 
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > >>
> > >> Internally we track time spent on engines for each struct 
> intel_context, both
> > >> for current and past contexts belonging to each open DRM file.
> > >>
> > >> This can serve as a building block for several features from the 
> wanted list:
> > >> smarter scheduler decisions, getrusage(2)-like per-GEM-context 
> functionality
> > >> wanted by some customers, setrlimit(2) like controls, cgroups 
> controller,
> > >> dynamic SSEU tuning, ...
> > >>
> > >> To enable userspace access to the tracked data, we expose time 
> spent on GPU per
> > >> client and per engine class in sysfs with a hierarchy like the below:
> > >>
> > >>          # cd /sys/class/drm/card0/clients/
> > >>          # tree
> > >>          .
> > >>          ├── 7
> > >>          │   ├── busy
> > >>          │   │   ├── 0
> > >>          │   │   ├── 1
> > >>          │   │   ├── 2
> > >>          │   │   └── 3
> > >>          │   ├── name
> > >>          │   └── pid
> > >>          ├── 8
> > >>          │   ├── busy
> > >>          │   │   ├── 0
> > >>          │   │   ├── 1
> > >>          │   │   ├── 2
> > >>          │   │   └── 3
> > >>          │   ├── name
> > >>          │   └── pid
> > >>          └── 9
> > >>              ├── busy
> > >>              │   ├── 0
> > >>              │   ├── 1
> > >>              │   ├── 2
> > >>              │   └── 3
> > >>              ├── name
> > >>              └── pid
> > >>
> > >> Files in 'busy' directories are numbered using the engine class 
> ABI values and
> > >> they contain accumulated nanoseconds each client spent on engines 
> of a
> > >> respective class.
> > >
> > > We did something similar in amdgpu using the gpu scheduler.  We then
> > > expose the data via fdinfo.  See
> > > 
> https://cgit.freedesktop.org/drm/drm-misc/commit/?id=1774baa64f9395fa884ea9ed494bcb043f3b83f5
> > > 
> https://cgit.freedesktop.org/drm/drm-misc/commit/?id=874442541133f78c78b6880b8cc495bab5c61704
> >
> > Interesting!
> >
> > Is yours wall time or actual GPU time taking preemption and such into
> > account? Do you have some userspace tools parsing this data, and how do
> > you do client discovery? Presumably there has to be a better way than
> > going through all open file descriptors?
>
> Wall time.  It uses the fences in the scheduler to calculate engine
> time.  We have some python scripts to make it look pretty, but mainly
> just reading the files directly.  If you know the process, you can
> look it up in procfs.
>
> >
> > Our implementation was merged in January but Daniel took it out recently
> > because he wanted to have discussion about a common vendor framework for
> > this whole story on dri-devel. I think. +Daniel to comment.
> >
> > I couldn't find the patch you pasted on the mailing list to see if there
> > was any such discussion around your version.
>
> It was on the amd-gfx mailing list.
>
> Alex
>
> >
> > Regards,
> >
> > Tvrtko
> >
> > >
> > > Alex
> > >
> > >
> > >>
> > >> Tvrtko Ursulin (7):
> > >>    drm/i915: Expose list of clients in sysfs
> > >>    drm/i915: Update client name on context create
> > >>    drm/i915: Make GEM contexts track DRM clients
> > >>    drm/i915: Track runtime spent in closed and unreachable GEM 
> contexts
> > >>    drm/i915: Track all user contexts per client
> > >>    drm/i915: Track context current active time
> > >>    drm/i915: Expose per-engine client busyness
> > >>
> > >> drivers/gpu/drm/i915/Makefile                 |   5 +-
> > >> drivers/gpu/drm/i915/gem/i915_gem_context.c   |  61 ++-
> > >> .../gpu/drm/i915/gem/i915_gem_context_types.h |  16 +-
> > >> drivers/gpu/drm/i915/gt/intel_context.c       |  27 +-
> > >> drivers/gpu/drm/i915/gt/intel_context.h       |  15 +-
> > >> drivers/gpu/drm/i915/gt/intel_context_types.h |  24 +-
> > >> .../drm/i915/gt/intel_execlists_submission.c  |  23 +-
> > >> .../gpu/drm/i915/gt/intel_gt_clock_utils.c    |   4 +
> > >> drivers/gpu/drm/i915/gt/intel_lrc.c           |  27 +-
> > >> drivers/gpu/drm/i915/gt/intel_lrc.h           |  24 ++
> > >> drivers/gpu/drm/i915/gt/selftest_lrc.c        |  10 +-
> > >> drivers/gpu/drm/i915/i915_drm_client.c        | 365 
> ++++++++++++++++++
> > >> drivers/gpu/drm/i915/i915_drm_client.h        | 123 ++++++
> > >> drivers/gpu/drm/i915/i915_drv.c               |   6 +
> > >> drivers/gpu/drm/i915/i915_drv.h               |   5 +
> > >> drivers/gpu/drm/i915/i915_gem.c               |  21 +-
> > >> drivers/gpu/drm/i915/i915_gpu_error.c         |  31 +-
> > >> drivers/gpu/drm/i915/i915_gpu_error.h         |   2 +-
> > >> drivers/gpu/drm/i915/i915_sysfs.c             |   8 +
> > >>   19 files changed, 716 insertions(+), 81 deletions(-)
> > >>   create mode 100644 drivers/gpu/drm/i915/i915_drm_client.c
> > >>   create mode 100644 drivers/gpu/drm/i915/i915_drm_client.h
> > >>
> > >> --
> > >> 2.30.2
> > >>


[-- Attachment #2: Type: text/html, Size: 23274 bytes --]

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/7] Per client engine busyness
  2021-05-14  8:04           ` [Intel-gfx] " Christian König
@ 2021-05-14 13:42             ` Tvrtko Ursulin
  -1 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-14 13:42 UTC (permalink / raw)
  To: Christian König, Nieto, David M, Alex Deucher
  Cc: Intel Graphics Development, Mailing list - DRI developers


On 14/05/2021 09:04, Christian König wrote:
> Well in my opinion exposing it through fdinfo turned out to be a really 
> clean approach.
> 
> It describes exactly the per file descriptor information we need.

Yeah fdinfo certainly is mostly simple and neat.

I say mostly because the main problem I see with it is discoverability. 
Alex commented in another sub-thread - "If you know the process, you can 
look it up in procfs." - so that's fine for introspection but a bit 
challenging for a top(1) like tool.
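
To make the fdinfo mechanics concrete, here is a minimal sketch of a parser
for per-engine busyness counters read from /proc/<pid>/fdinfo/<fd>. The
"drm-engine-*" key convention and the sample contents below are assumptions
for illustration only; the exact field names depend on the driver and kernel
version:

```python
def parse_drm_fdinfo(text):
    """Parse hypothetical per-engine busyness counters from a DRM fdinfo
    blob.  Returns {engine_name: accumulated_nanoseconds}."""
    engines = {}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        # Assumed key convention: "drm-engine-<name>: <ns> ns"
        if key.startswith("drm-engine-"):
            engines[key[len("drm-engine-"):]] = int(value.split()[0])
    return engines

# Example fdinfo contents (made up for illustration):
sample = """\
pos:    0
drm-driver:     i915
drm-engine-render:      1854690402 ns
drm-engine-copy:        0 ns
"""
print(parse_drm_fdinfo(sample))
```

A tool would sample these counters periodically and divide the deltas by the
wall-clock interval to get a busyness percentage per engine.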

David also said that you considered sysfs but were wary of exposing 
process info in there. To clarify, my patch is not exposing a sysfs 
entry per process, but one per open DRM fd.

The top-level hierarchy is under /sys/class/drm/card0/clients/ and each 
opened DRM fd gets a directory in there. The process data I expose there 
are the name and pid, but these are for convenience, not as primary 
information.
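
A consumer of the proposed hierarchy could be sketched like this; the tree
is mocked in a temporary directory with made-up values, mirroring the
clients/<id>/{name,pid,busy/<class>} layout described in the cover letter:

```python
import tempfile
from pathlib import Path

def read_clients(root):
    """Walk a /sys/class/drm/card0/clients/-style tree: one directory per
    open DRM fd, containing 'name', 'pid' and 'busy/<engine-class>' files
    holding accumulated nanoseconds spent on that engine class."""
    clients = {}
    for cdir in sorted(Path(root).iterdir()):
        busy = cdir / "busy"
        clients[cdir.name] = {
            "name": (cdir / "name").read_text().strip(),
            "pid": int((cdir / "pid").read_text()),
            "busy_ns": {f.name: int(f.read_text())
                        for f in sorted(busy.iterdir())},
        }
    return clients

# Build a mock tree mirroring the proposed layout (values are made up):
root = Path(tempfile.mkdtemp())
(root / "7" / "busy").mkdir(parents=True)
(root / "7" / "name").write_text("neverball\n")
(root / "7" / "pid").write_text("2733\n")
(root / "7" / "busy" / "0").write_text("1500000000")
(root / "7" / "busy" / "1").write_text("0")
print(read_clients(root))
```

A top(1)-like tool would call this twice and divide the busy_ns deltas by
the wall-clock sampling interval to obtain a percentage per engine class.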

But yes, I agree this part of the approach is definitely questionable. 
(As a side note, I am not sure if I could put a symlink to proc in 
there; I think sysfs and symlinks do not really work together.)

Another data point is that we think this "client root" would be useful 
for adding other things in the future. For instance, a per-client debug 
log stream is occasionally talked about.

> Making that device driver independent is potentially useful as well.

As an alternative to my sysfs approach, the idea of exposing this in 
proc was floated by Chris in this series: 
https://patchwork.freedesktop.org/series/86692/.

That would be generic enough so any GPU vendor can slot in, and common 
enough that GPU agnostic tools should be able to use it. Modulo some 
discussion around naming the "channels" (GPU engines) or not.

It wouldn't be able to support things like the previously mentioned per 
client debug log stream, but I guess that's not the most important thing. 
Most important would be allowing GPU usage to be wired up to top(1) like 
tools, which is probably overdue given the modern computing landscape.

Would you guys be interested in taking a more detailed look at both 
proposals and seeing if either would work for you?

Regards,

Tvrtko

> Regards,
> Christian.
> 
> Am 14.05.21 um 09:22 schrieb Nieto, David M:
>>
>> [AMD Official Use Only - Internal Distribution Only]
>>
>>
>> We had entertained the idea of exposing the processes as sysfs nodes 
>> as you proposed, but we had concerns about exposing process info in 
>> there, especially since /proc already exists for that purpose.
>>
>> I think if you were to follow that approach, we could have tools like 
>> top that support exposing GPU engine usage.
>> ------------------------------------------------------------------------
>> *From:* Alex Deucher <alexdeucher@gmail.com>
>> *Sent:* Thursday, May 13, 2021 10:58 PM
>> *To:* Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>; Nieto, David M 
>> <David.Nieto@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>
>> *Cc:* Intel Graphics Development <Intel-gfx@lists.freedesktop.org>; 
>> Mailing list - DRI developers <dri-devel@lists.freedesktop.org>; Daniel 
>> Vetter <daniel@ffwll.ch>
>> *Subject:* Re: [PATCH 0/7] Per client engine busyness
>> + David, Christian
>>
>> On Thu, May 13, 2021 at 12:41 PM Tvrtko Ursulin
>> <tvrtko.ursulin@linux.intel.com> wrote:
>> >
>> >
>> > Hi,
>> >
>> > On 13/05/2021 16:48, Alex Deucher wrote:
>> > > On Thu, May 13, 2021 at 7:00 AM Tvrtko Ursulin
>> > > <tvrtko.ursulin@linux.intel.com> wrote:
>> > >>
>> > >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> > >>
>> > >> Resurrection of the previously merged per client engine busyness 
>> patches. In a
>> > >> nutshell it enables intel_gpu_top to be more top(1) like useful 
>> and show not
>> > >> only physical GPU engine usage but per process view as well.
>> > >>
>> > >> Example screen capture:
>> > >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> > >> intel-gpu-top -  906/ 955 MHz;    0% RC6;  5.30 Watts;      933 irqs/s
>> > >>
>> > >>       IMC reads:     4414 MiB/s
>> > >>      IMC writes:     3805 MiB/s
>> > >>
>> > >>           ENGINE      BUSY                                      MI_SEMA MI_WAIT
>> > >>      Render/3D/0   93.46% |████████████████████████████████▋  |      0%      0%
>> > >>        Blitter/0    0.00% |                                   |      0%      0%
>> > >>          Video/0    0.00% |                                   |      0%      0%
>> > >>   VideoEnhance/0    0.00% |                                   |      0%      0%
>> > >>
>> > >>   PID            NAME  Render/3D      Blitter        Video      VideoEnhance
>> > >>  2733       neverball |██████▌     ||            ||            ||            |
>> > >>  2047            Xorg |███▊        ||            ||            ||            |
>> > >>  2737        glxgears |█▍          ||            ||            ||            |
>> > >>  2128           xfwm4 |            ||            ||            ||            |
>> > >>  2047            Xorg |            ||            ||            ||            |
>> > >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>> > >>
>> > >> Internally we track time spent on engines for each struct 
>> intel_context, both
>> > >> for current and past contexts belonging to each open DRM file.
>> > >>
>> > >> This can serve as a building block for several features from the 
>> wanted list:
>> > >> smarter scheduler decisions, getrusage(2)-like per-GEM-context 
>> functionality
>> > >> wanted by some customers, setrlimit(2) like controls, cgroups 
>> controller,
>> > >> dynamic SSEU tuning, ...
>> > >>
>> > >> To enable userspace access to the tracked data, we expose time 
>> spent on GPU per
>> > >> client and per engine class in sysfs with a hierarchy like the below:
>> > >>
>> > >>          # cd /sys/class/drm/card0/clients/
>> > >>          # tree
>> > >>          .
>> > >>          ├── 7
>> > >>          │   ├── busy
>> > >>          │   │   ├── 0
>> > >>          │   │   ├── 1
>> > >>          │   │   ├── 2
>> > >>          │   │   └── 3
>> > >>          │   ├── name
>> > >>          │   └── pid
>> > >>          ├── 8
>> > >>          │   ├── busy
>> > >>          │   │   ├── 0
>> > >>          │   │   ├── 1
>> > >>          │   │   ├── 2
>> > >>          │   │   └── 3
>> > >>          │   ├── name
>> > >>          │   └── pid
>> > >>          └── 9
>> > >>              ├── busy
>> > >>              │   ├── 0
>> > >>              │   ├── 1
>> > >>              │   ├── 2
>> > >>              │   └── 3
>> > >>              ├── name
>> > >>              └── pid
>> > >>
>> > >> Files in 'busy' directories are numbered using the engine class 
>> ABI values and
>> > >> they contain accumulated nanoseconds each client spent on engines 
>> of a
>> > >> respective class.
>> > >
>> > > We did something similar in amdgpu using the gpu scheduler.  We then
>> > > expose the data via fdinfo.  See
>> > > 
>> https://cgit.freedesktop.org/drm/drm-misc/commit/?id=1774baa64f9395fa884ea9ed494bcb043f3b83f5
>> > > 
>> https://cgit.freedesktop.org/drm/drm-misc/commit/?id=874442541133f78c78b6880b8cc495bab5c61704
>> >
>> > Interesting!
>> >
>> > Is yours wall time or actual GPU time taking preemption and such into
>> > account? Do you have some userspace tools parsing this data, and how do
>> > you do client discovery? Presumably there has to be a better way than
>> > going through all open file descriptors?
>>
>> Wall time.  It uses the fences in the scheduler to calculate engine
>> time.  We have some python scripts to make it look pretty, but mainly
>> just reading the files directly.  If you know the process, you can
>> look it up in procfs.
>>
>> >
>> > Our implementation was merged in January but Daniel took it out recently
>> > because he wanted to have discussion about a common vendor framework for
>> > this whole story on dri-devel. I think. +Daniel to comment.
>> >
>> > I couldn't find the patch you pasted on the mailing list to see if there
>> > was any such discussion around your version.
>>
>> It was on the amd-gfx mailing list.
>>
>> Alex
>>
>> >
>> > Regards,
>> >
>> > Tvrtko
>> >
>> > >
>> > > Alex
>> > >
>> > >
>> > >>
>> > >> Tvrtko Ursulin (7):
>> > >>    drm/i915: Expose list of clients in sysfs
>> > >>    drm/i915: Update client name on context create
>> > >>    drm/i915: Make GEM contexts track DRM clients
>> > >>    drm/i915: Track runtime spent in closed and unreachable GEM 
>> contexts
>> > >>    drm/i915: Track all user contexts per client
>> > >>    drm/i915: Track context current active time
>> > >>    drm/i915: Expose per-engine client busyness
>> > >>
>> > >> drivers/gpu/drm/i915/Makefile                 |   5 +-
>> > >> drivers/gpu/drm/i915/gem/i915_gem_context.c   |  61 ++-
>> > >> .../gpu/drm/i915/gem/i915_gem_context_types.h |  16 +-
>> > >> drivers/gpu/drm/i915/gt/intel_context.c       |  27 +-
>> > >> drivers/gpu/drm/i915/gt/intel_context.h       |  15 +-
>> > >> drivers/gpu/drm/i915/gt/intel_context_types.h |  24 +-
>> > >> .../drm/i915/gt/intel_execlists_submission.c  |  23 +-
>> > >> .../gpu/drm/i915/gt/intel_gt_clock_utils.c    |   4 +
>> > >> drivers/gpu/drm/i915/gt/intel_lrc.c           |  27 +-
>> > >> drivers/gpu/drm/i915/gt/intel_lrc.h           |  24 ++
>> > >> drivers/gpu/drm/i915/gt/selftest_lrc.c        |  10 +-
>> > >> drivers/gpu/drm/i915/i915_drm_client.c        | 365 ++++++++++++++++++
>> > >> drivers/gpu/drm/i915/i915_drm_client.h        | 123 ++++++
>> > >> drivers/gpu/drm/i915/i915_drv.c               |   6 +
>> > >> drivers/gpu/drm/i915/i915_drv.h               |   5 +
>> > >> drivers/gpu/drm/i915/i915_gem.c               |  21 +-
>> > >> drivers/gpu/drm/i915/i915_gpu_error.c         |  31 +-
>> > >> drivers/gpu/drm/i915/i915_gpu_error.h         |   2 +-
>> > >> drivers/gpu/drm/i915/i915_sysfs.c             |   8 +
>> > >>   19 files changed, 716 insertions(+), 81 deletions(-)
>> > >>   create mode 100644 drivers/gpu/drm/i915/i915_drm_client.c
>> > >>   create mode 100644 drivers/gpu/drm/i915/i915_drm_client.h
>> > >>
>> > >> --
>> > >> 2.30.2
>> > >>
> 

^ permalink raw reply	[flat|nested] 103+ messages in thread


* Re: [PATCH 0/7] Per client engine busyness
  2021-05-14 13:42             ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-14 13:53               ` Christian König
  -1 siblings, 0 replies; 103+ messages in thread
From: Christian König @ 2021-05-14 13:53 UTC (permalink / raw)
  To: Tvrtko Ursulin, Nieto, David M, Alex Deucher
  Cc: Intel Graphics Development, Mailing list - DRI developers

>
> David also said that you considered sysfs but were wary of exposing 
> process info in there. To clarify, my patch is not exposing sysfs 
> entry per process, but one per open drm fd.
>

Yes, we discussed this as well, but then rejected the approach.

To have useful information related to the open drm fd you need to 
relate that to the process(es) which have that file descriptor open. Just 
tracking who opened it first, like DRM does, is pretty useless on modern 
systems.

But an "lsof /dev/dri/renderD128", for example, does exactly what top does 
as well: it iterates over /proc and sees which processes have that file open.

So even with sysfs aiding discovery, you are back to just going over all 
the files again.
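
Christian's observation can be sketched as follows: with or without a sysfs
client root, discovering which processes hold a DRM node open still amounts
to iterating /proc, much like lsof(8) does. The pids and paths below are
made up; on a real system the fd table would be built by resolving the
/proc/<pid>/fd symlinks, as hinted in the docstring:

```python
def drm_clients(fd_table):
    """Return pids holding a DRM node open, given {pid: [fd target paths]}.

    This is the same scan lsof(8) and a top(1)-like tool must do.  On a
    real system the table would be built by walking /proc, e.g. resolving
    os.readlink("/proc/<pid>/fd/<n>") for every numeric /proc entry.
    """
    return sorted(pid for pid, paths in fd_table.items()
                  if any(p.startswith("/dev/dri/") for p in paths))

# Mock /proc-derived fd table (pids and paths are made up):
fake_fd_table = {
    2733: ["/dev/dri/renderD128", "/usr/share/games/neverball"],
    2047: ["/dev/dri/card0"],
    1234: ["/var/log/syslog"],
}
print(drm_clients(fake_fd_table))
```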

Regards,
Christian.

Am 14.05.21 um 15:42 schrieb Tvrtko Ursulin:
>
> On 14/05/2021 09:04, Christian König wrote:
>> Well in my opinion exposing it through fdinfo turned out to be a 
>> really clean approach.
>>
>> It describes exactly the per file descriptor information we need.
>
> Yeah fdinfo certainly is mostly simple and neat.
>
> I say mostly because main problem I see with it is discoverability. 
> Alex commented in another sub-thread - "If you know the process, you can
> look it up in procfs." - so that's fine for introspection but a bit 
> challenging for a top(1) like tool.
>
> David also said that you considered sysfs but were wary of exposing 
> process info in there. To clarify, my patch is not exposing sysfs 
> entry per process, but one per open drm fd.
>
> Top level hierarchy is under /sys/class/drm/card0/clients/ and each 
> opened drm fd gets a directory in there. Process data I expose there 
> are the name and pid, but these are for convenience, not as a primary 
> information.
>
> But yes, I agree this part of the approach is definitely questionable. 
> (As a side note, I am not sure if I could put a symlink to proc in 
> there. I think sysfs and symlinks did not really work.)
>
> Another data point is that this "client root" we think would be useful 
> for adding other stuff in the future. For instance per client debug 
> log stream is occasionally talked about.
>
>> Making that device driver independent is potentially useful as well.
>
> Alternative to my sysfs approach, the idea of exposing this in proc 
> was floated by Chris in this series 
> https://patchwork.freedesktop.org/series/86692/.
>
> That would be generic enough so any GPU vendor can slot in, and common 
> enough that GPU agnostic tools should be able to use it. Modulo some 
> discussion around naming the "channels" (GPU engines) or not.
>
> It wouldn't be able to support things like the before mentioned per 
> client debug log stream but I guess that's not the most important 
> thing. Most important would be allowing GPU usage to be wired to 
> top(1) like tools which is probably even overdue given the modern 
> computing landscape.
>
> Would you guys be interested to give a more detailed look over both 
> proposals and see if any would interest you?
>
> Regards,
>
> Tvrtko
>
>> Regards,
>> Christian.
>>
>> Am 14.05.21 um 09:22 schrieb Nieto, David M:
>>>
>>> [AMD Official Use Only - Internal Distribution Only]
>>>
>>>
>>> We had entertained the idea of exposing the processes as sysfs nodes 
>>> as you proposed, but we had concerns about exposing process info in 
>>> there, especially since /proc already exists for that purpose.
>>>
>>> I think if you were to follow that approach, we could have tools 
>>> like top that support exposing GPU engine usage.
>>> ------------------------------------------------------------------------ 
>>>
>>> *From:* Alex Deucher <alexdeucher@gmail.com>
>>> *Sent:* Thursday, May 13, 2021 10:58 PM
>>> *To:* Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>; Nieto, David 
>>> M <David.Nieto@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>
>>> *Cc:* Intel Graphics Development <Intel-gfx@lists.freedesktop.org>; 
>>> Mailing list - DRI developers <dri-devel@lists.freedesktop.org>; 
>>> Daniel Vetter <daniel@ffwll.ch>
>>> *Subject:* Re: [PATCH 0/7] Per client engine busyness
>>> + David, Christian
>>>
>>> On Thu, May 13, 2021 at 12:41 PM Tvrtko Ursulin
>>> <tvrtko.ursulin@linux.intel.com> wrote:
>>> >
>>> >
>>> > Hi,
>>> >
>>> > On 13/05/2021 16:48, Alex Deucher wrote:
>>> > > On Thu, May 13, 2021 at 7:00 AM Tvrtko Ursulin
>>> > > <tvrtko.ursulin@linux.intel.com> wrote:
>>> > >>
>>> > >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> > >>
>>> > >> Resurrection of the previously merged per client engine busyness 
>>> patches. In a
>>> > >> nutshell it enables intel_gpu_top to be more top(1) like useful 
>>> and show not
>>> > >> only physical GPU engine usage but per process view as well.
>>> > >>
>>> > >> Example screen capture:
>>> > >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> > >> intel-gpu-top -  906/ 955 MHz;    0% RC6;  5.30 Watts;      933 irqs/s
>>> > >>
>>> > >>       IMC reads:     4414 MiB/s
>>> > >>      IMC writes:     3805 MiB/s
>>> > >>
>>> > >>           ENGINE      BUSY                                      MI_SEMA MI_WAIT
>>> > >>      Render/3D/0   93.46% |████████████████████████████████▋  |      0%      0%
>>> > >>        Blitter/0    0.00% |                                   |      0%      0%
>>> > >>          Video/0    0.00% |                                   |      0%      0%
>>> > >>   VideoEnhance/0    0.00% |                                   |      0%      0%
>>> > >>
>>> > >>   PID            NAME  Render/3D      Blitter        Video      VideoEnhance
>>> > >>  2733       neverball |██████▌     ||            ||            ||            |
>>> > >>  2047            Xorg |███▊        ||            ||            ||            |
>>> > >>  2737        glxgears |█▍          ||            ||            ||            |
>>> > >>  2128           xfwm4 |            ||            ||            ||            |
>>> > >>  2047            Xorg |            ||            ||            ||            |
>>> > >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> > >>
>>> > >> Internally we track time spent on engines for each struct 
>>> intel_context, both
>>> > >> for current and past contexts belonging to each open DRM file.
>>> > >>
>>> > >> This can serve as a building block for several features from 
>>> the wanted list:
>>> > >> smarter scheduler decisions, getrusage(2)-like per-GEM-context 
>>> functionality
>>> > >> wanted by some customers, setrlimit(2) like controls, cgroups 
>>> controller,
>>> > >> dynamic SSEU tuning, ...
>>> > >>
>>> > >> To enable userspace access to the tracked data, we expose time 
>>> spent on GPU per
>>> > >> client and per engine class in sysfs with a hierarchy like the 
>>> below:
>>> > >>
>>> > >>          # cd /sys/class/drm/card0/clients/
>>> > >>          # tree
>>> > >>          .
>>> > >>          ├── 7
>>> > >>          │   ├── busy
>>> > >>          │   │   ├── 0
>>> > >>          │   │   ├── 1
>>> > >>          │   │   ├── 2
>>> > >>          │   │   └── 3
>>> > >>          │   ├── name
>>> > >>          │   └── pid
>>> > >>          ├── 8
>>> > >>          │   ├── busy
>>> > >>          │   │   ├── 0
>>> > >>          │   │   ├── 1
>>> > >>          │   │   ├── 2
>>> > >>          │   │   └── 3
>>> > >>          │   ├── name
>>> > >>          │   └── pid
>>> > >>          └── 9
>>> > >>              ├── busy
>>> > >>              │   ├── 0
>>> > >>              │   ├── 1
>>> > >>              │   ├── 2
>>> > >>              │   └── 3
>>> > >>              ├── name
>>> > >>              └── pid
>>> > >>
>>> > >> Files in 'busy' directories are numbered using the engine class 
>>> ABI values and
>>> > >> they contain accumulated nanoseconds each client spent on 
>>> engines of a
>>> > >> respective class.
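[Editor's note: the sysfs hierarchy quoted above is concrete enough to sketch a reader for it. A minimal Python sketch, assuming the clients/<id>/{name,pid,busy/<class>} layout shown in the tree; the engine-class name map follows the i915 engine class ABI order and is spelled out here only for readability.]

```python
# Hedged sketch: read one client directory from the proposed sysfs layout.
# The class-number-to-name map assumes the i915 engine class ABI order.
import os

CLASS_NAMES = {0: "Render/3D", 1: "Blitter", 2: "Video", 3: "VideoEnhance"}

def read_client(client_dir):
    """Return (name, pid, {engine class name: accumulated busy nanoseconds})."""
    def read(path):
        with open(path) as f:
            return f.read().strip()

    busy_dir = os.path.join(client_dir, "busy")
    busy = {CLASS_NAMES.get(int(n), n): int(read(os.path.join(busy_dir, n)))
            for n in os.listdir(busy_dir)}
    return (read(os.path.join(client_dir, "name")),
            read(os.path.join(client_dir, "pid")),
            busy)
```

Sampling these files twice and dividing the delta by the wall-clock interval gives the per-engine busyness percentage a top(1)-style tool would display.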
>>> > >
>>> > > We did something similar in amdgpu using the gpu scheduler.  We 
>>> then
>>> > > expose the data via fdinfo.  See
>>> > > 
>>> https://cgit.freedesktop.org/drm/drm-misc/commit/?id=1774baa64f9395fa884ea9ed494bcb043f3b83f5
>>> > > 
>>> https://cgit.freedesktop.org/drm/drm-misc/commit/?id=874442541133f78c78b6880b8cc495bab5c61704
>>> >
>>> > Interesting!
>>> >
>>> > Is yours wall time or actual GPU time, taking preemption and such
>>> > into account? Do you have some userspace tools parsing this data,
>>> > and how do you do client discovery? Presumably there has to be a
>>> > better way than going through all open file descriptors?
>>>
>>> Wall time.  It uses the fences in the scheduler to calculate engine
>>> time.  We have some python scripts to make it look pretty, but mainly
>>> just reading the files directly.  If you know the process, you can
>>> look it up in procfs.
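[Editor's note: "just reading the files directly" from fdinfo could be sketched roughly as below. The "drm-engine-<name>: <ns> ns" key format is an assumption for illustration; the exact fdinfo keys are driver-specific and were not yet standardized at the time of this thread.]

```python
# Hedged sketch: sum per-engine GPU time for one process by parsing DRM
# entries in /proc/<pid>/fdinfo. Key names here are illustrative only.
import os

def drm_engine_times(pid):
    times = {}
    fdinfo_dir = "/proc/%d/fdinfo" % pid
    try:
        fds = os.listdir(fdinfo_dir)
    except OSError:
        return times  # no such process, or no permission
    for fd in fds:
        try:
            with open(os.path.join(fdinfo_dir, fd)) as f:
                text = f.read()
        except OSError:
            continue  # fd closed while we were iterating
        for line in text.splitlines():
            if line.startswith("drm-engine-"):
                key, _, value = line.partition(":")
                engine = key[len("drm-engine-"):]
                times[engine] = times.get(engine, 0) + int(value.split()[0])
    return times
```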
>>>
>>> >
>>> > Our implementation was merged in January but Daniel took it out 
>>> recently
>>> > because he wanted to have discussion about a common vendor 
>>> framework for
>>> > this whole story on dri-devel. I think. +Daniel to comment.
>>> >
>>> > I couldn't find the patch you pasted on the mailing list to see if 
>>> there
>>> > was any such discussion around your version.
>>>
>>> It was on the amd-gfx mailing list.
>>>
>>> Alex
>>>
>>> >
>>> > Regards,
>>> >
>>> > Tvrtko
>>> >
>>> > >
>>> > > Alex
>>> > >
>>> > >
>>> > >>
>>> > >> Tvrtko Ursulin (7):
>>> > >>    drm/i915: Expose list of clients in sysfs
>>> > >>    drm/i915: Update client name on context create
>>> > >>    drm/i915: Make GEM contexts track DRM clients
>>> > >>    drm/i915: Track runtime spent in closed and unreachable GEM 
>>> contexts
>>> > >>    drm/i915: Track all user contexts per client
>>> > >>    drm/i915: Track context current active time
>>> > >>    drm/i915: Expose per-engine client busyness
>>> > >>
>>> > >>   drivers/gpu/drm/i915/Makefile                 |   5 +-
>>> > >>   drivers/gpu/drm/i915/gem/i915_gem_context.c   |  61 ++-
>>> > >>   .../gpu/drm/i915/gem/i915_gem_context_types.h |  16 +-
>>> > >>   drivers/gpu/drm/i915/gt/intel_context.c       |  27 +-
>>> > >>   drivers/gpu/drm/i915/gt/intel_context.h       |  15 +-
>>> > >>   drivers/gpu/drm/i915/gt/intel_context_types.h |  24 +-
>>> > >>   .../drm/i915/gt/intel_execlists_submission.c  |  23 +-
>>> > >>   .../gpu/drm/i915/gt/intel_gt_clock_utils.c    |   4 +
>>> > >>   drivers/gpu/drm/i915/gt/intel_lrc.c           |  27 +-
>>> > >>   drivers/gpu/drm/i915/gt/intel_lrc.h           |  24 ++
>>> > >>   drivers/gpu/drm/i915/gt/selftest_lrc.c        |  10 +-
>>> > >>   drivers/gpu/drm/i915/i915_drm_client.c        | 365 ++++++++++++++++++
>>> > >>   drivers/gpu/drm/i915/i915_drm_client.h        | 123 ++++++
>>> > >>   drivers/gpu/drm/i915/i915_drv.c               |   6 +
>>> > >>   drivers/gpu/drm/i915/i915_drv.h               |   5 +
>>> > >>   drivers/gpu/drm/i915/i915_gem.c               |  21 +-
>>> > >>   drivers/gpu/drm/i915/i915_gpu_error.c         |  31 +-
>>> > >>   drivers/gpu/drm/i915/i915_gpu_error.h         |   2 +-
>>> > >>   drivers/gpu/drm/i915/i915_sysfs.c             |   8 +
>>> > >>   19 files changed, 716 insertions(+), 81 deletions(-)
>>> > >>   create mode 100644 drivers/gpu/drm/i915/i915_drm_client.c
>>> > >>   create mode 100644 drivers/gpu/drm/i915/i915_drm_client.h
>>> > >>
>>> > >> --
>>> > >> 2.30.2
>>> > >>
>>


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
@ 2021-05-14 13:53               ` Christian König
  0 siblings, 0 replies; 103+ messages in thread
From: Christian König @ 2021-05-14 13:53 UTC (permalink / raw)
  To: Tvrtko Ursulin, Nieto, David M, Alex Deucher
  Cc: Intel Graphics Development, Maling list - DRI developers

>
> David also said that you considered sysfs but were wary of exposing 
> process info in there. To clarify, my patch is not exposing sysfs 
> entry per process, but one per open drm fd.
>

Yes, we discussed this as well, but then rejected the approach.

To have useful information related to the open drm fd you need to relate 
that to the process(es) which have that file descriptor open. Just 
tracking who opened it first, like DRM does, is pretty useless on modern 
systems.

But an "lsof /dev/dri/renderD128" for example does exactly what top does 
as well, it iterates over /proc and sees which process has that file open.
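[Editor's note: the iteration described here can be sketched in a few lines of Python. It resolves every fd symlink of every process under /proc, which is why its cost scales with the total number of open files on the system rather than with the number of DRM clients.]

```python
# Sketch of what "lsof /dev/dri/renderD128" effectively does: walk every
# /proc/<pid>/fd directory and resolve each symlink, keeping the PIDs
# whose fd table contains the target file.
import os

def processes_with_file_open(target="/dev/dri/renderD128"):
    holders = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        fd_dir = "/proc/%s/fd" % pid
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or not enough privileges
        for fd in fds:
            try:
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    holders.append(int(pid))
                    break
            except OSError:
                continue  # fd closed between listdir and readlink
    return holders
```

Run as root this finds every holder; unprivileged it only sees the caller's own processes.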

So even with sysfs aid for discovery you are back to just going over all 
files again.

Regards,
Christian.

Am 14.05.21 um 15:42 schrieb Tvrtko Ursulin:
>
> On 14/05/2021 09:04, Christian König wrote:
>> Well in my opinion exposing it through fdinfo turned out to be a 
>> really clean approach.
>>
>> It describes exactly the per file descriptor information we need.
>
> Yeah fdinfo certainly is mostly simple and neat.
>
> I say mostly because the main problem I see with it is discoverability. 
> Alex commented in another sub-thread - "If you know the process, you can
> look it up in procfs." - so that's fine for introspection but a bit 
> challenging for a top(1)-like tool.
>
> David also said that you considered sysfs but were wary of exposing 
> process info in there. To clarify, my patch is not exposing sysfs 
> entry per process, but one per open drm fd.
>
> Top level hierarchy is under /sys/class/drm/card0/clients/ and each 
> opened drm fd gets a directory in there. Process data I expose there 
> are the name and pid, but these are for convenience, not as a primary 
> information.
>
> But yes, I agree this part of the approach is definitely questionable. 
> (As a side note, I am not sure if I could put a symlink to proc in 
> there. I think sysfs and symlinks did not really work.)
>
> Another data point is that this "client root" we think would be useful 
> for adding other stuff in the future. For instance per client debug 
> log stream is occasionally talked about.
>
>> Making that device driver independent is potentially useful as well.
>
> Alternative to my sysfs approach, the idea of exposing this in proc 
> was floated by Chris in this series 
> https://patchwork.freedesktop.org/series/86692/.
>
> That would be generic enough so any GPU vendor can slot in, and common 
> enough that GPU agnostic tools should be able to use it. Modulo some 
> discussion around naming the "channels" (GPU engines) or not.
>
> It wouldn't be able to support things like the aforementioned per 
> client debug log stream but I guess that's not the most important 
> thing. Most important would be allowing GPU usage to be wired to 
> top(1) like tools which is probably even overdue given the modern 
> computing landscape.
>
> Would you guys be interested to give a more detailed look over both 
> proposals and see if any would interest you?
>
> Regards,
>
> Tvrtko
>
>> Regards,
>> Christian.
>>
>> Am 14.05.21 um 09:22 schrieb Nieto, David M:
>>>
>>> [AMD Official Use Only - Internal Distribution Only]
>>>
>>>
>>> We had entertained the idea of exposing the processes as sysfs nodes 
>>> as you proposed, but we had concerns about exposing process info in 
>>> there, especially since /proc already exists for that purpose.
>>>
>>> I think if you were to follow that approach, we could have tools 
>>> like top that support exposing GPU engine usage.
>>> ------------------------------------------------------------------------ 
>>>
>>> *From:* Alex Deucher <alexdeucher@gmail.com>
>>> *Sent:* Thursday, May 13, 2021 10:58 PM
>>> *To:* Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>; Nieto, David 
>>> M <David.Nieto@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>
>>> *Cc:* Intel Graphics Development <Intel-gfx@lists.freedesktop.org>; 
>>> Maling list - DRI developers <dri-devel@lists.freedesktop.org>; 
>>> Daniel Vetter <daniel@ffwll.ch>
>>> *Subject:* Re: [PATCH 0/7] Per client engine busyness
>>> + David, Christian
>>>
>>> On Thu, May 13, 2021 at 12:41 PM Tvrtko Ursulin
>>> <tvrtko.ursulin@linux.intel.com> wrote:
>>> >
>>> >
>>> > Hi,
>>> >
>>> > On 13/05/2021 16:48, Alex Deucher wrote:
>>> > > On Thu, May 13, 2021 at 7:00 AM Tvrtko Ursulin
>>> > > <tvrtko.ursulin@linux.intel.com> wrote:
>>> > >>
>>> > >> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> > >>
>>> > >> Resurrection of the previously merged per client engine busyness
>>> > >> patches. In a nutshell it enables intel_gpu_top to be more useful,
>>> > >> top(1)-like, showing not only physical GPU engine usage but a per
>>> > >> process view as well.
>>> > >>
>>> > >> Example screen capture:
>>> > >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> > >> intel-gpu-top -  906/ 955 MHz;    0% RC6;  5.30 Watts;      933 irqs/s
>>> > >>
>>> > >>       IMC reads:     4414 MiB/s
>>> > >>      IMC writes:     3805 MiB/s
>>> > >>
>>> > >>           ENGINE      BUSY                                      MI_SEMA MI_WAIT
>>> > >>      Render/3D/0   93.46% |████████████████████████████████▋  |      0%      0%
>>> > >>        Blitter/0    0.00% |                                   |      0%      0%
>>> > >>          Video/0    0.00% |                                   |      0%      0%
>>> > >>   VideoEnhance/0    0.00% |                                   |      0%      0%
>>> > >>
>>> > >>   PID            NAME  Render/3D      Blitter        Video      VideoEnhance
>>> > >>  2733       neverball |██████▌     ||            ||            ||            |
>>> > >>  2047            Xorg |███▊        ||            ||            ||            |
>>> > >>  2737        glxgears |█▍          ||            ||            ||            |
>>> > >>  2128           xfwm4 |            ||            ||            ||            |
>>> > >>  2047            Xorg |            ||            ||            ||            |
>>> > >> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>>> > >>
>>> > >> Internally we track time spent on engines for each struct 
>>> intel_context, both
>>> > >> for current and past contexts belonging to each open DRM file.
>>> > >>
>>> > >> This can serve as a building block for several features from 
>>> the wanted list:
>>> > >> smarter scheduler decisions, getrusage(2)-like per-GEM-context 
>>> functionality
>>> > >> wanted by some customers, setrlimit(2) like controls, cgroups 
>>> controller,
>>> > >> dynamic SSEU tuning, ...
>>> > >>
>>> > >> To enable userspace access to the tracked data, we expose time 
>>> spent on GPU per
>>> > >> client and per engine class in sysfs with a hierarchy like the 
>>> below:
>>> > >>
>>> > >>          # cd /sys/class/drm/card0/clients/
>>> > >>          # tree
>>> > >>          .
>>> > >>          ├── 7
>>> > >>          │   ├── busy
>>> > >>          │   │   ├── 0
>>> > >>          │   │   ├── 1
>>> > >>          │   │   ├── 2
>>> > >>          │   │   └── 3
>>> > >>          │   ├── name
>>> > >>          │   └── pid
>>> > >>          ├── 8
>>> > >>          │   ├── busy
>>> > >>          │   │   ├── 0
>>> > >>          │   │   ├── 1
>>> > >>          │   │   ├── 2
>>> > >>          │   │   └── 3
>>> > >>          │   ├── name
>>> > >>          │   └── pid
>>> > >>          └── 9
>>> > >>              ├── busy
>>> > >>              │   ├── 0
>>> > >>              │   ├── 1
>>> > >>              │   ├── 2
>>> > >>              │   └── 3
>>> > >>              ├── name
>>> > >>              └── pid
>>> > >>
>>> > >> Files in 'busy' directories are numbered using the engine class 
>>> ABI values and
>>> > >> they contain accumulated nanoseconds each client spent on 
>>> engines of a
>>> > >> respective class.
>>> > >
>>> > > We did something similar in amdgpu using the gpu scheduler.  We 
>>> then
>>> > > expose the data via fdinfo.  See
>>> > > 
>>> https://cgit.freedesktop.org/drm/drm-misc/commit/?id=1774baa64f9395fa884ea9ed494bcb043f3b83f5
>>> > > 
>>> https://cgit.freedesktop.org/drm/drm-misc/commit/?id=874442541133f78c78b6880b8cc495bab5c61704
>>> >
>>> > Interesting!
>>> >
>>> > Is yours wall time or actual GPU time, taking preemption and such
>>> > into account? Do you have some userspace tools parsing this data,
>>> > and how do you do client discovery? Presumably there has to be a
>>> > better way than going through all open file descriptors?
>>>
>>> Wall time.  It uses the fences in the scheduler to calculate engine
>>> time.  We have some python scripts to make it look pretty, but mainly
>>> just reading the files directly.  If you know the process, you can
>>> look it up in procfs.
>>>
>>> >
>>> > Our implementation was merged in January but Daniel took it out 
>>> recently
>>> > because he wanted to have discussion about a common vendor 
>>> framework for
>>> > this whole story on dri-devel. I think. +Daniel to comment.
>>> >
>>> > I couldn't find the patch you pasted on the mailing list to see if 
>>> there
>>> > was any such discussion around your version.
>>>
>>> It was on the amd-gfx mailing list.
>>>
>>> Alex
>>>
>>> >
>>> > Regards,
>>> >
>>> > Tvrtko
>>> >
>>> > >
>>> > > Alex
>>> > >
>>> > >
>>> > >>
>>> > >> Tvrtko Ursulin (7):
>>> > >>    drm/i915: Expose list of clients in sysfs
>>> > >>    drm/i915: Update client name on context create
>>> > >>    drm/i915: Make GEM contexts track DRM clients
>>> > >>    drm/i915: Track runtime spent in closed and unreachable GEM 
>>> contexts
>>> > >>    drm/i915: Track all user contexts per client
>>> > >>    drm/i915: Track context current active time
>>> > >>    drm/i915: Expose per-engine client busyness
>>> > >>
>>> > >>   drivers/gpu/drm/i915/Makefile                 |   5 +-
>>> > >>   drivers/gpu/drm/i915/gem/i915_gem_context.c   |  61 ++-
>>> > >>   .../gpu/drm/i915/gem/i915_gem_context_types.h |  16 +-
>>> > >>   drivers/gpu/drm/i915/gt/intel_context.c       |  27 +-
>>> > >>   drivers/gpu/drm/i915/gt/intel_context.h       |  15 +-
>>> > >>   drivers/gpu/drm/i915/gt/intel_context_types.h |  24 +-
>>> > >>   .../drm/i915/gt/intel_execlists_submission.c  |  23 +-
>>> > >>   .../gpu/drm/i915/gt/intel_gt_clock_utils.c    |   4 +
>>> > >>   drivers/gpu/drm/i915/gt/intel_lrc.c           |  27 +-
>>> > >>   drivers/gpu/drm/i915/gt/intel_lrc.h           |  24 ++
>>> > >>   drivers/gpu/drm/i915/gt/selftest_lrc.c        |  10 +-
>>> > >>   drivers/gpu/drm/i915/i915_drm_client.c        | 365 ++++++++++++++++++
>>> > >>   drivers/gpu/drm/i915/i915_drm_client.h        | 123 ++++++
>>> > >>   drivers/gpu/drm/i915/i915_drv.c               |   6 +
>>> > >>   drivers/gpu/drm/i915/i915_drv.h               |   5 +
>>> > >>   drivers/gpu/drm/i915/i915_gem.c               |  21 +-
>>> > >>   drivers/gpu/drm/i915/i915_gpu_error.c         |  31 +-
>>> > >>   drivers/gpu/drm/i915/i915_gpu_error.h         |   2 +-
>>> > >>   drivers/gpu/drm/i915/i915_sysfs.c             |   8 +
>>> > >>   19 files changed, 716 insertions(+), 81 deletions(-)
>>> > >>   create mode 100644 drivers/gpu/drm/i915/i915_drm_client.c
>>> > >>   create mode 100644 drivers/gpu/drm/i915/i915_drm_client.h
>>> > >>
>>> > >> --
>>> > >> 2.30.2
>>> > >>
>>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/7] Per client engine busyness
  2021-05-14 13:53               ` [Intel-gfx] " Christian König
@ 2021-05-14 14:47                 ` Tvrtko Ursulin
  -1 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-14 14:47 UTC (permalink / raw)
  To: Christian König, Nieto, David M, Alex Deucher
  Cc: Intel Graphics Development, Maling list - DRI developers


On 14/05/2021 14:53, Christian König wrote:
>>
>> David also said that you considered sysfs but were wary of exposing 
>> process info in there. To clarify, my patch is not exposing sysfs 
>> entry per process, but one per open drm fd.
>>
> 
> Yes, we discussed this as well, but then rejected the approach.
> 
> To have useful information related to the open drm fd you need to 
> related that to process(es) which have that file descriptor open. Just 
> tracking who opened it first like DRM does is pretty useless on modern 
> systems.

We do update the pid/name for fds passed over unix sockets.

> But an "lsof /dev/dri/renderD128" for example does exactly what top does 
> as well, it iterates over /proc and sees which process has that file open.

Lsof is quite inefficient for this use case. It has to open _all_ open 
files for _all_ processes on the system to find a handful of ones which 
may have the DRM device open.

> So even with sysfs aid for discovery you are back to just going over all 
> files again.

For what use case?

To enable GPU usage in top we can do much better than iterate over all 
open files in the system. We can start with a process if going with the 
/proc proposal, or with the opened DRM file directly with the sysfs 
proposal. Both are significantly fewer than total number of open files 
across all processes.
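[Editor's note: the sysfs-first discovery described here could look roughly like the sketch below, assuming the clients/<id>/{pid,busy/<class>} layout from the cover letter. No system-wide fd walk is needed; the cost scales with the number of open DRM files.]

```python
# Hedged sketch: aggregate busy nanoseconds per PID straight from the
# proposed clients/ hierarchy, as a top(1)-like tool would.
import os
from collections import defaultdict

def busy_ns_by_pid(clients_root="/sys/class/drm/card0/clients"):
    totals = defaultdict(lambda: defaultdict(int))
    for client in os.listdir(clients_root):
        cdir = os.path.join(clients_root, client)
        try:
            with open(os.path.join(cdir, "pid")) as f:
                pid = int(f.read())
            busy_dir = os.path.join(cdir, "busy")
            for klass in os.listdir(busy_dir):
                with open(os.path.join(busy_dir, klass)) as f:
                    totals[pid][int(klass)] += int(f.read())
        except (OSError, ValueError):
            continue  # client went away while we were reading
    return totals
```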

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
@ 2021-05-14 14:47                 ` Tvrtko Ursulin
  0 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-14 14:47 UTC (permalink / raw)
  To: Christian König, Nieto, David M, Alex Deucher
  Cc: Intel Graphics Development, Maling list - DRI developers


On 14/05/2021 14:53, Christian König wrote:
>>
>> David also said that you considered sysfs but were wary of exposing 
>> process info in there. To clarify, my patch is not exposing sysfs 
>> entry per process, but one per open drm fd.
>>
> 
> Yes, we discussed this as well, but then rejected the approach.
> 
> To have useful information related to the open drm fd you need to 
> related that to process(es) which have that file descriptor open. Just 
> tracking who opened it first like DRM does is pretty useless on modern 
> systems.

We do update the pid/name for fds passed over unix sockets.

> But an "lsof /dev/dri/renderD128" for example does exactly what top does 
> as well, it iterates over /proc and sees which process has that file open.

Lsof is quite inefficient for this use case. It has to open _all_ open 
files for _all_ processes on the system to find a handful of ones which 
may have the DRM device open.

> So even with sysfs aid for discovery you are back to just going over all 
> files again.

For what use case?

To enable GPU usage in top we can do much better than iterate over all 
open files in the system. We can start with a process if going with the 
/proc proposal, or with the opened DRM file directly with the sysfs 
proposal. Both are significantly fewer than total number of open files 
across all processes.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/7] Per client engine busyness
  2021-05-14 14:47                 ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-14 14:56                   ` Christian König
  -1 siblings, 0 replies; 103+ messages in thread
From: Christian König @ 2021-05-14 14:56 UTC (permalink / raw)
  To: Tvrtko Ursulin, Nieto, David M, Alex Deucher
  Cc: Intel Graphics Development, Maling list - DRI developers

Am 14.05.21 um 16:47 schrieb Tvrtko Ursulin:
>
> On 14/05/2021 14:53, Christian König wrote:
>>>
>>> David also said that you considered sysfs but were wary of exposing 
>>> process info in there. To clarify, my patch is not exposing sysfs 
>>> entry per process, but one per open drm fd.
>>>
>>
>> Yes, we discussed this as well, but then rejected the approach.
>>
>> To have useful information related to the open drm fd you need to 
>> related that to process(es) which have that file descriptor open. 
>> Just tracking who opened it first like DRM does is pretty useless on 
>> modern systems.
>
> We do update the pid/name for fds passed over unix sockets.

Well I just double checked and that is not correct.

Could be that i915 has some special code for that, but on my laptop I 
only see the X server under the "clients" debugfs file.

>> But an "lsof /dev/dri/renderD128" for example does exactly what top 
>> does as well, it iterates over /proc and sees which process has that 
>> file open.
>
> Lsof is quite inefficient for this use case. It has to open _all_ open 
> files for _all_ processes on the system to find a handful of ones 
> which may have the DRM device open.

Completely agree.

The key point is you either need to have all references to an open fd, 
or at least track whoever last used that fd.

At least the last time I looked even the fs layer didn't know which fd 
is open by which process. So there wasn't really any alternative to the 
lsof approach.

Regards,
Christian.

>
>> So even with sysfs aid for discovery you are back to just going over 
>> all files again.
>
> For what use case?
>
> To enable GPU usage in top we can do much better than iterate over all 
> open files in the system. We can start with a process if going with 
> the /proc proposal, or with the opened DRM file directly with the 
> sysfs proposal. Both are significantly fewer than total number of open 
> files across all processes.
>
> Regards,
>
> Tvrtko


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
@ 2021-05-14 14:56                   ` Christian König
  0 siblings, 0 replies; 103+ messages in thread
From: Christian König @ 2021-05-14 14:56 UTC (permalink / raw)
  To: Tvrtko Ursulin, Nieto, David M, Alex Deucher
  Cc: Intel Graphics Development, Maling list - DRI developers

Am 14.05.21 um 16:47 schrieb Tvrtko Ursulin:
>
> On 14/05/2021 14:53, Christian König wrote:
>>>
>>> David also said that you considered sysfs but were wary of exposing 
>>> process info in there. To clarify, my patch is not exposing sysfs 
>>> entry per process, but one per open drm fd.
>>>
>>
>> Yes, we discussed this as well, but then rejected the approach.
>>
>> To have useful information related to the open drm fd you need to 
>> related that to process(es) which have that file descriptor open. 
>> Just tracking who opened it first like DRM does is pretty useless on 
>> modern systems.
>
> We do update the pid/name for fds passed over unix sockets.

Well I just double checked and that is not correct.

Could be that i915 has some special code for that, but on my laptop I 
only see the X server under the "clients" debugfs file.

>> But an "lsof /dev/dri/renderD128" for example does exactly what top 
>> does as well, it iterates over /proc and sees which process has that 
>> file open.
>
> Lsof is quite inefficient for this use case. It has to open _all_ open 
> files for _all_ processes on the system to find a handful of ones 
> which may have the DRM device open.

Completely agree.

The key point is you either need to have all references to an open fd, 
or at least track whoever last used that fd.

At least the last time I looked even the fs layer didn't know which fd 
is open by which process. So there wasn't really any alternative to the 
lsof approach.

Regards,
Christian.

>
>> So even with sysfs aid for discovery you are back to just going over 
>> all files again.
>
> For what use case?
>
> To enable GPU usage in top we can do much better than iterate over all 
> open files in the system. We can start with a process if going with 
> the /proc proposal, or with the opened DRM file directly with the 
> sysfs proposal. Both are significantly fewer than total number of open 
> files across all processes.
>
> Regards,
>
> Tvrtko


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/7] Per client engine busyness
  2021-05-14 14:56                   ` [Intel-gfx] " Christian König
@ 2021-05-14 15:03                     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-14 15:03 UTC (permalink / raw)
  To: Christian König, Nieto, David M, Alex Deucher
  Cc: Intel Graphics Development, Maling list - DRI developers


On 14/05/2021 15:56, Christian König wrote:
> Am 14.05.21 um 16:47 schrieb Tvrtko Ursulin:
>>
>> On 14/05/2021 14:53, Christian König wrote:
>>>>
>>>> David also said that you considered sysfs but were wary of exposing 
>>>> process info in there. To clarify, my patch is not exposing sysfs 
>>>> entry per process, but one per open drm fd.
>>>>
>>>
>>> Yes, we discussed this as well, but then rejected the approach.
>>>
>>> To have useful information related to the open drm fd you need to 
>>> related that to process(es) which have that file descriptor open. 
>>> Just tracking who opened it first like DRM does is pretty useless on 
>>> modern systems.
>>
>> We do update the pid/name for fds passed over unix sockets.
> 
> Well I just double checked and that is not correct.
> 
> Could be that i915 has some special code for that, but on my laptop I 
> only see the X server under the "clients" debugfs file.

Yes we have special code in i915 for this. Part of this series we are 
discussing here.

>>> But an "lsof /dev/dri/renderD128" for example does exactly what top 
>>> does as well, it iterates over /proc and sees which process has that 
>>> file open.
>>
>> Lsof is quite inefficient for this use case. It has to open _all_ open 
>> files for _all_ processes on the system to find a handful of ones 
>> which may have the DRM device open.
> 
> Completely agree.
> 
> The key point is you either need to have all references to an open fd, 
> or at least track whoever last used that fd.
> 
> At least the last time I looked even the fs layer didn't know which fd 
> is open by which process. So there wasn't really any alternative to the 
> lsof approach.

I asked you about the use case you have in mind, which you did not 
answer. Otherwise I don't understand when you need to walk all files. 
What information do you want to get?

For the use case of knowing which DRM file is using how much GPU time on 
engine X we do not need to walk all open files either with my sysfs 
approach or the proc approach from Chris. (In the former case we 
optionally aggregate by PID at presentation time, and in the latter case 
aggregation is implicit.)

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/7] Per client engine busyness
  2021-05-14 15:03                     ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-14 15:10                       ` Christian König
  -1 siblings, 0 replies; 103+ messages in thread
From: Christian König @ 2021-05-14 15:10 UTC (permalink / raw)
  To: Tvrtko Ursulin, Nieto, David M, Alex Deucher
  Cc: Intel Graphics Development, Maling list - DRI developers

Am 14.05.21 um 17:03 schrieb Tvrtko Ursulin:
>
> On 14/05/2021 15:56, Christian König wrote:
>> Am 14.05.21 um 16:47 schrieb Tvrtko Ursulin:
>>>
>>> On 14/05/2021 14:53, Christian König wrote:
>>>>>
>>>>> David also said that you considered sysfs but were wary of 
>>>>> exposing process info in there. To clarify, my patch is not 
>>>>> exposing sysfs entry per process, but one per open drm fd.
>>>>>
>>>>
>>>> Yes, we discussed this as well, but then rejected the approach.
>>>>
>>>> To have useful information related to the open drm fd you need to 
>>>> relate that to the process(es) which have that file descriptor open. 
>>>> Just tracking who opened it first like DRM does is pretty useless 
>>>> on modern systems.
>>>
>>> We do update the pid/name for fds passed over unix sockets.
>>
>> Well I just double checked and that is not correct.
>>
>> Could be that i915 has some special code for that, but on my laptop I 
>> only see the X server under the "clients" debugfs file.
>
> Yes we have special code in i915 for this. Part of this series we are 
> discussing here.

Ah, yeah you should mention that. Could we please separate that into 
common code instead? Cause I really see that as a bug in the current 
handling independent of the discussion here.

As far as I know all IOCTLs go through some common place in DRM anyway.

>>>> But an "lsof /dev/dri/renderD128" for example does exactly what top 
>>>> does as well, it iterates over /proc and sees which process has 
>>>> that file open.
>>>
>>> Lsof is quite inefficient for this use case. It has to open _all_ 
>>> open files for _all_ processes on the system to find a handful of 
>>> ones which may have the DRM device open.
>>
>> Completely agree.
>>
>> The key point is you either need to have all references to an open 
>> fd, or at least track whoever last used that fd.
>>
>> At least the last time I looked even the fs layer didn't know which 
>> fd is open by which process. So there wasn't really any alternative 
>> to the lsof approach.
>
> I asked you about the use case you have in mind which you did not 
> answer. Otherwise I don't understand when do you need to walk all 
> files. What information you want to get?

Per-fd debugging information, e.g. for when, unlike the top use case, you 
already know which process you want to look at.

>
> For the use case of knowing which DRM file is using how much GPU time 
> on engine X we do not need to walk all open files either with my sysfs 
> approach or the proc approach from Chris. (In the former case we 
> optionally aggregate by PID at presentation time, and in the latter 
> case aggregation is implicit.)

I'm unsure if we should go with the sysfs, proc or some completely 
different approach.

In general it would be nice to have a way to find all the fd references 
for an open inode.

Regards,
Christian.

>
> Regards,
>
> Tvrtko


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
  2021-05-14 15:03                     ` [Intel-gfx] " Tvrtko Ursulin
  (?)
  (?)
@ 2021-05-15 10:40                     ` Maxime Schmitt
  2021-05-17 16:13                       ` Tvrtko Ursulin
  -1 siblings, 1 reply; 103+ messages in thread
From: Maxime Schmitt @ 2021-05-15 10:40 UTC (permalink / raw)
  To: intel-gfx

Hi,

Nice to see something like this being worked on.

I wrote a top-like tool some time back (nvtop).
I targeted NVIDIA, because it was the GPU I had at the time. Also,
their driver provides a nice library to retrieve the information from
(NVML).

Seeing this thread I think it would be nice to support more vendors now
that the information is available.

I took a look at the DRM documentation, but I am only finding the in-
kernel functions and not what is being exposed to user space. Maybe I
am searching in the wrong place.
Is there some documentation, from the user space point of view, on the
way to discover the GPUs and the metrics that are exposed?

Regards,
Maxime




^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/7] Per client engine busyness
  2021-05-13 15:48   ` [Intel-gfx] " Alex Deucher
@ 2021-05-17 14:20     ` Daniel Vetter
  -1 siblings, 0 replies; 103+ messages in thread
From: Daniel Vetter @ 2021-05-17 14:20 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Tvrtko Ursulin, Intel Graphics Development, Maling list - DRI developers

On Thu, May 13, 2021 at 11:48:08AM -0400, Alex Deucher wrote:
> On Thu, May 13, 2021 at 7:00 AM Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
> >
> > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >
> > Resurrect of the previously merged per client engine busyness patches. In a
> > nutshell it enables intel_gpu_top to be more top(1) like useful and show not
> > only physical GPU engine usage but per process view as well.
> >
> > Example screen capture:
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> > intel-gpu-top -  906/ 955 MHz;    0% RC6;  5.30 Watts;      933 irqs/s
> >
> >       IMC reads:     4414 MiB/s
> >      IMC writes:     3805 MiB/s
> >
> >           ENGINE      BUSY                                      MI_SEMA MI_WAIT
> >      Render/3D/0   93.46% |████████████████████████████████▋  |      0%      0%
> >        Blitter/0    0.00% |                                   |      0%      0%
> >          Video/0    0.00% |                                   |      0%      0%
> >   VideoEnhance/0    0.00% |                                   |      0%      0%
> >
> >   PID            NAME  Render/3D      Blitter        Video      VideoEnhance
> >  2733       neverball |██████▌     ||            ||            ||            |
> >  2047            Xorg |███▊        ||            ||            ||            |
> >  2737        glxgears |█▍          ||            ||            ||            |
> >  2128           xfwm4 |            ||            ||            ||            |
> >  2047            Xorg |            ||            ||            ||            |
> > ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > Internally we track time spent on engines for each struct intel_context, both
> > for current and past contexts belonging to each open DRM file.
> >
> > This can serve as a building block for several features from the wanted list:
> > smarter scheduler decisions, getrusage(2)-like per-GEM-context functionality
> > wanted by some customers, setrlimit(2) like controls, cgroups controller,
> > dynamic SSEU tuning, ...
> >
> > To enable userspace access to the tracked data, we expose time spent on GPU per
> > client and per engine class in sysfs with a hierarchy like the below:
> >
> >         # cd /sys/class/drm/card0/clients/
> >         # tree
> >         .
> >         ├── 7
> >         │   ├── busy
> >         │   │   ├── 0
> >         │   │   ├── 1
> >         │   │   ├── 2
> >         │   │   └── 3
> >         │   ├── name
> >         │   └── pid
> >         ├── 8
> >         │   ├── busy
> >         │   │   ├── 0
> >         │   │   ├── 1
> >         │   │   ├── 2
> >         │   │   └── 3
> >         │   ├── name
> >         │   └── pid
> >         └── 9
> >             ├── busy
> >             │   ├── 0
> >             │   ├── 1
> >             │   ├── 2
> >             │   └── 3
> >             ├── name
> >             └── pid
> >
> > Files in 'busy' directories are numbered using the engine class ABI values and
> > they contain accumulated nanoseconds each client spent on engines of a
> > respective class.
> 
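[As an aside, a userspace consumer of the hierarchy quoted above could 
look roughly like the minimal Python sketch below. It assumes only the 
layout described in the cover letter (per-client `name`, `pid`, and 
`busy/<class>` files holding accumulated nanoseconds); the function names 
and the two-snapshot sampling scheme are illustrative, not any actual 
tool.]

```python
import os

def read_clients(base="/sys/class/drm/card0/clients"):
    """Snapshot {client_id: (name, pid, {engine_class: busy_ns})}
    from the sysfs hierarchy described in the cover letter."""
    snap = {}
    for cid in os.listdir(base):
        cdir = os.path.join(base, cid)
        with open(os.path.join(cdir, "name")) as f:
            name = f.read().strip()
        with open(os.path.join(cdir, "pid")) as f:
            pid = int(f.read())
        busy = {}
        for cls in os.listdir(os.path.join(cdir, "busy")):
            with open(os.path.join(cdir, "busy", cls)) as f:
                busy[int(cls)] = int(f.read())  # accumulated ns
        snap[cid] = (name, pid, busy)
    return snap

def busy_percent(before, after, interval_ns):
    """Turn two snapshots taken interval_ns apart into per-client,
    per-engine-class utilisation percentages (top-style display)."""
    out = {}
    for cid, (name, pid, b1) in after.items():
        b0 = before.get(cid, (None, None, {}))[2]
        out[cid] = (name, pid,
                    {c: 100.0 * (b1[c] - b0.get(c, 0)) / interval_ns
                     for c in b1})
    return out
```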
> We did something similar in amdgpu using the gpu scheduler.  We then
> expose the data via fdinfo.  See
> https://cgit.freedesktop.org/drm/drm-misc/commit/?id=1774baa64f9395fa884ea9ed494bcb043f3b83f5
> https://cgit.freedesktop.org/drm/drm-misc/commit/?id=874442541133f78c78b6880b8cc495bab5c61704

Yeah the reason I've dropped these patches was because they looked like
prime material for at least a bit of standardization across drivers.

Also fdinfo sounds like a very good interface for these, I didn't even know
that's doable. Might also be interesting to even standardize the fdinfo
stuff across drivers.

Also since drm/i915 will adopt drm/scheduler, we could build that on top
of that code too. So no restrictions there from i915 side.

Anyway, discussion kicked off, I'll let you figure out what we'll do here.
-Daniel

> 
> Alex
> 
> 
> >
> > Tvrtko Ursulin (7):
> >   drm/i915: Expose list of clients in sysfs
> >   drm/i915: Update client name on context create
> >   drm/i915: Make GEM contexts track DRM clients
> >   drm/i915: Track runtime spent in closed and unreachable GEM contexts
> >   drm/i915: Track all user contexts per client
> >   drm/i915: Track context current active time
> >   drm/i915: Expose per-engine client busyness
> >
> >  drivers/gpu/drm/i915/Makefile                 |   5 +-
> >  drivers/gpu/drm/i915/gem/i915_gem_context.c   |  61 ++-
> >  .../gpu/drm/i915/gem/i915_gem_context_types.h |  16 +-
> >  drivers/gpu/drm/i915/gt/intel_context.c       |  27 +-
> >  drivers/gpu/drm/i915/gt/intel_context.h       |  15 +-
> >  drivers/gpu/drm/i915/gt/intel_context_types.h |  24 +-
> >  .../drm/i915/gt/intel_execlists_submission.c  |  23 +-
> >  .../gpu/drm/i915/gt/intel_gt_clock_utils.c    |   4 +
> >  drivers/gpu/drm/i915/gt/intel_lrc.c           |  27 +-
> >  drivers/gpu/drm/i915/gt/intel_lrc.h           |  24 ++
> >  drivers/gpu/drm/i915/gt/selftest_lrc.c        |  10 +-
> >  drivers/gpu/drm/i915/i915_drm_client.c        | 365 ++++++++++++++++++
> >  drivers/gpu/drm/i915/i915_drm_client.h        | 123 ++++++
> >  drivers/gpu/drm/i915/i915_drv.c               |   6 +
> >  drivers/gpu/drm/i915/i915_drv.h               |   5 +
> >  drivers/gpu/drm/i915/i915_gem.c               |  21 +-
> >  drivers/gpu/drm/i915/i915_gpu_error.c         |  31 +-
> >  drivers/gpu/drm/i915/i915_gpu_error.h         |   2 +-
> >  drivers/gpu/drm/i915/i915_sysfs.c             |   8 +
> >  19 files changed, 716 insertions(+), 81 deletions(-)
> >  create mode 100644 drivers/gpu/drm/i915/i915_drm_client.c
> >  create mode 100644 drivers/gpu/drm/i915/i915_drm_client.h
> >
> > --
> > 2.30.2
> >

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/7] Per client engine busyness
  2021-05-14 15:10                       ` [Intel-gfx] " Christian König
@ 2021-05-17 14:30                         ` Daniel Vetter
  -1 siblings, 0 replies; 103+ messages in thread
From: Daniel Vetter @ 2021-05-17 14:30 UTC (permalink / raw)
  To: Christian König
  Cc: Tvrtko Ursulin, Intel Graphics Development,
	Maling list - DRI developers, Nieto, David M

On Fri, May 14, 2021 at 05:10:29PM +0200, Christian König wrote:
> Am 14.05.21 um 17:03 schrieb Tvrtko Ursulin:
> > 
> > On 14/05/2021 15:56, Christian König wrote:
> > > Am 14.05.21 um 16:47 schrieb Tvrtko Ursulin:
> > > > 
> > > > On 14/05/2021 14:53, Christian König wrote:
> > > > > > 
> > > > > > David also said that you considered sysfs but were wary
> > > > > > of exposing process info in there. To clarify, my patch
> > > > > > is not exposing sysfs entry per process, but one per
> > > > > > open drm fd.
> > > > > > 
> > > > > 
> > > > > Yes, we discussed this as well, but then rejected the approach.
> > > > > 
> > > > > To have useful information related to the open drm fd you
> > > > > need to relate that to the process(es) which have that file
> > > > > descriptor open. Just tracking who opened it first like DRM
> > > > > does is pretty useless on modern systems.
> > > > 
> > > > We do update the pid/name for fds passed over unix sockets.
> > > 
> > > Well I just double checked and that is not correct.
> > > 
> > > Could be that i915 has some special code for that, but on my laptop
> > > I only see the X server under the "clients" debugfs file.
> > 
> > Yes we have special code in i915 for this. Part of this series we are
> > discussing here.
> 
> Ah, yeah you should mention that. Could we please separate that into common
> code instead? Cause I really see that as a bug in the current handling
> independent of the discussion here.
> 
> As far as I know all IOCTLs go though some common place in DRM anyway.

Yeah, might be good to fix that confusion in debugfs. But since that's
non-uapi, I guess no one ever cared (enough).

> > > > > But an "lsof /dev/dri/renderD128" for example does exactly
> > > > > what top does as well, it iterates over /proc and sees which
> > > > > process has that file open.
> > > > 
> > > > Lsof is quite inefficient for this use case. It has to open
> > > > _all_ open files for _all_ processes on the system to find a
> > > > handful of ones which may have the DRM device open.
> > > 
> > > Completely agree.
> > > 
> > > The key point is you either need to have all references to an open
> > > fd, or at least track whoever last used that fd.
> > > 
> > > At least the last time I looked even the fs layer didn't know which
> > > fd is open by which process. So there wasn't really any alternative
> > > to the lsof approach.
> > 
> > I asked you about the use case you have in mind which you did not
> > answer. Otherwise I don't understand when do you need to walk all files.
> > What information you want to get?
> 
> Per fd debugging information, e.g. instead of the top use case you know
> which process you want to look at.
> 
> > 
> > For the use case of knowing which DRM file is using how much GPU time on
> > engine X we do not need to walk all open files either with my sysfs
> > approach or the proc approach from Chris. (In the former case we
> > optionally aggregate by PID at presentation time, and in the latter case
> > aggregation is implicit.)
> 
> I'm unsure if we should go with the sysfs, proc or some completely different
> approach.
> 
> In general it would be nice to have a way to find all the fd references for
> an open inode.

Yeah, but that maybe needs to be an ioctl or syscall or something on the
inode that gives you a list of (procfd, fd_nr) pairs pointing back at all
open files? That is, if this really is a real-world problem; given that
top/lsof and everyone else haven't asked for it yet, maybe it's not.

Also I replied in some other thread, I really like the fdinfo stuff, and I
think trying to somewhat standardized this across drivers would be neat.
Especially since i915 is going to adopt drm/scheduler for front-end
scheduling too, so at least some of this should be fairly easy to share.

Cheers, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/7] Per client engine busyness
  2021-05-17 14:30                         ` [Intel-gfx] " Daniel Vetter
@ 2021-05-17 14:39                           ` Nieto, David M
  -1 siblings, 0 replies; 103+ messages in thread
From: Nieto, David M @ 2021-05-17 14:39 UTC (permalink / raw)
  To: Daniel Vetter, Koenig, Christian
  Cc: Tvrtko Ursulin, Intel Graphics Development, Maling list - DRI developers

[-- Attachment #1: Type: text/plain, Size: 5264 bytes --]

[AMD Official Use Only]

Maybe we could try to standardize how the different submission ring usage gets exposed in the fdinfo? We went the simple way of just adding name and index, but if someone has a suggestion on how else we could format them so there is commonality across vendors we could just amend those.

I’d really like to have process manager tools display GPU usage regardless of which vendor is installed.

________________________________
From: Daniel Vetter <daniel@ffwll.ch>
Sent: Monday, May 17, 2021 7:30:47 AM
To: Koenig, Christian <Christian.Koenig@amd.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>; Nieto, David M <David.Nieto@amd.com>; Alex Deucher <alexdeucher@gmail.com>; Intel Graphics Development <Intel-gfx@lists.freedesktop.org>; Maling list - DRI developers <dri-devel@lists.freedesktop.org>; Daniel Vetter <daniel@ffwll.ch>
Subject: Re: [PATCH 0/7] Per client engine busyness

On Fri, May 14, 2021 at 05:10:29PM +0200, Christian König wrote:
> Am 14.05.21 um 17:03 schrieb Tvrtko Ursulin:
> >
> > On 14/05/2021 15:56, Christian König wrote:
> > > Am 14.05.21 um 16:47 schrieb Tvrtko Ursulin:
> > > >
> > > > On 14/05/2021 14:53, Christian König wrote:
> > > > > >
> > > > > > David also said that you considered sysfs but were wary
> > > > > > of exposing process info in there. To clarify, my patch
> > > > > > is not exposing sysfs entry per process, but one per
> > > > > > open drm fd.
> > > > > >
> > > > >
> > > > > Yes, we discussed this as well, but then rejected the approach.
> > > > >
> > > > > To have useful information related to the open drm fd you
> > > > > need to relate that to the process(es) which have that file
> > > > > descriptor open. Just tracking who opened it first like DRM
> > > > > does is pretty useless on modern systems.
> > > >
> > > > We do update the pid/name for fds passed over unix sockets.
> > >
> > > Well I just double checked and that is not correct.
> > >
> > > Could be that i915 has some special code for that, but on my laptop
> > > I only see the X server under the "clients" debugfs file.
> >
> > Yes we have special code in i915 for this. Part of this series we are
> > discussing here.
>
> Ah, yeah you should mention that. Could we please separate that into common
> code instead? Cause I really see that as a bug in the current handling
> independent of the discussion here.
>
> As far as I know all IOCTLs go through some common place in DRM anyway.

Yeah, might be good to fix that confusion in debugfs. But since that's
non-uapi, I guess no one ever cared (enough).

> > > > > But an "lsof /dev/dri/renderD128" for example does exactly
> > > > > what top does as well, it iterates over /proc and sees which
> > > > > process has that file open.
> > > >
> > > > Lsof is quite inefficient for this use case. It has to open
> > > > _all_ open files for _all_ processes on the system to find a
> > > > handful of ones which may have the DRM device open.
> > >
> > > Completely agree.
> > >
> > > The key point is you either need to have all references to an open
> > > fd, or at least track whoever last used that fd.
> > >
> > > At least the last time I looked even the fs layer didn't know which
> > > fd is open by which process. So there wasn't really any alternative
> > > to the lsof approach.
> >
> > I asked you about the use case you have in mind which you did not
> > answer. Otherwise I don't understand when you need to walk all files.
> > What information do you want to get?
>
> Per fd debugging information, e.g. instead of the top use case you know
> which process you want to look at.
>
> >
> > For the use case of knowing which DRM file is using how much GPU time on
> > engine X we do not need to walk all open files either with my sysfs
> > approach or the proc approach from Chris. (In the former case we
> > optionally aggregate by PID at presentation time, and in the latter case
> > aggregation is implicit.)
>
> I'm unsure if we should go with the sysfs, proc or some completely different
> approach.
>
> In general it would be nice to have a way to find all the fd references for
> an open inode.

Yeah, but that maybe needs to be an ioctl or syscall or something on the
inode, that gives you a list of (procfd, fd_nr) pairs pointing back at all
open files? That is, if this really is a real-world problem; given that
top/lsof and everyone else haven't asked for it yet, maybe it's not.

Also, as I replied in another thread, I really like the fdinfo stuff, and I
think trying to somewhat standardize this across drivers would be neat.
Especially since i915 is going to adopt drm/scheduler for front-end
scheduling too, so at least some of this should be fairly easy to share.

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

[-- Attachment #2: Type: text/html, Size: 7555 bytes --]

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/7] Per client engine busyness
  2021-05-17 14:39                           ` [Intel-gfx] " Nieto, David M
@ 2021-05-17 16:00                             ` Tvrtko Ursulin
  -1 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-17 16:00 UTC (permalink / raw)
  To: Nieto, David M, Daniel Vetter, Koenig, Christian
  Cc: Intel Graphics Development, Maling list - DRI developers


On 17/05/2021 15:39, Nieto, David M wrote:
> [AMD Official Use Only]
> 
> 
> Maybe we could try to standardize how the different submission ring 
>   usage gets exposed in the fdinfo? We went the simple way of just 
> adding name and index, but if someone has a suggestion on how else we 
> could format them so there is commonality across vendors we could just 
> amend those.

Could you paste an example of your format?

Standardized fdinfo sounds good to me in principle. But I would also 
like people to look at the procfs proposal from Chris, a link to which 
I have pasted elsewhere in the thread.

Only potential issue with fdinfo I see at the moment is a bit of an 
extra cost in DRM client discovery (compared to my sysfs series and also 
procfs RFC from Chris). It would require reading all processes (well, 
threads, then maybe aggregating threads into parent processes), all fd 
symlinks, and doing a stat on them to figure out which ones are DRM devices.

Btw is DRM_MAJOR 226 considered uapi? I don't see it in uapi headers.
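
For illustration only, the discovery walk described above might look roughly like the following sketch. The DRM_MAJOR value of 226 is assumed (it is the de facto value, but, per the question above, not exported in uapi headers), and the helper names are made up:

```python
import os
import stat

DRM_MAJOR = 226  # de facto DRM char-device major; assumed, not from uapi headers


def drm_fds(pid):
    """Best-effort list of (fd, target path) pairs for DRM character
    devices held open by one process. Reading another user's fd dir
    usually needs elevated privileges; failures yield an empty list."""
    fd_dir = "/proc/%d/fd" % pid
    out = []
    try:
        fds = os.listdir(fd_dir)
    except OSError:
        return out  # process exited, or we lack permission
    for fd in fds:
        path = os.path.join(fd_dir, fd)
        try:
            st = os.stat(path)  # follows the symlink to the device node
            if stat.S_ISCHR(st.st_mode) and os.major(st.st_rdev) == DRM_MAJOR:
                out.append((int(fd), os.readlink(path)))
        except OSError:
            continue  # fd closed while we were looking
    return out


def drm_clients():
    """Walk every numeric /proc entry, as a top(1)-style tool would --
    this is exactly the per-iteration cost being discussed."""
    clients = {}
    if not os.path.isdir("/proc"):
        return clients  # /proc not mounted (or not Linux)
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            fds = drm_fds(int(entry))
            if fds:
                clients[int(entry)] = fds
    return clients
```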

> I’d really like to have the process managers tools display GPU usage 
> regardless of what vendor is installed.

Definitely.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
  2021-05-15 10:40                     ` Maxime Schmitt
@ 2021-05-17 16:13                       ` Tvrtko Ursulin
  0 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-17 16:13 UTC (permalink / raw)
  To: Maxime Schmitt, intel-gfx


Hi,

On 15/05/2021 11:40, Maxime Schmitt wrote:
> Hi,
> 
> Nice to see something like this being worked on.
> 
> I wrote a top-like tool some time back (nvtop).
> I targeted NVIDIA, because it was the GPU I had at the time. Also,
> their driver provides a nice library to retrieve the information from
> (NVML).
> 
> Seeing this thread I think it would be nice to support more vendors now
> that the information is available.
> 
> I took a look at the DRM documentation, but I am only finding the in-
> kernel functions and not what is being exposed to user space. Maybe I
> am searching at the wrong place.
> Is there some documentation, from the user space point of view, on the
> way to discover the GPUs and the metrics that are exposed?

There isn't a common framework yet; it's under discussion in this thread.

AMD has some stuff under /proc/<pid>/fdinfo/<fd>, and i915 at the moment 
has only global GPU stats (exported as a perf/PMU device, see "perf list | 
grep i915.*/") used by the current version of intel_gpu_top.
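
For illustration, when the i915 PMU is present its events are also enumerable under sysfs, which is what perf consults; a minimal sketch (returns an empty list on machines without i915):

```python
import os

def i915_pmu_events():
    """List the i915 PMU event names registered with perf, if any.
    On a machine without an i915 PMU this simply returns []."""
    events_dir = "/sys/bus/event_source/devices/i915/events"
    try:
        return sorted(os.listdir(events_dir))
    except OSError:
        return []  # no i915 PMU registered (or not Linux)
```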

Regards,

Tvrtko

P.S. I suggest you use reply-to-all when replying on mailing list 
threads so it's easier to spot your message.

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/7] Per client engine busyness
  2021-05-17 16:00                             ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-17 18:02                               ` Nieto, David M
  -1 siblings, 0 replies; 103+ messages in thread
From: Nieto, David M @ 2021-05-17 18:02 UTC (permalink / raw)
  To: Tvrtko Ursulin, Daniel Vetter, Koenig, Christian
  Cc: Intel Graphics Development, Maling list - DRI developers

[-- Attachment #1: Type: text/plain, Size: 2120 bytes --]

[AMD Official Use Only]

The format is simple:

<ringname><index>: <XXX.XX> %

we also have entries for the memory mapped:
mem <ttm pool> : <size> KiB

On my submission https://lists.freedesktop.org/archives/amd-gfx/2021-May/063149.html I added a python script to print out the info. It has a CPU usage lower than top, for example.

To be absolutely honest, I agree that there is an overhead, but it might not be as much as you fear.
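
For illustration, fdinfo text in the format above could be parsed with a sketch like this. The ring names and pool names in the sample are invented for the example (they are not a stable ABI), and the regexes assume single-word pool names:

```python
import re

# Sample text in the "<ringname><index>: <XXX.XX> %" and
# "mem <pool>: <size> KiB" shape described above; values are made up.
SAMPLE = """\
gfx0: 75.25 %
comp0: 0.00 %
mem gtt: 12288 KiB
mem vram: 65536 KiB
"""

RING_RE = re.compile(r"^(?P<ring>\w+?)(?P<index>\d+):\s+(?P<busy>[\d.]+)\s*%$")
MEM_RE = re.compile(r"^mem\s+(?P<pool>\w+)\s*:\s+(?P<size>\d+)\s+KiB$")


def parse_fdinfo(text):
    """Split fdinfo text into per-ring busy percentages and
    per-pool memory sizes (KiB); unknown lines are ignored."""
    busy, mem = {}, {}
    for line in text.splitlines():
        m = RING_RE.match(line)
        if m:
            busy[(m.group("ring"), int(m.group("index")))] = float(m.group("busy"))
            continue
        m = MEM_RE.match(line)
        if m:
            mem[m.group("pool")] = int(m.group("size"))
    return busy, mem
```

A process-manager tool would read this once per refresh interval for each client fd and diff the values.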
________________________________
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Sent: Monday, May 17, 2021 9:00 AM
To: Nieto, David M <David.Nieto@amd.com>; Daniel Vetter <daniel@ffwll.ch>; Koenig, Christian <Christian.Koenig@amd.com>
Cc: Alex Deucher <alexdeucher@gmail.com>; Intel Graphics Development <Intel-gfx@lists.freedesktop.org>; Maling list - DRI developers <dri-devel@lists.freedesktop.org>
Subject: Re: [PATCH 0/7] Per client engine busyness


On 17/05/2021 15:39, Nieto, David M wrote:
> [AMD Official Use Only]
>
>
> Maybe we could try to standardize how the different submission ring
>   usage gets exposed in the fdinfo? We went the simple way of just
> adding name and index, but if someone has a suggestion on how else we
> could format them so there is commonality across vendors we could just
> amend those.

Could you paste an example of your format?

Standardized fdinfo sounds good to me in principle. But I would also
like people to look at the procfs proposal from Chris,
  - link to which I have pasted elsewhere in the thread.

Only potential issue with fdinfo I see at the moment is a bit of an
extra cost in DRM client discovery (compared to my sysfs series and also
procfs RFC from Chris). It would require reading all processes (well
threads, then maybe aggregating threads into parent processes), all fd
symlinks, and doing a stat on them to figure out which ones are DRM devices.

Btw is DRM_MAJOR 226 considered uapi? I don't see it in uapi headers.

> I’d really like to have the process managers tools display GPU usage
> regardless of what vendor is installed.

Definitely.

Regards,

Tvrtko

[-- Attachment #2: Type: text/html, Size: 4905 bytes --]

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Nouveau] [PATCH 0/7] Per client engine busyness
  2021-05-17 18:02                               ` [Intel-gfx] " Nieto, David M
  (?)
@ 2021-05-17 18:16                                 ` Nieto, David M
  -1 siblings, 0 replies; 103+ messages in thread
From: Nieto, David M @ 2021-05-17 18:16 UTC (permalink / raw)
  To: Tvrtko Ursulin, Daniel Vetter, Koenig, Christian, jhubbard, aritger
  Cc: Alex Deucher, nouveau, Intel Graphics Development,
	Maling list - DRI developers


[-- Attachment #1.1: Type: text/plain, Size: 3223 bytes --]

[Public]

Cycling some of the Nvidia/nouveau guys here too.

I think there is a benefit in trying to standardize how fdinfo can be used to expose per-engine and device memory utilization.

Another of the advantages of going the /proc/ way instead of the sysfs/debugfs approach is that you inherit the access lists directly from the distribution and you don't need to start messing with ownership and group access. By default a user can monitor its own processes as long as /proc is mounted.

I am not saying that fdinfo, or the way we implemented it, is 100% the way to go, but I'd rather have a solution within the confines of proc first.
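
For illustration, here is a minimal sketch of that access model: an unprivileged process can read its own fdinfo wherever /proc is mounted, with no extra ownership or group setup (the helper name is made up):

```python
import os

def own_fdinfo(fd):
    """Read /proc/self/fdinfo/<fd> for one of our own fds.
    Works unprivileged on any system with /proc mounted;
    returns None where /proc is unavailable."""
    path = "/proc/self/fdinfo/%d" % fd
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return None  # /proc not mounted (or not Linux)
```

On Linux the returned text always contains at least the generic `pos:` and `flags:` fields; a DRM driver exporting busyness would append its own lines.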

David



________________________________
From: Nieto, David M <David.Nieto@amd.com>
Sent: Monday, May 17, 2021 11:02 AM
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>; Daniel Vetter <daniel@ffwll.ch>; Koenig, Christian <Christian.Koenig@amd.com>
Cc: Alex Deucher <alexdeucher@gmail.com>; Intel Graphics Development <Intel-gfx@lists.freedesktop.org>; Maling list - DRI developers <dri-devel@lists.freedesktop.org>
Subject: Re: [PATCH 0/7] Per client engine busyness

The format is simple:

<ringname><index>: <XXX.XX> %

we also have entries for the memory mapped:
mem <ttm pool> : <size> KiB

On my submission https://lists.freedesktop.org/archives/amd-gfx/2021-May/063149.html I added a python script to print out the info. It has a CPU usage lower than top, for example.

To be absolutely honest, I agree that there is an overhead, but it might not be as much as you fear.
________________________________
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Sent: Monday, May 17, 2021 9:00 AM
To: Nieto, David M <David.Nieto@amd.com>; Daniel Vetter <daniel@ffwll.ch>; Koenig, Christian <Christian.Koenig@amd.com>
Cc: Alex Deucher <alexdeucher@gmail.com>; Intel Graphics Development <Intel-gfx@lists.freedesktop.org>; Maling list - DRI developers <dri-devel@lists.freedesktop.org>
Subject: Re: [PATCH 0/7] Per client engine busyness


On 17/05/2021 15:39, Nieto, David M wrote:
> [AMD Official Use Only]
>
>
> Maybe we could try to standardize how the different submission ring
>   usage gets exposed in the fdinfo? We went the simple way of just
> adding name and index, but if someone has a suggestion on how else we
> could format them so there is commonality across vendors we could just
> amend those.

Could you paste an example of your format?

Standardized fdinfo sounds good to me in principle. But I would also
like people to look at the procfs proposal from Chris,
  - link to which I have pasted elsewhere in the thread.

Only potential issue with fdinfo I see at the moment is a bit of an
extra cost in DRM client discovery (compared to my sysfs series and also
procfs RFC from Chris). It would require reading all processes (well
threads, then maybe aggregating threads into parent processes), all fd
symlinks, and doing a stat on them to figure out which ones are DRM devices.

Btw is DRM_MAJOR 226 considered uapi? I don't see it in uapi headers.

> I’d really like to have the process managers tools display GPU usage
> regardless of what vendor is installed.

Definitely.

Regards,

Tvrtko

[-- Attachment #1.2: Type: text/html, Size: 8121 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]

_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/7] Per client engine busyness
@ 2021-05-17 18:16                                 ` Nieto, David M
  0 siblings, 0 replies; 103+ messages in thread
From: Nieto, David M @ 2021-05-17 18:16 UTC (permalink / raw)
  To: Tvrtko Ursulin, Daniel Vetter, Koenig, Christian, jhubbard, aritger
  Cc: nouveau, Intel Graphics Development, Maling list - DRI developers

[-- Attachment #1: Type: text/plain, Size: 3223 bytes --]

[Public]

Cycling some of the Nvidia/nouveau guys here too.

I think there is a benefit on trying to estandarize how fdinfo can be used to expose per engine and device memory utilization.

Another of the advantages of going the /proc/ way instead of the sysfs debugfs approach is that you inherit the access lists directly from the distribution and you don't need to start messing with ownership and group access. By default an user can monitor its own processes as long as /proc is mounted.

I am not saying that fdinfo or the way we implemented is 100% the way to go, but I'd rather have a solution within the confines of proc first.

David



________________________________
From: Nieto, David M <David.Nieto@amd.com>
Sent: Monday, May 17, 2021 11:02 AM
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>; Daniel Vetter <daniel@ffwll.ch>; Koenig, Christian <Christian.Koenig@amd.com>
Cc: Alex Deucher <alexdeucher@gmail.com>; Intel Graphics Development <Intel-gfx@lists.freedesktop.org>; Maling list - DRI developers <dri-devel@lists.freedesktop.org>
Subject: Re: [PATCH 0/7] Per client engine busyness

The format is simple:

<ringname><index>: <XXX.XX> %

we also have entries for the memory mapped:
mem <ttm pool> : <size> KiB

On my submission https://lists.freedesktop.org/archives/amd-gfx/2021-May/063149.html I added a python script to print out the info. It has a CPU usage lower that top, for example.

To be absolutely honest, I agree that there is an overhead, but It might not be as much as you fear.
________________________________
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Sent: Monday, May 17, 2021 9:00 AM
To: Nieto, David M <David.Nieto@amd.com>; Daniel Vetter <daniel@ffwll.ch>; Koenig, Christian <Christian.Koenig@amd.com>
Cc: Alex Deucher <alexdeucher@gmail.com>; Intel Graphics Development <Intel-gfx@lists.freedesktop.org>; Maling list - DRI developers <dri-devel@lists.freedesktop.org>
Subject: Re: [PATCH 0/7] Per client engine busyness


On 17/05/2021 15:39, Nieto, David M wrote:
> [AMD Official Use Only]
>
>
> Maybe we could try to standardize how the different submission ring
>   usage gets exposed in the fdinfo? We went the simple way of just
> adding name and index, but if someone has a suggestion on how else we
> could format them so there is commonality across vendors we could just
> amend those.

Could you paste an example of your format?

Standardized fdinfo sounds good to me in principle. But I would also
like people to look at the procfs proposal from Chris,
  - link to which I have pasted elsewhere in the thread.

Only potential issue with fdinfo I see at the moment is a bit of an
extra cost in DRM client discovery (compared to my sysfs series and also
procfs RFC from Chris). It would require reading all processes (well
threads, then maybe aggregating threads into parent processes), all fd
symlinks, and doing a stat on them to figure out which ones are DRM devices.

Btw is DRM_MAJOR 226 consider uapi? I don't see it in uapi headers.

> I’d really like to have the process managers tools display GPU usage
> regardless of what vendor is installed.

Definitely.

Regards,

Tvrtko

[-- Attachment #2: Type: text/html, Size: 8121 bytes --]

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
@ 2021-05-17 18:16                                 ` Nieto, David M
  0 siblings, 0 replies; 103+ messages in thread
From: Nieto, David M @ 2021-05-17 18:16 UTC (permalink / raw)
  To: Tvrtko Ursulin, Daniel Vetter, Koenig, Christian, jhubbard, aritger
  Cc: Alex Deucher, nouveau, Intel Graphics Development,
	Maling list - DRI developers


[-- Attachment #1.1: Type: text/plain, Size: 3223 bytes --]

[Public]

Cycling some of the Nvidia/nouveau guys here too.

I think there is a benefit on trying to estandarize how fdinfo can be used to expose per engine and device memory utilization.

Another of the advantages of going the /proc/ way instead of the sysfs debugfs approach is that you inherit the access lists directly from the distribution and you don't need to start messing with ownership and group access. By default an user can monitor its own processes as long as /proc is mounted.

I am not saying that fdinfo or the way we implemented is 100% the way to go, but I'd rather have a solution within the confines of proc first.

David



________________________________
From: Nieto, David M <David.Nieto@amd.com>
Sent: Monday, May 17, 2021 11:02 AM
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>; Daniel Vetter <daniel@ffwll.ch>; Koenig, Christian <Christian.Koenig@amd.com>
Cc: Alex Deucher <alexdeucher@gmail.com>; Intel Graphics Development <Intel-gfx@lists.freedesktop.org>; Maling list - DRI developers <dri-devel@lists.freedesktop.org>
Subject: Re: [PATCH 0/7] Per client engine busyness

The format is simple:

<ringname><index>: <XXX.XX> %

we also have entries for the memory mapped:
mem <ttm pool> : <size> KiB

On my submission https://lists.freedesktop.org/archives/amd-gfx/2021-May/063149.html I added a python script to print out the info. It has a CPU usage lower that top, for example.

To be absolutely honest, I agree that there is an overhead, but It might not be as much as you fear.
________________________________
From: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Sent: Monday, May 17, 2021 9:00 AM
To: Nieto, David M <David.Nieto@amd.com>; Daniel Vetter <daniel@ffwll.ch>; Koenig, Christian <Christian.Koenig@amd.com>
Cc: Alex Deucher <alexdeucher@gmail.com>; Intel Graphics Development <Intel-gfx@lists.freedesktop.org>; Maling list - DRI developers <dri-devel@lists.freedesktop.org>
Subject: Re: [PATCH 0/7] Per client engine busyness


On 17/05/2021 15:39, Nieto, David M wrote:
> [AMD Official Use Only]
>
>
> Maybe we could try to standardize how the different submission ring
>   usage gets exposed in the fdinfo? We went the simple way of just
> adding name and index, but if someone has a suggestion on how else we
> could format them so there is commonality across vendors we could just
> amend those.

Could you paste an example of your format?

Standardized fdinfo sounds good to me in principle. But I would also
like people to look at the procfs proposal from Chris, a link to which
I have pasted elsewhere in the thread.

Only potential issue with fdinfo I see at the moment is a bit of an
extra cost in DRM client discovery (compared to my sysfs series and also
procfs RFC from Chris). It would require reading all processes (well
threads, then maybe aggregating threads into parent processes), all fd
symlinks, and doing a stat on them to figure out which ones are DRM devices.
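For illustration only (this is not code from the series), a rough sketch of what that discovery walk costs: every process on the system, and every fd symlink in each, has to be visited before a single DRM client is found, so the work scales with total open files rather than with DRM clients.

```python
import os

def discovery_cost():
    """Count how many fd symlinks a naive DRM-client discovery walk
    would need to stat: one per open fd of every process on the system,
    regardless of how many of them are actually DRM clients."""
    if not os.path.isdir("/proc"):
        return 0  # procfs not mounted
    total = 0
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            total += len(os.listdir(f"/proc/{pid}/fd"))
        except OSError:
            continue  # process exited, or its fd dir is not readable
    return total
```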

Btw is DRM_MAJOR 226 considered uapi? I don't see it in uapi headers.

> I’d really like to have the process managers tools display GPU usage
> regardless of what vendor is installed.

Definitely.

Regards,

Tvrtko

[-- Attachment #1.2: Type: text/html, Size: 8121 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Nouveau] [PATCH 0/7] Per client engine busyness
  2021-05-17 18:16                                 ` Nieto, David M
  (?)
@ 2021-05-17 19:03                                   ` Simon Ser
  -1 siblings, 0 replies; 103+ messages in thread
From: Simon Ser @ 2021-05-17 19:03 UTC (permalink / raw)
  To: Nieto, David M
  Cc: Tvrtko Ursulin, nouveau, Intel Graphics Development,
	Maling list - DRI developers, Daniel Vetter, Koenig, Christian

On Monday, May 17th, 2021 at 8:16 PM, Nieto, David M <David.Nieto@amd.com> wrote:

> Btw is DRM_MAJOR 226 consider uapi? I don't see it in uapi headers.

It's not in the headers, but it's de facto uAPI, as seen in libdrm:

    > git grep 226
    xf86drm.c
    99:#define DRM_MAJOR 226 /* Linux */
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/7] Per client engine busyness
  2021-05-17 14:30                         ` [Intel-gfx] " Daniel Vetter
@ 2021-05-17 19:16                           ` Christian König
  -1 siblings, 0 replies; 103+ messages in thread
From: Christian König @ 2021-05-17 19:16 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Tvrtko Ursulin, Intel Graphics Development,
	Maling list - DRI developers, Nieto, David M

Am 17.05.21 um 16:30 schrieb Daniel Vetter:
> [SNIP]
>>>> Could be that i915 has some special code for that, but on my laptop
>>>> I only see the X server under the "clients" debugfs file.
>>> Yes we have special code in i915 for this. Part of this series we are
>>> discussing here.
>> Ah, yeah you should mention that. Could we please separate that into common
>> code instead? Cause I really see that as a bug in the current handling
>> independent of the discussion here.
>>
>> As far as I know all IOCTLs go through some common place in DRM anyway.
> Yeah, might be good to fix that confusion in debugfs. But since that's
> non-uapi, I guess no one ever cared (enough).

Well, we cared; the problem is that we didn't know how to fix it properly, 
and we pretty much duplicated it in the VM code :)

>>> For the use case of knowing which DRM file is using how much GPU time on
>>> engine X we do not need to walk all open files either with my sysfs
>>> approach or the proc approach from Chris. (In the former case we
>>> optionally aggregate by PID at presentation time, and in the latter case
>>> aggregation is implicit.)
>> I'm unsure if we should go with the sysfs, proc or some completely different
>> approach.
>>
>> In general it would be nice to have a way to find all the fd references for
>> an open inode.
> Yeah, but that maybe needs to be an ioctl or syscall or something on the
> inode, that gives you a list of (procfd, fd_nr) pairs pointing back at all
> open files? If this really is a real world problem, but given that
> top/lsof and everyone else hasn't asked for it yet maybe it's not.

Well has anybody already measured how much overhead it would be to 
iterate over the relevant data structures in the kernel instead of 
userspace?

I mean we don't really need the tracking when a couple of hundred fd 
tables can be processed in just a few ms because of lockless RCU protection.

> Also I replied in some other thread, I really like the fdinfo stuff, and I
> think trying to somewhat standardize this across drivers would be neat.
> Especially since i915 is going to adopt drm/scheduler for front-end
> scheduling too, so at least some of this should be fairly easy to share.

Yeah, that sounds like a good idea to me as well.

Regards,
Christian.

>
> Cheers, Daniel


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Nouveau] [PATCH 0/7] Per client engine busyness
  2021-05-17 19:03                                   ` Simon Ser
  (?)
@ 2021-05-18  9:08                                     ` Tvrtko Ursulin
  -1 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-18  9:08 UTC (permalink / raw)
  To: Simon Ser, Nieto, David M
  Cc: nouveau, Intel Graphics Development,
	Maling list - DRI developers, Daniel Vetter, Koenig, Christian


On 17/05/2021 20:03, Simon Ser wrote:
> On Monday, May 17th, 2021 at 8:16 PM, Nieto, David M <David.Nieto@amd.com> wrote:
> 
>> Btw is DRM_MAJOR 226 consider uapi? I don't see it in uapi headers.
> 
> It's not in the headers, but it's de facto uAPI, as seen in libdrm:
> 
>      > git grep 226
>      xf86drm.c
>      99:#define DRM_MAJOR 226 /* Linux */

I suspected it would be yes, thanks.

I was just wondering if stat(2) and a chrdev major check would be a 
solid criterion to more efficiently (compared to parsing the text 
content) detect DRM files while walking procfs.
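A hedged sketch of that criterion (taking DRM_MAJOR 226 as the de facto value from libdrm; this is illustrative, not code from the series):

```python
import os
import stat

DRM_MAJOR = 226  # de facto value from libdrm's xf86drm.c, not in uapi headers

def is_drm_file(path):
    """Check whether path (e.g. a /proc/<pid>/fd/<n> symlink) refers to a
    DRM character device, using only stat(2) instead of parsing text."""
    try:
        st = os.stat(path)  # follows the symlink to the chrdev itself
    except OSError:
        return False
    return stat.S_ISCHR(st.st_mode) and os.major(st.st_rdev) == DRM_MAJOR
```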

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Nouveau] [PATCH 0/7] Per client engine busyness
  2021-05-18  9:08                                     ` Tvrtko Ursulin
  (?)
@ 2021-05-18  9:16                                       ` Daniel Stone
  -1 siblings, 0 replies; 103+ messages in thread
From: Daniel Stone @ 2021-05-18  9:16 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: nouveau, Intel Graphics Development,
	Maling list - DRI developers, Simon Ser, Koenig, Christian,
	Nieto, David M

Hi,

On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
> I was just wondering if stat(2) and a chrdev major check would be a
> solid criteria to more efficiently (compared to parsing the text
> content) detect drm files while walking procfs.

Maybe I'm missing something, but is the per-PID walk actually a
measurable performance issue rather than just a bit unpleasant?

Cheers,
Daniel

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/7] Per client engine busyness
  2021-05-17 18:02                               ` [Intel-gfx] " Nieto, David M
@ 2021-05-18  9:35                                 ` Tvrtko Ursulin
  -1 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-18  9:35 UTC (permalink / raw)
  To: Nieto, David M, Daniel Vetter, Koenig, Christian
  Cc: Intel Graphics Development, Maling list - DRI developers


On 17/05/2021 19:02, Nieto, David M wrote:
> [AMD Official Use Only]
> 
> 
> The format is simple:
> 
> <ringname><index>: <XXX.XX> %

Hm what time period does the percent relate to?

The i915 implementation uses accumulated nanoseconds active. That way 
whoever reads the file can calculate the percentage relative to the time 
period between two reads of the file.
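In other words (an illustrative sketch, not i915 code), a monitoring tool would sample the accumulated counter twice, alongside wall time, and derive the percentage itself:

```python
def busy_percent(sample_prev, sample_now):
    """Convert two (wall_time_ns, accumulated_busy_ns) samples of an
    engine busyness counter into a utilisation percentage for the
    interval between the two reads."""
    t0, busy0 = sample_prev
    t1, busy1 = sample_now
    if t1 <= t0:
        return 0.0  # no elapsed time between samples
    return 100.0 * (busy1 - busy0) / (t1 - t0)
```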

> we also have entries for the memory mapped:
> mem <ttm pool> : <size> KiB

Okay, so in general key:value pairs per line in text format. Colon as delimiter.

What common fields would be useful across different drivers, and what 
common naming scheme, in order to make the creation of a generic 
top-like tool as easy as possible?

driver: <ko name>
pdev: <pci slot>
ring-<name>: N <unit>
...
mem-<name>: N <unit>
...

What else?
Is ring a good common name? We actually use engine more in i915, but I am 
not really bothered about it.

Aggregated GPU usage could be easily and generically done by userspace 
by adding all rings and normalizing.
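To make that concrete, a hypothetical parser for such a scheme (the field names are from the proposal above; the code itself is only a sketch, and the value strings are left uninterpreted since units are driver-defined):

```python
def parse_fdinfo(text):
    """Split a hypothetical standardized DRM fdinfo blob into flat
    fields (driver, pdev, ...) plus per-engine and per-memory-region
    counters keyed by the name after the ring-/mem- prefix."""
    info = {"engines": {}, "mem": {}}
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if not value:
            continue  # skip malformed or empty lines
        if key.startswith("ring-"):
            info["engines"][key[len("ring-"):]] = value
        elif key.startswith("mem-"):
            info["mem"][key[len("mem-"):]] = value
        else:
            info[key] = value
    return info
```

Aggregated usage would then reduce to summing the entries in `info["engines"]` and normalizing by the number of engines.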

> On my submission 
> https://lists.freedesktop.org/archives/amd-gfx/2021-May/063149.html 
> <https://lists.freedesktop.org/archives/amd-gfx/2021-May/063149.html> I 
> added a Python script to print out the info. It has a CPU usage lower 
> than top, for example.
> 
> To be absolutely honest, I agree that there is an overhead, but It might 
> not be as much as you fear.

For me the bigger issue is that the number of extra operations grows with 
the number of open files on the system, which has no relation to the 
number of DRM clients.

More so if the monitoring tool wants to show _only_ DRM processes. Then 
the cost scales with the total number of processes times the total number 
of files on the server.

This design inefficiency bothers me yes. This is somewhat alleviated by 
the proposal from Chris 
(https://patchwork.freedesktop.org/patch/419042/?series=86692&rev=1) 
although there are downsides there as well. Like needing to keep a map 
of pids to drm files in drivers.

Btw what do you do in that tool for the same fd in a multi-threaded process
or so? Do you show duplicate entries, or detect and ignore? I guess I did 
not figure out if you show by pid/tgid or by fd.

Regards,

Tvrtko

> ------------------------------------------------------------------------
> *From:* Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> *Sent:* Monday, May 17, 2021 9:00 AM
> *To:* Nieto, David M <David.Nieto@amd.com>; Daniel Vetter 
> <daniel@ffwll.ch>; Koenig, Christian <Christian.Koenig@amd.com>
> *Cc:* Alex Deucher <alexdeucher@gmail.com>; Intel Graphics Development 
> <Intel-gfx@lists.freedesktop.org>; Maling list - DRI developers 
> <dri-devel@lists.freedesktop.org>
> *Subject:* Re: [PATCH 0/7] Per client engine busyness
> 
> On 17/05/2021 15:39, Nieto, David M wrote:
>> [AMD Official Use Only]
>> 
>> 
>> Maybe we could try to standardize how the different submission ring 
>>   usage gets exposed in the fdinfo? We went the simple way of just 
>> adding name and index, but if someone has a suggestion on how else we 
>> could format them so there is commonality across vendors we could just 
>> amend those.
> 
> Could you paste an example of your format?
> 
> Standardized fdinfo sounds good to me in principle. But I would also
> like people to look at the procfs proposal from Chris,
>    - link to which I have pasted elsewhere in the thread.
> 
> Only potential issue with fdinfo I see at the moment is a bit of an
> extra cost in DRM client discovery (compared to my sysfs series and also
> procfs RFC from Chris). It would require reading all processes (well
> threads, then maybe aggregating threads into parent processes), all fd
> symlinks, and doing a stat on them to figure out which ones are DRM devices.
> 
> Btw is DRM_MAJOR 226 consider uapi? I don't see it in uapi headers.
> 
>> I’d really like to have the process managers tools display GPU usage 
>> regardless of what vendor is installed.
> 
> Definitely.
> 
> Regards,
> 
> Tvrtko

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Nouveau] [PATCH 0/7] Per client engine busyness
  2021-05-18  9:16                                       ` Daniel Stone
  (?)
@ 2021-05-18  9:40                                         ` Tvrtko Ursulin
  -1 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-18  9:40 UTC (permalink / raw)
  To: Daniel Stone
  Cc: nouveau, Intel Graphics Development,
	Maling list - DRI developers, Simon Ser, Koenig, Christian,
	Nieto, David M


On 18/05/2021 10:16, Daniel Stone wrote:
> Hi,
> 
> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
>> I was just wondering if stat(2) and a chrdev major check would be a
>> solid criteria to more efficiently (compared to parsing the text
>> content) detect drm files while walking procfs.
> 
> Maybe I'm missing something, but is the per-PID walk actually a
> measurable performance issue rather than just a bit unpleasant?

Per pid and per each open fd.

As said in the other thread what bothers me a bit in this scheme is that 
the cost of obtaining GPU usage scales based on non-GPU criteria.

For the use case of a top-like tool which shows all processes this is a 
smaller additional cost, but for a gpu-top-like tool it is somewhat 
higher.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/7] Per client engine busyness
  2021-05-18  9:35                                 ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-05-18 12:06                                   ` Christian König
  -1 siblings, 0 replies; 103+ messages in thread
From: Christian König @ 2021-05-18 12:06 UTC (permalink / raw)
  To: Tvrtko Ursulin, Nieto, David M, Daniel Vetter
  Cc: Intel Graphics Development, Maling list - DRI developers

Am 18.05.21 um 11:35 schrieb Tvrtko Ursulin:
>
> On 17/05/2021 19:02, Nieto, David M wrote:
>> [AMD Official Use Only]
>>
>>
>> The format is simple:
>>
>> <ringname><index>: <XXX.XX> %
>
> Hm what time period does the percent relate to?
>
> The i915 implementation uses accumulated nanoseconds active. That way 
> who reads the file can calculate the percentage relative to the time 
> period between two reads of the file.

That sounds much saner to me as well. The percentage calculation inside 
the kernel looks suspiciously misplaced.

>
>> we also have entries for the memory mapped:
>> mem <ttm pool> : <size> KiB
>
> Okay so in general key values per line in text format. Colon as 
> delimiter.
>
> What common fields could be useful between different drivers and what 
> common naming scheme, in order to enable as easy as possible creation 
> of a generic top-like tool?
>
> driver: <ko name>
> pdev: <pci slot>
> ring-<name>: N <unit>
> ...
> mem-<name>: N <unit>
> ...
>
> What else?
> Is ring a good common name? We actually more use engine in i915 but I 
> am not really bothered about it.

I would prefer engine as well. We are currently in the process of moving 
away from kernel rings, so that notion doesn't make much sense to carry 
forward.

Christian.

>
> Aggregated GPU usage could be easily and generically done by userspace 
> by adding all rings and normalizing.
>
>> On my submission 
>> https://lists.freedesktop.org/archives/amd-gfx/2021-May/063149.html I 
>> added a python script to print out the info. It has a lower CPU usage
>> than top, for example.
>>
>> To be absolutely honest, I agree that there is an overhead, but it
>> might not be as much as you fear.
>
> For me the issue is more that the number of extra operations grows
> with the number of open files on the system, which has no relation to
> the number of drm clients.
>
> Even more so if the monitoring tool wants to show _only_ DRM processes.
> Then the cost scales with the total number of processes times the total
> number of files on the server.
>
> This design inefficiency bothers me, yes. This is somewhat alleviated
> by the proposal from Chris
> (https://patchwork.freedesktop.org/patch/419042/?series=86692&rev=1)
> although there are downsides there as well. Like needing to keep a map 
> of pids to drm files in drivers.
>
> Btw what do you do in that tool for the same fd in a multi-threaded
> process? Do you show duplicate entries, or detect and ignore them? I
> guess I did not figure out whether you show by pid/tgid or by fd.
>
> Regards,
>
> Tvrtko
>
>> ------------------------------------------------------------------------
>> *From:* Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> *Sent:* Monday, May 17, 2021 9:00 AM
>> *To:* Nieto, David M <David.Nieto@amd.com>; Daniel Vetter 
>> <daniel@ffwll.ch>; Koenig, Christian <Christian.Koenig@amd.com>
>> *Cc:* Alex Deucher <alexdeucher@gmail.com>; Intel Graphics 
>> Development <Intel-gfx@lists.freedesktop.org>; Maling list - DRI 
>> developers <dri-devel@lists.freedesktop.org>
>> *Subject:* Re: [PATCH 0/7] Per client engine busyness
>>
>> On 17/05/2021 15:39, Nieto, David M wrote:
>>> [AMD Official Use Only]
>>>
>>>
>>> Maybe we could try to standardize how the different submission ring
>>> usage gets exposed in the fdinfo? We went the simple way of just
>>> adding name and index, but if someone has a suggestion on how else 
>>> we could format them so there is commonality across vendors we could 
>>> just amend those.
>>
>> Could you paste an example of your format?
>>
>> Standardized fdinfo sounds good to me in principle. But I would also
>> like people to look at the procfs proposal from Chris, a link to which
>> I have pasted elsewhere in the thread.
>>
>> The only potential issue I see with fdinfo at the moment is a bit of
>> extra cost in DRM client discovery (compared to my sysfs series and also
>> the procfs RFC from Chris). It would require reading all processes (well,
>> threads, then maybe aggregating threads into parent processes), all fd
>> symlinks, and doing a stat on them to figure out which ones are DRM
>> devices.
>>
>> Btw is DRM_MAJOR 226 considered uapi? I don't see it in uapi headers.
>>
>>> I’d really like to have process manager tools display GPU usage
>>> regardless of which vendor's driver is installed.
>>
>> Definitely.
>>
>> Regards,
>>
>> Tvrtko


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
@ 2021-05-18 12:06                                   ` Christian König
  0 siblings, 0 replies; 103+ messages in thread
From: Christian König @ 2021-05-18 12:06 UTC (permalink / raw)
  To: Tvrtko Ursulin, Nieto, David M, Daniel Vetter
  Cc: Alex Deucher, Intel Graphics Development, Maling list - DRI developers

Am 18.05.21 um 11:35 schrieb Tvrtko Ursulin:
>
> On 17/05/2021 19:02, Nieto, David M wrote:
>> [AMD Official Use Only]
>>
>>
>> The format is simple:
>>
>> <ringname><index>: <XXX.XX> %
>
> Hm what time period does the percent relate to?
>
> The i915 implementation uses accumulated nanoseconds active. That way 
> whoever reads the file can calculate the percentage relative to the time
> period between two reads of the file.

That sounds much saner to me as well. The percentage calculation inside 
the kernel looks suspiciously misplaced.
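[Editor's note: the userspace side of that calculation is indeed trivial. A minimal sketch, assuming only an accumulated busy-nanoseconds counter per engine plus a wall-clock timestamp taken at each read; `busy_percent` is an illustrative helper, not an existing API:]

```python
# Sketch: turn two samples of an accumulated busy-ns counter (the scheme
# i915 uses) into a utilisation percentage over the sampling period.
# The function name and arguments are illustrative, not real uapi.

def busy_percent(busy_t0_ns: int, busy_t1_ns: int,
                 wall_t0_ns: int, wall_t1_ns: int) -> float:
    """Percentage of the sampling window the engine was busy."""
    period_ns = wall_t1_ns - wall_t0_ns
    if period_ns <= 0:
        return 0.0
    # Clamp: one engine cannot be busy longer than the wall-clock period.
    return min(100.0, 100.0 * (busy_t1_ns - busy_t0_ns) / period_ns)

# A client busy 0.5 s out of a 1 s sampling window is 50% utilised.
print(busy_percent(0, 500_000_000, 0, 1_000_000_000))  # 50.0
```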

>
>> we also have entries for the memory mapped:
>> mem <ttm pool> : <size> KiB
>
> Okay, so in general key/value pairs per line in text format, with a
> colon as delimiter.
>
> What common fields, and what common naming scheme, would be useful
> across different drivers, in order to make creating a generic top-like
> tool as easy as possible?
>
> driver: <ko name>
> pdev: <pci slot>
> ring-<name>: N <unit>
> ...
> mem-<name>: N <unit>
> ...
>
> What else?
> Is ring a good common name? In i915 we mostly use engine, but I am not
> really bothered about it.

I would prefer engine as well. We are currently in the process of moving
away from kernel rings, so that notion doesn't make much sense to carry
forward.

Christian.

>
> Aggregated GPU usage could be easily and generically done by userspace 
> by adding all rings and normalizing.
>
>> On my submission 
>> https://lists.freedesktop.org/archives/amd-gfx/2021-May/063149.html I
>> added a python script to print out the info. It has a lower CPU usage
>> than top, for example.
>>
>> To be absolutely honest, I agree that there is an overhead, but it
>> might not be as much as you fear.
>
> For me the issue is more that the number of extra operations grows
> with the number of open files on the system, which has no relation to
> the number of drm clients.
>
> Even more so if the monitoring tool wants to show _only_ DRM processes.
> Then the cost scales with the total number of processes times the total
> number of files on the server.
>
> This design inefficiency bothers me, yes. This is somewhat alleviated
> by the proposal from Chris
> (https://patchwork.freedesktop.org/patch/419042/?series=86692&rev=1)
> although there are downsides there as well. Like needing to keep a map 
> of pids to drm files in drivers.
>
> Btw what do you do in that tool for the same fd in a multi-threaded
> process? Do you show duplicate entries, or detect and ignore them? I
> guess I did not figure out whether you show by pid/tgid or by fd.
>
> Regards,
>
> Tvrtko
>
>> ------------------------------------------------------------------------
>> *From:* Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> *Sent:* Monday, May 17, 2021 9:00 AM
>> *To:* Nieto, David M <David.Nieto@amd.com>; Daniel Vetter 
>> <daniel@ffwll.ch>; Koenig, Christian <Christian.Koenig@amd.com>
>> *Cc:* Alex Deucher <alexdeucher@gmail.com>; Intel Graphics 
>> Development <Intel-gfx@lists.freedesktop.org>; Maling list - DRI 
>> developers <dri-devel@lists.freedesktop.org>
>> *Subject:* Re: [PATCH 0/7] Per client engine busyness
>>
>> On 17/05/2021 15:39, Nieto, David M wrote:
>>> [AMD Official Use Only]
>>>
>>>
>>> Maybe we could try to standardize how the different submission ring
>>> usage gets exposed in the fdinfo? We went the simple way of just
>>> adding name and index, but if someone has a suggestion on how else 
>>> we could format them so there is commonality across vendors we could 
>>> just amend those.
>>
>> Could you paste an example of your format?
>>
>> Standardized fdinfo sounds good to me in principle. But I would also
>> like people to look at the procfs proposal from Chris, a link to which
>> I have pasted elsewhere in the thread.
>>
>> The only potential issue I see with fdinfo at the moment is a bit of
>> extra cost in DRM client discovery (compared to my sysfs series and also
>> the procfs RFC from Chris). It would require reading all processes (well,
>> threads, then maybe aggregating threads into parent processes), all fd
>> symlinks, and doing a stat on them to figure out which ones are DRM
>> devices.
>>
>> Btw is DRM_MAJOR 226 considered uapi? I don't see it in uapi headers.
>>
>>> I’d really like to have process manager tools display GPU usage
>>> regardless of which vendor's driver is installed.
>>
>> Definitely.
>>
>> Regards,
>>
>> Tvrtko
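[Editor's note: to make the proposed format concrete, here is a sketch of how a generic top-like tool might consume the `key: value` lines discussed above. The `driver:`, `pdev:`, `ring-<name>:` and `mem-<name>:` field names come from this thread; the units and parser are assumptions, since nothing was settled at this point:]

```python
# Sketch of a parser for the per-client fdinfo format proposed above:
#   driver: <ko name>
#   pdev: <pci slot>
#   ring-<name>: N <unit>
#   mem-<name>: N <unit>
# Field names and units are assumptions from this thread, not settled uapi.

def parse_drm_fdinfo(text: str) -> dict:
    info = {"rings": {}, "mem": {}}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if not sep:
            continue  # not a key/value line
        key, value = key.strip(), value.strip()
        if key.startswith("ring-"):
            num, _, unit = value.partition(" ")
            info["rings"][key[len("ring-"):]] = (int(num), unit)
        elif key.startswith("mem-"):
            num, _, unit = value.partition(" ")
            info["mem"][key[len("mem-"):]] = (int(num), unit)
        else:
            info[key] = value
    return info

sample = (
    "driver: i915\n"
    "pdev: 0000:00:02.0\n"
    "ring-rcs0: 123456789 ns\n"
    "mem-system: 4096 KiB\n"
)
info = parse_drm_fdinfo(sample)
# Aggregated GPU usage, as suggested above: sum all rings, normalize later.
total_busy_ns = sum(n for n, _ in info["rings"].values())
```

Note that splitting on the first colon only is what keeps a value like the PCI slot `0000:00:02.0` intact.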

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Nouveau] [PATCH 0/7] Per client engine busyness
  2021-05-18  9:40                                         ` Tvrtko Ursulin
  (?)
@ 2021-05-19 16:16                                           ` Tvrtko Ursulin
  -1 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-19 16:16 UTC (permalink / raw)
  To: Daniel Stone
  Cc: nouveau, Intel Graphics Development,
	Maling list - DRI developers, Simon Ser, Koenig, Christian,
	Nieto, David M


On 18/05/2021 10:40, Tvrtko Ursulin wrote:
> 
> On 18/05/2021 10:16, Daniel Stone wrote:
>> Hi,
>>
>> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
>> <tvrtko.ursulin@linux.intel.com> wrote:
>>> I was just wondering if stat(2) and a chrdev major check would be a
>>> solid criteria to more efficiently (compared to parsing the text
>>> content) detect drm files while walking procfs.
>>
>> Maybe I'm missing something, but is the per-PID walk actually a
>> measurable performance issue rather than just a bit unpleasant?
> 
> Per pid and per each open fd.
> 
> As said in the other thread what bothers me a bit in this scheme is that 
> the cost of obtaining GPU usage scales based on non-GPU criteria.
> 
> For use case of a top-like tool which shows all processes this is a 
> smaller additional cost, but then for a gpu-top like tool it is somewhat 
> higher.

To further expand, not only would the cost scale per pid times per open
fd, but to detect which of the fds are DRM I see these three options:

1) Open and parse fdinfo.
2) Name based matching, i.e. /dev/dri/.. something.
3) Stat the symlink target and check for DRM major.

All sound quite sub-optimal to me.

Name based matching is probably the least evil on system resource usage
(Keeping the dentry cache too hot? Too many syscalls?), even though
fundamentally I don't think it is the right approach.

What happens with dup(2) is another question.

Does anyone have any feedback on the /proc/<pid>/gpu idea at all?

Regards,

Tvrtko
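[Editor's note: option 3 above can be sketched in a few lines. The value 226 for DRM_MAJOR is taken from this discussion; whether it is formal uapi is exactly the open question raised earlier, and the helper name is made up:]

```python
# Sketch of option 3: walk /proc/<pid>/fd, stat each symlink's target,
# and keep character devices whose major number matches the DRM major.
# DRM_MAJOR = 226 is an assumption from this thread, not found in uapi
# headers; drm_fds() is an illustrative helper, not an existing API.
import os
import stat

DRM_MAJOR = 226  # assumption, see the uapi question in this thread

def drm_fds(pid: int):
    """Yield fd numbers of <pid> that refer to DRM character devices."""
    fd_dir = "/proc/%d/fd" % pid
    try:
        fds = os.listdir(fd_dir)
    except OSError:  # process gone, or no permission
        return
    for fd in fds:
        try:
            st = os.stat(os.path.join(fd_dir, fd))  # follows the symlink
        except OSError:  # fd closed while we were walking
            continue
        if stat.S_ISCHR(st.st_mode) and os.major(st.st_rdev) == DRM_MAJOR:
            yield int(fd)
```

Note this does not by itself answer the dup(2) question: duplicated fds stat to the same device inode, so a simple inode comparison cannot distinguish dup(2) from independent opens of the same device.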
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [PATCH 0/7] Per client engine busyness
@ 2021-05-19 16:16                                           ` Tvrtko Ursulin
  0 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-19 16:16 UTC (permalink / raw)
  To: Daniel Stone
  Cc: jhubbard, nouveau, Intel Graphics Development,
	Maling list - DRI developers, Koenig, Christian, aritger, Nieto,
	David M


On 18/05/2021 10:40, Tvrtko Ursulin wrote:
> 
> On 18/05/2021 10:16, Daniel Stone wrote:
>> Hi,
>>
>> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
>> <tvrtko.ursulin@linux.intel.com> wrote:
>>> I was just wondering if stat(2) and a chrdev major check would be a
>>> solid criteria to more efficiently (compared to parsing the text
>>> content) detect drm files while walking procfs.
>>
>> Maybe I'm missing something, but is the per-PID walk actually a
>> measurable performance issue rather than just a bit unpleasant?
> 
> Per pid and per each open fd.
> 
> As said in the other thread what bothers me a bit in this scheme is that 
> the cost of obtaining GPU usage scales based on non-GPU criteria.
> 
> For use case of a top-like tool which shows all processes this is a 
> smaller additional cost, but then for a gpu-top like tool it is somewhat 
> higher.

To further expand, not only would the cost scale per pid times per open
fd, but to detect which of the fds are DRM I see these three options:

1) Open and parse fdinfo.
2) Name based matching, i.e. /dev/dri/.. something.
3) Stat the symlink target and check for DRM major.

All sound quite sub-optimal to me.

Name based matching is probably the least evil on system resource usage
(Keeping the dentry cache too hot? Too many syscalls?), even though
fundamentally I don't think it is the right approach.

What happens with dup(2) is another question.

Does anyone have any feedback on the /proc/<pid>/gpu idea at all?

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
@ 2021-05-19 16:16                                           ` Tvrtko Ursulin
  0 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-19 16:16 UTC (permalink / raw)
  To: Daniel Stone
  Cc: jhubbard, nouveau, Intel Graphics Development,
	Maling list - DRI developers, Simon Ser, Koenig, Christian,
	aritger, Nieto, David M


On 18/05/2021 10:40, Tvrtko Ursulin wrote:
> 
> On 18/05/2021 10:16, Daniel Stone wrote:
>> Hi,
>>
>> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
>> <tvrtko.ursulin@linux.intel.com> wrote:
>>> I was just wondering if stat(2) and a chrdev major check would be a
>>> solid criteria to more efficiently (compared to parsing the text
>>> content) detect drm files while walking procfs.
>>
>> Maybe I'm missing something, but is the per-PID walk actually a
>> measurable performance issue rather than just a bit unpleasant?
> 
> Per pid and per each open fd.
> 
> As said in the other thread what bothers me a bit in this scheme is that 
> the cost of obtaining GPU usage scales based on non-GPU criteria.
> 
> For use case of a top-like tool which shows all processes this is a 
> smaller additional cost, but then for a gpu-top like tool it is somewhat 
> higher.

To further expand, not only would the cost scale per pid times per open
fd, but to detect which of the fds are DRM I see these three options:

1) Open and parse fdinfo.
2) Name based matching, i.e. /dev/dri/.. something.
3) Stat the symlink target and check for DRM major.

All sound quite sub-optimal to me.

Name based matching is probably the least evil on system resource usage
(Keeping the dentry cache too hot? Too many syscalls?), even though
fundamentally I don't think it is the right approach.

What happens with dup(2) is another question.

Does anyone have any feedback on the /proc/<pid>/gpu idea at all?

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Nouveau] [Intel-gfx] [PATCH 0/7] Per client engine busyness
  2021-05-19 16:16                                           ` Tvrtko Ursulin
  (?)
@ 2021-05-19 18:23                                             ` Daniel Vetter
  -1 siblings, 0 replies; 103+ messages in thread
From: Daniel Vetter @ 2021-05-19 18:23 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Intel Graphics Development, Maling list - DRI developers,
	Daniel Stone, Simon Ser, nouveau, Koenig, Christian, Nieto,
	David M

On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 18/05/2021 10:40, Tvrtko Ursulin wrote:
> >
> > On 18/05/2021 10:16, Daniel Stone wrote:
> >> Hi,
> >>
> >> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
> >> <tvrtko.ursulin@linux.intel.com> wrote:
> >>> I was just wondering if stat(2) and a chrdev major check would be a
> >>> solid criteria to more efficiently (compared to parsing the text
> >>> content) detect drm files while walking procfs.
> >>
> >> Maybe I'm missing something, but is the per-PID walk actually a
> >> measurable performance issue rather than just a bit unpleasant?
> >
> > Per pid and per each open fd.
> >
> > As said in the other thread what bothers me a bit in this scheme is that
> > the cost of obtaining GPU usage scales based on non-GPU criteria.
> >
> > For use case of a top-like tool which shows all processes this is a
> > smaller additional cost, but then for a gpu-top like tool it is somewhat
> > higher.
>
> To further expand, not only would the cost scale per pid times per open
> fd, but to detect which of the fds are DRM I see these three options:
>
> 1) Open and parse fdinfo.
> 2) Name based matching ie /dev/dri/.. something.
> 3) Stat the symlink target and check for DRM major.

stat with symlink following should be plenty fast.

> All sound quite sub-optimal to me.
>
> Name based matching is probably the least evil on system resource usage
> (Keeping the dentry cache too hot? Too many syscalls?), even though
> fundamentally I don't think it is the right approach.
>
> What happens with dup(2) is another question.

We need benchmark numbers showing that on anything remotely realistic
it's an actual problem. Until we've demonstrated it's a real problem
we don't need to solve it.

E.g. top with any sorting enabled also parses way more than it
displays on every update. It seems to be doing Just Fine (tm).

> Does anyone have any feedback on the /proc/<pid>/gpu idea at all?

When we know we have a problem to solve we can take a look at solutions.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
@ 2021-05-19 18:23                                             ` Daniel Vetter
  0 siblings, 0 replies; 103+ messages in thread
From: Daniel Vetter @ 2021-05-19 18:23 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: jhubbard, Intel Graphics Development,
	Maling list - DRI developers, nouveau, Koenig, Christian,
	aritger, Nieto, David M

On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 18/05/2021 10:40, Tvrtko Ursulin wrote:
> >
> > On 18/05/2021 10:16, Daniel Stone wrote:
> >> Hi,
> >>
> >> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
> >> <tvrtko.ursulin@linux.intel.com> wrote:
> >>> I was just wondering if stat(2) and a chrdev major check would be a
> >>> solid criteria to more efficiently (compared to parsing the text
> >>> content) detect drm files while walking procfs.
> >>
> >> Maybe I'm missing something, but is the per-PID walk actually a
> >> measurable performance issue rather than just a bit unpleasant?
> >
> > Per pid and per each open fd.
> >
> > As said in the other thread what bothers me a bit in this scheme is that
> > the cost of obtaining GPU usage scales based on non-GPU criteria.
> >
> > For use case of a top-like tool which shows all processes this is a
> > smaller additional cost, but then for a gpu-top like tool it is somewhat
> > higher.
>
> To further expand, not only would the cost scale per pid times per open
> fd, but to detect which of the fds are DRM I see these three options:
>
> 1) Open and parse fdinfo.
> 2) Name based matching ie /dev/dri/.. something.
> 3) Stat the symlink target and check for DRM major.

stat with symlink following should be plenty fast.

> All sound quite sub-optimal to me.
>
> Name based matching is probably the least evil on system resource usage
> (Keeping the dentry cache too hot? Too many syscalls?), even though
> fundamentally I don't think it is the right approach.
>
> What happens with dup(2) is another question.

We need benchmark numbers showing that on anything remotely realistic
it's an actual problem. Until we've demonstrated it's a real problem
we don't need to solve it.

E.g. top with any sorting enabled also parses way more than it
displays on every update. It seems to be doing Just Fine (tm).

> Does anyone have any feedback on the /proc/<pid>/gpu idea at all?

When we know we have a problem to solve we can take a look at solutions.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
@ 2021-05-19 18:23                                             ` Daniel Vetter
  0 siblings, 0 replies; 103+ messages in thread
From: Daniel Vetter @ 2021-05-19 18:23 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: jhubbard, Intel Graphics Development,
	Maling list - DRI developers, Simon Ser, nouveau, Koenig,
	Christian, aritger, Nieto, David M

On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 18/05/2021 10:40, Tvrtko Ursulin wrote:
> >
> > On 18/05/2021 10:16, Daniel Stone wrote:
> >> Hi,
> >>
> >> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
> >> <tvrtko.ursulin@linux.intel.com> wrote:
> >>> I was just wondering if stat(2) and a chrdev major check would be a
> >>> solid criteria to more efficiently (compared to parsing the text
> >>> content) detect drm files while walking procfs.
> >>
> >> Maybe I'm missing something, but is the per-PID walk actually a
> >> measurable performance issue rather than just a bit unpleasant?
> >
> > Per pid and per each open fd.
> >
> > As said in the other thread what bothers me a bit in this scheme is that
> > the cost of obtaining GPU usage scales based on non-GPU criteria.
> >
> > For use case of a top-like tool which shows all processes this is a
> > smaller additional cost, but then for a gpu-top like tool it is somewhat
> > higher.
>
> To further expand, not only would the cost scale per pid times per open
> fd, but to detect which of the fds are DRM I see these three options:
>
> 1) Open and parse fdinfo.
> 2) Name based matching ie /dev/dri/.. something.
> 3) Stat the symlink target and check for DRM major.

stat with symlink following should be plenty fast.

> All sound quite sub-optimal to me.
>
> Name based matching is probably the least evil on system resource usage
> (Keeping the dentry cache too hot? Too many syscalls?), even though
> fundamentally I don't think it is the right approach.
>
> What happens with dup(2) is another question.

We need benchmark numbers showing that on anything remotely realistic
it's an actual problem. Until we've demonstrated it's a real problem
we don't need to solve it.

E.g. top with any sorting enabled also parses way more than it
displays on every update. It seems to be doing Just Fine (tm).

> Does anyone have any feedback on the /proc/<pid>/gpu idea at all?

When we know we have a problem to solve we can take a look at solutions.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Nouveau] [Intel-gfx] [PATCH 0/7] Per client engine busyness
  2021-05-19 18:23                                             ` Daniel Vetter
  (?)
@ 2021-05-19 23:17                                               ` Nieto, David M
  -1 siblings, 0 replies; 103+ messages in thread
From: Nieto, David M @ 2021-05-19 23:17 UTC (permalink / raw)
  To: Daniel Vetter, Tvrtko Ursulin
  Cc: Intel Graphics Development, Maling list - DRI developers,
	Daniel Stone, Simon Ser, nouveau, Koenig,  Christian


[-- Attachment #1.1: Type: text/plain, Size: 3525 bytes --]

[AMD Official Use Only]

Parsing over 550 processes for fdinfo takes between 40-100 ms single threaded on a 2 GHz Skylake (IBRS) within a VM, using simple string comparisons and dirent parsing. And that is pretty much the worst case scenario; more optimized implementations would do better.

David
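[Editor's note: the kind of walk being measured above can be approximated with a few lines. The numbers obviously vary per machine; this sketch only shows where the per-pid times per-fd cost discussed in this thread comes from:]

```python
# Rough harness for the measurement quoted above: time one full walk of
# /proc/<pid>/fd for every running process. Results are machine-dependent;
# the point is only that the cost scales with processes times open fds.
import os
import time

def count_all_fds() -> int:
    total = 0
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # not a pid directory
        try:
            total += len(os.listdir("/proc/%s/fd" % entry))
        except OSError:  # process exited, or fd dir not readable
            continue
    return total

start = time.perf_counter()
n_fds = count_all_fds()
elapsed_ms = (time.perf_counter() - start) * 1000.0
print("visited %d fds in %.1f ms" % (n_fds, elapsed_ms))
```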
________________________________
From: Daniel Vetter <daniel@ffwll.ch>
Sent: Wednesday, May 19, 2021 11:23 AM
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniel Stone <daniel@fooishbar.org>; jhubbard@nvidia.com <jhubbard@nvidia.com>; nouveau@lists.freedesktop.org <nouveau@lists.freedesktop.org>; Intel Graphics Development <Intel-gfx@lists.freedesktop.org>; Maling list - DRI developers <dri-devel@lists.freedesktop.org>; Simon Ser <contact@emersion.fr>; Koenig, Christian <Christian.Koenig@amd.com>; aritger@nvidia.com <aritger@nvidia.com>; Nieto, David M <David.Nieto@amd.com>
Subject: Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness

On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 18/05/2021 10:40, Tvrtko Ursulin wrote:
> >
> > On 18/05/2021 10:16, Daniel Stone wrote:
> >> Hi,
> >>
> >> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
> >> <tvrtko.ursulin@linux.intel.com> wrote:
> >>> I was just wondering if stat(2) and a chrdev major check would be a
> >>> solid criteria to more efficiently (compared to parsing the text
> >>> content) detect drm files while walking procfs.
> >>
> >> Maybe I'm missing something, but is the per-PID walk actually a
> >> measurable performance issue rather than just a bit unpleasant?
> >
> > Per pid and per each open fd.
> >
> > As said in the other thread what bothers me a bit in this scheme is that
> > the cost of obtaining GPU usage scales based on non-GPU criteria.
> >
> > For use case of a top-like tool which shows all processes this is a
> > smaller additional cost, but then for a gpu-top like tool it is somewhat
> > higher.
>
> To further expand, not only would the cost scale per pid times per open
> fd, but to detect which of the fds are DRM I see these three options:
>
> 1) Open and parse fdinfo.
> 2) Name based matching ie /dev/dri/.. something.
> 3) Stat the symlink target and check for DRM major.

stat with symlink following should be plenty fast.

> All sound quite sub-optimal to me.
>
> Name based matching is probably the least evil on system resource usage
> (Keeping the dentry cache too hot? Too many syscalls?), even though
> fundamentally I don't think it is the right approach.
>
> What happens with dup(2) is another question.

We need benchmark numbers showing that on anything remotely realistic
it's an actual problem. Until we've demonstrated it's a real problem
we don't need to solve it.

E.g. top with any sorting enabled also parses way more than it
displays on every update. It seems to be doing Just Fine (tm).

> Does anyone have any feedback on the /proc/<pid>/gpu idea at all?

When we know we have a problem to solve we can take a look at solutions.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

[-- Attachment #1.2: Type: text/html, Size: 5670 bytes --]

[-- Attachment #2: Type: text/plain, Size: 154 bytes --]


^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
@ 2021-05-19 23:17                                               ` Nieto, David M
  0 siblings, 0 replies; 103+ messages in thread
From: Nieto, David M @ 2021-05-19 23:17 UTC (permalink / raw)
  To: Daniel Vetter, Tvrtko Ursulin
  Cc: jhubbard, Intel Graphics Development,
	Maling list - DRI developers, nouveau, Koenig,  Christian,
	aritger

[-- Attachment #1: Type: text/plain, Size: 3525 bytes --]

[AMD Official Use Only]

Parsing over 550 processes for fdinfo takes between 40-100 ms single threaded on a 2 GHz Skylake (IBRS) within a VM, using simple string comparisons and dirent parsing. And that is pretty much the worst case scenario; more optimized implementations would do better.

David
________________________________
From: Daniel Vetter <daniel@ffwll.ch>
Sent: Wednesday, May 19, 2021 11:23 AM
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniel Stone <daniel@fooishbar.org>; jhubbard@nvidia.com <jhubbard@nvidia.com>; nouveau@lists.freedesktop.org <nouveau@lists.freedesktop.org>; Intel Graphics Development <Intel-gfx@lists.freedesktop.org>; Maling list - DRI developers <dri-devel@lists.freedesktop.org>; Simon Ser <contact@emersion.fr>; Koenig, Christian <Christian.Koenig@amd.com>; aritger@nvidia.com <aritger@nvidia.com>; Nieto, David M <David.Nieto@amd.com>
Subject: Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness

On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 18/05/2021 10:40, Tvrtko Ursulin wrote:
> >
> > On 18/05/2021 10:16, Daniel Stone wrote:
> >> Hi,
> >>
> >> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
> >> <tvrtko.ursulin@linux.intel.com> wrote:
> >>> I was just wondering if stat(2) and a chrdev major check would be a
> >>> solid criteria to more efficiently (compared to parsing the text
> >>> content) detect drm files while walking procfs.
> >>
> >> Maybe I'm missing something, but is the per-PID walk actually a
> >> measurable performance issue rather than just a bit unpleasant?
> >
> > Per pid and per each open fd.
> >
> > As said in the other thread what bothers me a bit in this scheme is that
> > the cost of obtaining GPU usage scales based on non-GPU criteria.
> >
> > For use case of a top-like tool which shows all processes this is a
> > smaller additional cost, but then for a gpu-top like tool it is somewhat
> > higher.
>
> To further expand, not only would the cost scale per pid times per open
> fd, but to detect which of the fds are DRM I see these three options:
>
> 1) Open and parse fdinfo.
> 2) Name based matching ie /dev/dri/.. something.
> 3) Stat the symlink target and check for DRM major.

stat with symlink following should be plenty fast.

> All sound quite sub-optimal to me.
>
> Name based matching is probably the least evil on system resource usage
> (Keeping the dentry cache too hot? Too many syscalls?), even though
> fundamentally I don't think it is the right approach.
>
> What happens with dup(2) is another question.

We need benchmark numbers showing that on anything remotely realistic
it's an actual problem. Until we've demonstrated it's a real problem
we don't need to solve it.

E.g. top with any sorting enabled also parses way more than it
displays on every update. It seems to be doing Just Fine (tm).

> Does anyone have any feedback on the /proc/<pid>/gpu idea at all?

When we know we have a problem to solve we can take a look at solutions.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

[-- Attachment #2: Type: text/html, Size: 5670 bytes --]

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
@ 2021-05-19 23:17                                               ` Nieto, David M
  0 siblings, 0 replies; 103+ messages in thread
From: Nieto, David M @ 2021-05-19 23:17 UTC (permalink / raw)
  To: Daniel Vetter, Tvrtko Ursulin
  Cc: jhubbard, Intel Graphics Development,
	Maling list - DRI developers, Simon Ser, nouveau, Koenig,
	 Christian, aritger


[-- Attachment #1.1: Type: text/plain, Size: 3525 bytes --]

[AMD Official Use Only]

Parsing over 550 processes for fdinfo is taking between 40-100ms single threaded in a 2GHz skylake IBRS within a VM using simple string comparisons and DIRent parsing. And that is pretty much the worst case scenario with some more optimized implementations.

David
________________________________
From: Daniel Vetter <daniel@ffwll.ch>
Sent: Wednesday, May 19, 2021 11:23 AM
To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
Cc: Daniel Stone <daniel@fooishbar.org>; jhubbard@nvidia.com <jhubbard@nvidia.com>; nouveau@lists.freedesktop.org <nouveau@lists.freedesktop.org>; Intel Graphics Development <Intel-gfx@lists.freedesktop.org>; Maling list - DRI developers <dri-devel@lists.freedesktop.org>; Simon Ser <contact@emersion.fr>; Koenig, Christian <Christian.Koenig@amd.com>; aritger@nvidia.com <aritger@nvidia.com>; Nieto, David M <David.Nieto@amd.com>
Subject: Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness

On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
> On 18/05/2021 10:40, Tvrtko Ursulin wrote:
> >
> > On 18/05/2021 10:16, Daniel Stone wrote:
> >> Hi,
> >>
> >> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
> >> <tvrtko.ursulin@linux.intel.com> wrote:
> >>> I was just wondering if stat(2) and a chrdev major check would be a
> >>> solid criteria to more efficiently (compared to parsing the text
> >>> content) detect drm files while walking procfs.
> >>
> >> Maybe I'm missing something, but is the per-PID walk actually a
> >> measurable performance issue rather than just a bit unpleasant?
> >
> > Per pid and per each open fd.
> >
> > As said in the other thread what bothers me a bit in this scheme is that
> > the cost of obtaining GPU usage scales based on non-GPU criteria.
> >
> > For use case of a top-like tool which shows all processes this is a
> > smaller additional cost, but then for a gpu-top like tool it is somewhat
> > higher.
>
> To further expand, not only would the cost scale with pids multiplied by
> open fds, but to detect which of the fds are DRM I see these three options:
>
> 1) Open and parse fdinfo.
> 2) Name based matching ie /dev/dri/.. something.
> 3) Stat the symlink target and check for DRM major.

stat with symlink following should be plenty fast.

> All sound quite sub-optimal to me.
>
> Name based matching is probably the least evil on system resource usage
> (Keeping the dentry cache too hot? Too many syscalls?), even though
> fundamentally I don't think it is the right approach.
>
> What happens with dup(2) is another question.

We need benchmark numbers showing that on anything remotely realistic
it's an actual problem. Until we've demonstrated it's a real problem
we don't need to solve it.

E.g. top with any sorting enabled also parses way more than it
displays on every update. It seems to be doing Just Fine (tm).

> Does anyone have any feedback on the /proc/<pid>/gpu idea at all?

When we know we have a problem to solve we can take a look at solutions.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

[-- Attachment #1.2: Type: text/html, Size: 5670 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Nouveau] [Intel-gfx] [PATCH 0/7] Per client engine busyness
  2021-05-19 18:23                                             ` Daniel Vetter
  (?)
@ 2021-05-20  8:35                                               ` Tvrtko Ursulin
  -1 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-20  8:35 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Intel Graphics Development, Maling list - DRI developers,
	Daniel Stone, Simon Ser, nouveau, Koenig, Christian, Nieto,
	David M


On 19/05/2021 19:23, Daniel Vetter wrote:
> On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
>>
>>
>> On 18/05/2021 10:40, Tvrtko Ursulin wrote:
>>>
>>> On 18/05/2021 10:16, Daniel Stone wrote:
>>>> Hi,
>>>>
>>>> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
>>>> <tvrtko.ursulin@linux.intel.com> wrote:
>>>>> I was just wondering if stat(2) and a chrdev major check would be a
>>>>> solid criteria to more efficiently (compared to parsing the text
>>>>> content) detect drm files while walking procfs.
>>>>
>>>> Maybe I'm missing something, but is the per-PID walk actually a
>>>> measurable performance issue rather than just a bit unpleasant?
>>>
>>> Per pid and per each open fd.
>>>
>>> As said in the other thread what bothers me a bit in this scheme is that
>>> the cost of obtaining GPU usage scales based on non-GPU criteria.
>>>
>>> For use case of a top-like tool which shows all processes this is a
>>> smaller additional cost, but then for a gpu-top like tool it is somewhat
>>> higher.
>>
>> To further expand, not only would the cost scale with pids multiplied by
>> open fds, but to detect which of the fds are DRM I see these three options:
>>
>> 1) Open and parse fdinfo.
>> 2) Name based matching ie /dev/dri/.. something.
>> 3) Stat the symlink target and check for DRM major.
> 
> stat with symlink following should be plenty fast.

Maybe. I don't think my point about keeping the dentry cache needlessly 
hot is getting through at all. On my lightly loaded desktop:

  $ sudo lsof | wc -l
  599551

  $ sudo lsof | grep "/dev/dri/" | wc -l
  1965

It's going to look up ~600k pointless dentries in every iteration. Just 
to find a handful of DRM ones. Hard to say if that is better or worse 
than just parsing fdinfo text for all files. Will see.

>> All sound quite sub-optimal to me.
>>
>> Name based matching is probably the least evil on system resource usage
>> (Keeping the dentry cache too hot? Too many syscalls?), even though
>> fundamentally I don't think it is the right approach.
>>
>> What happens with dup(2) is another question.
> 
> We need benchmark numbers showing that on anything remotely realistic
> it's an actual problem. Until we've demonstrated it's a real problem
> we don't need to solve it.

The point about dup(2) is whether it is possible to distinguish the 
duplicated fds in fdinfo. If a DRM client dupes its fd and we find two 
fdinfos each saying the client is using 20% GPU, we don't want to add 
that up to 40%.

> E.g. top with any sorting enabled also parses way more than it
> displays on every update. It seems to be doing Just Fine (tm).

Ha, perceptions differ. I see it using 4-5% while building the kernel on 
a Xeon server which I find quite a lot. :)

>> Does anyone have any feedback on the /proc/<pid>/gpu idea at all?
> 
> When we know we have a problem to solve we can take a look at solutions.

Yes, I don't think it would be a problem to add a better solution later, 
so happy to try the fdinfo approach first. I am simply pointing out a fundamental 
design inefficiency. Even if machines are getting faster and faster I 
don't think that should be an excuse to waste more and more under the 
hood, when a more efficient solution can be designed from the start.

Regards,

Tvrtko
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
@ 2021-05-20  8:35                                               ` Tvrtko Ursulin
  0 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-20  8:35 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: jhubbard, Intel Graphics Development,
	Maling list - DRI developers, nouveau, Koenig, Christian,
	aritger, Nieto, David M


On 19/05/2021 19:23, Daniel Vetter wrote:
> On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
>>
>>
>> On 18/05/2021 10:40, Tvrtko Ursulin wrote:
>>>
>>> On 18/05/2021 10:16, Daniel Stone wrote:
>>>> Hi,
>>>>
>>>> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
>>>> <tvrtko.ursulin@linux.intel.com> wrote:
>>>>> I was just wondering if stat(2) and a chrdev major check would be a
>>>>> solid criteria to more efficiently (compared to parsing the text
>>>>> content) detect drm files while walking procfs.
>>>>
>>>> Maybe I'm missing something, but is the per-PID walk actually a
>>>> measurable performance issue rather than just a bit unpleasant?
>>>
>>> Per pid and per each open fd.
>>>
>>> As said in the other thread what bothers me a bit in this scheme is that
>>> the cost of obtaining GPU usage scales based on non-GPU criteria.
>>>
>>> For use case of a top-like tool which shows all processes this is a
>>> smaller additional cost, but then for a gpu-top like tool it is somewhat
>>> higher.
>>
>> To further expand, not only would the cost scale with pids multiplied by
>> open fds, but to detect which of the fds are DRM I see these three options:
>>
>> 1) Open and parse fdinfo.
>> 2) Name based matching ie /dev/dri/.. something.
>> 3) Stat the symlink target and check for DRM major.
> 
> stat with symlink following should be plenty fast.

Maybe. I don't think my point about keeping the dentry cache needlessly 
hot is getting through at all. On my lightly loaded desktop:

  $ sudo lsof | wc -l
  599551

  $ sudo lsof | grep "/dev/dri/" | wc -l
  1965

It's going to look up ~600k pointless dentries in every iteration. Just 
to find a handful of DRM ones. Hard to say if that is better or worse 
than just parsing fdinfo text for all files. Will see.

>> All sound quite sub-optimal to me.
>>
>> Name based matching is probably the least evil on system resource usage
>> (Keeping the dentry cache too hot? Too many syscalls?), even though
>> fundamentally I don't think it is the right approach.
>>
>> What happens with dup(2) is another question.
> 
> We need benchmark numbers showing that on anything remotely realistic
> it's an actual problem. Until we've demonstrated it's a real problem
> we don't need to solve it.

The point about dup(2) is whether it is possible to distinguish the 
duplicated fds in fdinfo. If a DRM client dupes its fd and we find two 
fdinfos each saying the client is using 20% GPU, we don't want to add 
that up to 40%.

> E.g. top with any sorting enabled also parses way more than it
> displays on every update. It seems to be doing Just Fine (tm).

Ha, perceptions differ. I see it using 4-5% while building the kernel on 
a Xeon server which I find quite a lot. :)

>> Does anyone have any feedback on the /proc/<pid>/gpu idea at all?
> 
> When we know we have a problem to solve we can take a look at solutions.

Yes, I don't think it would be a problem to add a better solution later, 
so happy to try the fdinfo approach first. I am simply pointing out a fundamental 
design inefficiency. Even if machines are getting faster and faster I 
don't think that should be an excuse to waste more and more under the 
hood, when a more efficient solution can be designed from the start.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
@ 2021-05-20  8:35                                               ` Tvrtko Ursulin
  0 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-20  8:35 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: jhubbard, Intel Graphics Development,
	Maling list - DRI developers, Simon Ser, nouveau, Koenig,
	Christian, aritger, Nieto, David M


On 19/05/2021 19:23, Daniel Vetter wrote:
> On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
>>
>>
>> On 18/05/2021 10:40, Tvrtko Ursulin wrote:
>>>
>>> On 18/05/2021 10:16, Daniel Stone wrote:
>>>> Hi,
>>>>
>>>> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
>>>> <tvrtko.ursulin@linux.intel.com> wrote:
>>>>> I was just wondering if stat(2) and a chrdev major check would be a
>>>>> solid criteria to more efficiently (compared to parsing the text
>>>>> content) detect drm files while walking procfs.
>>>>
>>>> Maybe I'm missing something, but is the per-PID walk actually a
>>>> measurable performance issue rather than just a bit unpleasant?
>>>
>>> Per pid and per each open fd.
>>>
>>> As said in the other thread what bothers me a bit in this scheme is that
>>> the cost of obtaining GPU usage scales based on non-GPU criteria.
>>>
>>> For use case of a top-like tool which shows all processes this is a
>>> smaller additional cost, but then for a gpu-top like tool it is somewhat
>>> higher.
>>
>> To further expand, not only would the cost scale with pids multiplied by
>> open fds, but to detect which of the fds are DRM I see these three options:
>>
>> 1) Open and parse fdinfo.
>> 2) Name based matching ie /dev/dri/.. something.
>> 3) Stat the symlink target and check for DRM major.
> 
> stat with symlink following should be plenty fast.

Maybe. I don't think my point about keeping the dentry cache needlessly 
hot is getting through at all. On my lightly loaded desktop:

  $ sudo lsof | wc -l
  599551

  $ sudo lsof | grep "/dev/dri/" | wc -l
  1965

It's going to look up ~600k pointless dentries in every iteration. Just 
to find a handful of DRM ones. Hard to say if that is better or worse 
than just parsing fdinfo text for all files. Will see.

>> All sound quite sub-optimal to me.
>>
>> Name based matching is probably the least evil on system resource usage
>> (Keeping the dentry cache too hot? Too many syscalls?), even though
>> fundamentally I don't think it is the right approach.
>>
>> What happens with dup(2) is another question.
> 
> We need benchmark numbers showing that on anything remotely realistic
> it's an actual problem. Until we've demonstrated it's a real problem
> we don't need to solve it.

The point about dup(2) is whether it is possible to distinguish the 
duplicated fds in fdinfo. If a DRM client dupes its fd and we find two 
fdinfos each saying the client is using 20% GPU, we don't want to add 
that up to 40%.

> E.g. top with any sorting enabled also parses way more than it
> displays on every update. It seems to be doing Just Fine (tm).

Ha, perceptions differ. I see it using 4-5% while building the kernel on 
a Xeon server which I find quite a lot. :)

>> Does anyone have any feedback on the /proc/<pid>/gpu idea at all?
> 
> When we know we have a problem to solve we can take a look at solutions.

Yes, I don't think it would be a problem to add a better solution later, 
so happy to try the fdinfo approach first. I am simply pointing out a fundamental 
design inefficiency. Even if machines are getting faster and faster I 
don't think that should be an excuse to waste more and more under the 
hood, when a more efficient solution can be designed from the start.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Nouveau] [Intel-gfx] [PATCH 0/7] Per client engine busyness
  2021-05-19 23:17                                               ` Nieto, David M
  (?)
@ 2021-05-20 14:11                                                 ` Daniel Vetter
  -1 siblings, 0 replies; 103+ messages in thread
From: Daniel Vetter @ 2021-05-20 14:11 UTC (permalink / raw)
  To: Nieto, David M
  Cc: Tvrtko Ursulin, Simon Ser, nouveau, Intel Graphics Development,
	Daniel Stone, Maling list - DRI developers, Daniel Vetter,
	Koenig, Christian

On Wed, May 19, 2021 at 11:17:24PM +0000, Nieto, David M wrote:
> [AMD Official Use Only]
> 
> Parsing over 550 processes for fdinfo is taking between 40-100ms single
> threaded in a 2GHz skylake IBRS within a VM using simple string
> comparisons and DIRent parsing. And that is pretty much the worst case
> scenario with some more optimized implementations.

I think this is plenty ok, and if it's not you could probably make this
massively faster with io_uring for all the fs operations and whack a
parser-generator on top for real parsing speed.

So imo we shouldn't worry about algorithmic inefficiency of the fdinfo
approach at all, and focus more on trying to reasonably (but not too
much, this is still drm render stuff after all) standardize how it works
and how we'll extend it all. I think there's tons of good suggestions in
this thread on this topic already.

/me out
-Daniel

> 
> David
> ________________________________
> From: Daniel Vetter <daniel@ffwll.ch>
> Sent: Wednesday, May 19, 2021 11:23 AM
> To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Daniel Stone <daniel@fooishbar.org>; jhubbard@nvidia.com <jhubbard@nvidia.com>; nouveau@lists.freedesktop.org <nouveau@lists.freedesktop.org>; Intel Graphics Development <Intel-gfx@lists.freedesktop.org>; Maling list - DRI developers <dri-devel@lists.freedesktop.org>; Simon Ser <contact@emersion.fr>; Koenig, Christian <Christian.Koenig@amd.com>; aritger@nvidia.com <aritger@nvidia.com>; Nieto, David M <David.Nieto@amd.com>
> Subject: Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
> 
> On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
> >
> >
> > On 18/05/2021 10:40, Tvrtko Ursulin wrote:
> > >
> > > On 18/05/2021 10:16, Daniel Stone wrote:
> > >> Hi,
> > >>
> > >> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
> > >> <tvrtko.ursulin@linux.intel.com> wrote:
> > >>> I was just wondering if stat(2) and a chrdev major check would be a
> > >>> solid criteria to more efficiently (compared to parsing the text
> > >>> content) detect drm files while walking procfs.
> > >>
> > >> Maybe I'm missing something, but is the per-PID walk actually a
> > >> measurable performance issue rather than just a bit unpleasant?
> > >
> > > Per pid and per each open fd.
> > >
> > > As said in the other thread what bothers me a bit in this scheme is that
> > > the cost of obtaining GPU usage scales based on non-GPU criteria.
> > >
> > > For use case of a top-like tool which shows all processes this is a
> > > smaller additional cost, but then for a gpu-top like tool it is somewhat
> > > higher.
> >
> > To further expand, not only would the cost scale with pids multiplied by
> > open fds, but to detect which of the fds are DRM I see these three options:
> >
> > 1) Open and parse fdinfo.
> > 2) Name based matching ie /dev/dri/.. something.
> > 3) Stat the symlink target and check for DRM major.
> 
> stat with symlink following should be plenty fast.
> 
> > All sound quite sub-optimal to me.
> >
> > Name based matching is probably the least evil on system resource usage
> > (Keeping the dentry cache too hot? Too many syscalls?), even though
> > fundamentally I don't think it is the right approach.
> >
> > What happens with dup(2) is another question.
> 
> We need benchmark numbers showing that on anything remotely realistic
> it's an actual problem. Until we've demonstrated it's a real problem
> we don't need to solve it.
> 
> E.g. top with any sorting enabled also parses way more than it
> displays on every update. It seems to be doing Just Fine (tm).
> 
> > Does anyone have any feedback on the /proc/<pid>/gpu idea at all?
> 
> When we know we have a problem to solve we can take a look at solutions.
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
@ 2021-05-20 14:11                                                 ` Daniel Vetter
  0 siblings, 0 replies; 103+ messages in thread
From: Daniel Vetter @ 2021-05-20 14:11 UTC (permalink / raw)
  To: Nieto, David M
  Cc: Tvrtko Ursulin, nouveau, Intel Graphics Development,
	Maling list - DRI developers, jhubbard, Koenig, Christian,
	aritger

On Wed, May 19, 2021 at 11:17:24PM +0000, Nieto, David M wrote:
> [AMD Official Use Only]
> 
> Parsing over 550 processes for fdinfo is taking between 40-100ms single
> threaded in a 2GHz skylake IBRS within a VM using simple string
> comparisons and DIRent parsing. And that is pretty much the worst case
> scenario with some more optimized implementations.

I think this is plenty ok, and if it's not you could probably make this
massively faster with io_uring for all the fs operations and whack a
parser-generator on top for real parsing speed.

So imo we shouldn't worry about algorithmic inefficiency of the fdinfo
approach at all, and focus more on trying to reasonably (but not too
much, this is still drm render stuff after all) standardize how it works
and how we'll extend it all. I think there's tons of good suggestions in
this thread on this topic already.

/me out
-Daniel

> 
> David
> ________________________________
> From: Daniel Vetter <daniel@ffwll.ch>
> Sent: Wednesday, May 19, 2021 11:23 AM
> To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Daniel Stone <daniel@fooishbar.org>; jhubbard@nvidia.com <jhubbard@nvidia.com>; nouveau@lists.freedesktop.org <nouveau@lists.freedesktop.org>; Intel Graphics Development <Intel-gfx@lists.freedesktop.org>; Maling list - DRI developers <dri-devel@lists.freedesktop.org>; Simon Ser <contact@emersion.fr>; Koenig, Christian <Christian.Koenig@amd.com>; aritger@nvidia.com <aritger@nvidia.com>; Nieto, David M <David.Nieto@amd.com>
> Subject: Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
> 
> On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
> >
> >
> > On 18/05/2021 10:40, Tvrtko Ursulin wrote:
> > >
> > > On 18/05/2021 10:16, Daniel Stone wrote:
> > >> Hi,
> > >>
> > >> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
> > >> <tvrtko.ursulin@linux.intel.com> wrote:
> > >>> I was just wondering if stat(2) and a chrdev major check would be a
> > >>> solid criteria to more efficiently (compared to parsing the text
> > >>> content) detect drm files while walking procfs.
> > >>
> > >> Maybe I'm missing something, but is the per-PID walk actually a
> > >> measurable performance issue rather than just a bit unpleasant?
> > >
> > > Per pid and per each open fd.
> > >
> > > As said in the other thread what bothers me a bit in this scheme is that
> > > the cost of obtaining GPU usage scales based on non-GPU criteria.
> > >
> > > For use case of a top-like tool which shows all processes this is a
> > > smaller additional cost, but then for a gpu-top like tool it is somewhat
> > > higher.
> >
> > To further expand, not only would the cost scale with pids multiplied by
> > open fds, but to detect which of the fds are DRM I see these three options:
> >
> > 1) Open and parse fdinfo.
> > 2) Name based matching ie /dev/dri/.. something.
> > 3) Stat the symlink target and check for DRM major.
> 
> stat with symlink following should be plenty fast.
> 
> > All sound quite sub-optimal to me.
> >
> > Name based matching is probably the least evil on system resource usage
> > (Keeping the dentry cache too hot? Too many syscalls?), even though
> > fundamentally I don't think it is the right approach.
> >
> > What happens with dup(2) is another question.
> 
> We need benchmark numbers showing that on anything remotely realistic
> it's an actual problem. Until we've demonstrated it's a real problem
> we don't need to solve it.
> 
> E.g. top with any sorting enabled also parses way more than it
> displays on every update. It seems to be doing Just Fine (tm).
> 
> > Does anyone have any feedback on the /proc/<pid>/gpu idea at all?
> 
> When we know we have a problem to solve we can take a look at solutions.
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
@ 2021-05-20 14:11                                                 ` Daniel Vetter
  0 siblings, 0 replies; 103+ messages in thread
From: Daniel Vetter @ 2021-05-20 14:11 UTC (permalink / raw)
  To: Nieto, David M
  Cc: Simon Ser, nouveau, Intel Graphics Development,
	Maling list - DRI developers, jhubbard, Koenig, Christian,
	aritger

On Wed, May 19, 2021 at 11:17:24PM +0000, Nieto, David M wrote:
> [AMD Official Use Only]
> 
> Parsing over 550 processes for fdinfo is taking between 40-100ms single
> threaded in a 2GHz skylake IBRS within a VM using simple string
> comparisons and DIRent parsing. And that is pretty much the worst case
> scenario with some more optimized implementations.

I think this is plenty ok, and if it's not you could probably make this
massively faster with io_uring for all the fs operations and whack a
parser-generator on top for real parsing speed.

So imo we shouldn't worry about algorithmic inefficiency of the fdinfo
approach at all, and focus more on trying to reasonably (but not too
much, this is still drm render stuff after all) standardize how it works
and how we'll extend it all. I think there's tons of good suggestions in
this thread on this topic already.

/me out
-Daniel

> 
> David
> ________________________________
> From: Daniel Vetter <daniel@ffwll.ch>
> Sent: Wednesday, May 19, 2021 11:23 AM
> To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
> Cc: Daniel Stone <daniel@fooishbar.org>; jhubbard@nvidia.com <jhubbard@nvidia.com>; nouveau@lists.freedesktop.org <nouveau@lists.freedesktop.org>; Intel Graphics Development <Intel-gfx@lists.freedesktop.org>; Maling list - DRI developers <dri-devel@lists.freedesktop.org>; Simon Ser <contact@emersion.fr>; Koenig, Christian <Christian.Koenig@amd.com>; aritger@nvidia.com <aritger@nvidia.com>; Nieto, David M <David.Nieto@amd.com>
> Subject: Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
> 
> On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
> <tvrtko.ursulin@linux.intel.com> wrote:
> >
> >
> > On 18/05/2021 10:40, Tvrtko Ursulin wrote:
> > >
> > > On 18/05/2021 10:16, Daniel Stone wrote:
> > >> Hi,
> > >>
> > >> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
> > >> <tvrtko.ursulin@linux.intel.com> wrote:
> > >>> I was just wondering if stat(2) and a chrdev major check would be a
> > >>> solid criteria to more efficiently (compared to parsing the text
> > >>> content) detect drm files while walking procfs.
> > >>
> > >> Maybe I'm missing something, but is the per-PID walk actually a
> > >> measurable performance issue rather than just a bit unpleasant?
> > >
> > > Per pid and per each open fd.
> > >
> > > As said in the other thread what bothers me a bit in this scheme is that
> > > the cost of obtaining GPU usage scales based on non-GPU criteria.
> > >
> > > For use case of a top-like tool which shows all processes this is a
> > > smaller additional cost, but then for a gpu-top like tool it is somewhat
> > > higher.
> >
> > To further expand, not only would the cost scale with pids multiplied by
> > open fds, but to detect which of the fds are DRM I see these three options:
> >
> > 1) Open and parse fdinfo.
> > 2) Name based matching ie /dev/dri/.. something.
> > 3) Stat the symlink target and check for DRM major.
> 
> stat with symlink following should be plenty fast.
> 
> > All sound quite sub-optimal to me.
> >
> > Name based matching is probably the least evil on system resource usage
> > (Keeping the dentry cache too hot? Too many syscalls?), even though
> > fundamentally I don't think it is the right approach.
> >
> > What happens with dup(2) is another question.
> 
> We need benchmark numbers showing that on anything remotely realistic
> it's an actual problem. Until we've demonstrated it's a real problem
> we don't need to solve it.
> 
> E.g. top with any sorting enabled also parses way more than it
> displays on every update. It seems to be doing Just Fine (tm).
> 
> > Does anyone have any feedback on the /proc/<pid>/gpu idea at all?
> 
> When we know we have a problem to solve we can take a look at solutions.
> -Daniel
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 103+ messages in thread

* Re: [Nouveau] [Intel-gfx] [PATCH 0/7] Per client engine busyness
  2021-05-20 14:11                                                 ` Daniel Vetter
  (?)
@ 2021-05-20 14:12                                                   ` Christian König
  -1 siblings, 0 replies; 103+ messages in thread
From: Christian König @ 2021-05-20 14:12 UTC (permalink / raw)
  To: Daniel Vetter, Nieto, David M
  Cc: Tvrtko Ursulin, Simon Ser, nouveau, Intel Graphics Development,
	Maling list - DRI developers, Daniel Stone



Am 20.05.21 um 16:11 schrieb Daniel Vetter:
> On Wed, May 19, 2021 at 11:17:24PM +0000, Nieto, David M wrote:
>> [AMD Official Use Only]
>>
>> Parsing over 550 processes for fdinfo is taking between 40-100ms single
>> threaded in a 2GHz skylake IBRS within a VM using simple string
>> comparisons and DIRent parsing. And that is pretty much the worst case
>> scenario with some more optimized implementations.
> I think this is plenty ok, and if it's not you could probably make this
> massively faster with io_uring for all the fs operations and whack a
> parser-generator on top for real parsing speed.

Well if it becomes a problem fixing the debugfs "clients" file and 
making it sysfs shouldn't be much of a problem later on.

Christian.

>
> So imo we shouldn't worry about algorithmic inefficiency of the fdinfo
> approach at all, and focus more on trying to reasonably (but not too
> much, this is still drm render stuff after all) standardize how it works
> and how we'll extend it all. I think there's tons of good suggestions in
> this thread on this topic already.
>
> /me out
> -Daniel
>
>> David
>> ________________________________
>> From: Daniel Vetter <daniel@ffwll.ch>
>> Sent: Wednesday, May 19, 2021 11:23 AM
>> To: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com>
>> Cc: Daniel Stone <daniel@fooishbar.org>; jhubbard@nvidia.com <jhubbard@nvidia.com>; nouveau@lists.freedesktop.org <nouveau@lists.freedesktop.org>; Intel Graphics Development <Intel-gfx@lists.freedesktop.org>; Maling list - DRI developers <dri-devel@lists.freedesktop.org>; Simon Ser <contact@emersion.fr>; Koenig, Christian <Christian.Koenig@amd.com>; aritger@nvidia.com <aritger@nvidia.com>; Nieto, David M <David.Nieto@amd.com>
>> Subject: Re: [Intel-gfx] [PATCH 0/7] Per client engine busyness
>>
>> On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
>> <tvrtko.ursulin@linux.intel.com> wrote:
>>>
>>> On 18/05/2021 10:40, Tvrtko Ursulin wrote:
>>>> On 18/05/2021 10:16, Daniel Stone wrote:
>>>>> Hi,
>>>>>
>>>>> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
>>>>> <tvrtko.ursulin@linux.intel.com> wrote:
>>>>>> I was just wondering if stat(2) and a chrdev major check would be a
>>>>>> solid criteria to more efficiently (compared to parsing the text
>>>>>> content) detect drm files while walking procfs.
>>>>> Maybe I'm missing something, but is the per-PID walk actually a
>>>>> measurable performance issue rather than just a bit unpleasant?
>>>> Per pid and per each open fd.
>>>>
>>>> As said in the other thread what bothers me a bit in this scheme is that
>>>> the cost of obtaining GPU usage scales based on non-GPU criteria.
>>>>
>>>> For use case of a top-like tool which shows all processes this is a
>>>> smaller additional cost, but then for a gpu-top like tool it is somewhat
>>>> higher.
>>> To further expand, not only would the cost scale per pid multiplied by
>>> open fds, but to detect which of the fds are DRM I see these three options:
>>>
>>> 1) Open and parse fdinfo.
>>> 2) Name based matching ie /dev/dri/.. something.
>>> 3) Stat the symlink target and check for DRM major.
>> stat with symlink following should be plenty fast.
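Option 3 can be sketched as below; a minimal Python illustration (not the actual intel_gpu_top implementation), relying on the DRM character-device major being 226 on Linux:

```python
import os
import stat

DRM_MAJOR = 226  # DRM character device major on Linux

def is_drm_rdev(rdev):
    # Check whether a raw device number has the DRM major.
    return os.major(rdev) == DRM_MAJOR

def is_drm_fd(pid, fd):
    # Stat /proc/<pid>/fd/<fd>, following the symlink to the open file,
    # and check for a DRM character device.
    st = os.stat(f"/proc/{pid}/fd/{fd}")
    return stat.S_ISCHR(st.st_mode) and is_drm_rdev(st.st_rdev)
```

This avoids opening and parsing fdinfo text for non-DRM fds at the cost of one stat(2) per open fd.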
>>
>>> All sound quite sub-optimal to me.
>>>
>>> Name based matching is probably the least evil on system resource usage
>>> (Keeping the dentry cache too hot? Too many syscalls?), even though
>>> fundamentally I don't think it is the right approach.
>>>
>>> What happens with dup(2) is another question.
>> We need benchmark numbers showing that on anything remotely realistic
>> it's an actual problem. Until we've demonstrated it's a real problem
>> we don't need to solve it.
>>
>> E.g. top with any sorting enabled also parses way more than it
>> displays on every update. It seems to be doing Just Fine (tm).
>>
>>> Does anyone have any feedback on the /proc/<pid>/gpu idea at all?
>> When we know we have a problem to solve we can take a look at solutions.
>> -Daniel
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch

_______________________________________________
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau

^ permalink raw reply	[flat|nested] 103+ messages in thread


* Re: [Nouveau] [Intel-gfx] [PATCH 0/7] Per client engine busyness
  2021-05-20 14:12                                                   ` Christian König
  (?)
@ 2021-05-20 14:17                                                     ` arabek
  -1 siblings, 0 replies; 103+ messages in thread
From: arabek @ 2021-05-20 14:17 UTC (permalink / raw)
  To: Christian König
  Cc: Tvrtko Ursulin, nouveau, Intel Graphics Development,
	Maling list - DRI developers, Daniel Stone, Daniel Vetter,
	Simon Ser, Nieto, David M

> Well, if it becomes a problem, fixing the debugfs "clients" file and
> making it sysfs shouldn't be much of a problem later on.

Why not try using something along the lines of perf / opensnoop or BPF
to do the work? It should be efficient enough.

e.g.:
http://www.brendangregg.com/blog/2014-07-25/opensnoop-for-linux.html
https://man7.org/linux/man-pages/man2/bpf.2.html

^ permalink raw reply	[flat|nested] 103+ messages in thread


* Re: [Nouveau] [Intel-gfx] [PATCH 0/7] Per client engine busyness
  2021-05-20  8:35                                               ` Tvrtko Ursulin
  (?)
@ 2021-05-24 10:48                                                 ` Tvrtko Ursulin
  -1 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-05-24 10:48 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Intel Graphics Development, Maling list - DRI developers,
	Daniel Stone, Simon Ser, nouveau, Koenig, Christian, Nieto,
	David M


On 20/05/2021 09:35, Tvrtko Ursulin wrote:
> On 19/05/2021 19:23, Daniel Vetter wrote:
>> On Wed, May 19, 2021 at 6:16 PM Tvrtko Ursulin
>> <tvrtko.ursulin@linux.intel.com> wrote:
>>>
>>>
>>> On 18/05/2021 10:40, Tvrtko Ursulin wrote:
>>>>
>>>> On 18/05/2021 10:16, Daniel Stone wrote:
>>>>> Hi,
>>>>>
>>>>> On Tue, 18 May 2021 at 10:09, Tvrtko Ursulin
>>>>> <tvrtko.ursulin@linux.intel.com> wrote:
>>>>>> I was just wondering if stat(2) and a chrdev major check would be a
>>>>>> solid criteria to more efficiently (compared to parsing the text
>>>>>> content) detect drm files while walking procfs.
>>>>>
>>>>> Maybe I'm missing something, but is the per-PID walk actually a
>>>>> measurable performance issue rather than just a bit unpleasant?
>>>>
>>>> Per pid and per each open fd.
>>>>
>>>> As said in the other thread what bothers me a bit in this scheme is 
>>>> that
>>>> the cost of obtaining GPU usage scales based on non-GPU criteria.
>>>>
>>>> For use case of a top-like tool which shows all processes this is a
>>>> smaller additional cost, but then for a gpu-top like tool it is 
>>>> somewhat
>>>> higher.
>>>
>>> To further expand, not only would the cost scale per pid multiplied by
>>> open fds, but to detect which of the fds are DRM I see these three options:
>>>
>>> 1) Open and parse fdinfo.
>>> 2) Name based matching ie /dev/dri/.. something.
>>> 3) Stat the symlink target and check for DRM major.
>>
>> stat with symlink following should be plenty fast.
> 
> Maybe. I don't think my point about keeping the dentry cache needlessly 
> hot is getting through at all. On my lightly loaded desktop:
> 
>   $ sudo lsof | wc -l
>   599551
> 
>   $ sudo lsof | grep "/dev/dri/" | wc -l
>   1965
> 
> It's going to look up ~600k pointless dentries in every iteration. Just 
> to find a handful of DRM ones. Hard to say if that is better or worse 
> than just parsing fdinfo text for all files. Will see.

CPU usage looks passable under a production kernel (non-debug). Once a 
second refresh period, on a not really that loaded system (115 running 
processes, 3096 open file descriptors as reported by lsof, none of which 
are DRM), results in a system call heavy load:

real    0m55.348s
user    0m0.100s
sys     0m0.319s

Once per second loop is essentially along the lines of:

   for each pid in /proc:
     for each fd in /proc/<pid>/fdinfo:
       if fstatat(fd) is drm major:
         read fdinfo text in one sweep and parse it
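A minimal Python sketch of that loop follows; the drm-* key names in the sample are assumptions based on the fdinfo extension discussed in this thread, not a settled interface:

```python
import os
import stat

DRM_MAJOR = 226  # Linux DRM character device major

def parse_fdinfo(text):
    # Parse /proc/<pid>/fdinfo/<fd> contents: lines of the form
    # "name:<TAB>value", returned as a dict of stripped strings.
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info

def drm_clients():
    # Yield (pid, fd, fdinfo dict) for every open DRM file on the system.
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            for fd in os.listdir(f"/proc/{pid}/fdinfo"):
                st = os.stat(f"/proc/{pid}/fd/{fd}")
                if not (stat.S_ISCHR(st.st_mode)
                        and os.major(st.st_rdev) == DRM_MAJOR):
                    continue
                with open(f"/proc/{pid}/fdinfo/{fd}") as f:
                    yield int(pid), int(fd), parse_fdinfo(f.read())
        except OSError:
            continue  # process exited or fd closed mid-walk
```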

I'll post the quick intel_gpu_top patch for reference but string parsing 
in C leaves a few things to be desired there.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 103+ messages in thread


* Re: [PATCH 0/7] Per client engine busyness
  2021-05-14 15:10                       ` [Intel-gfx] " Christian König
@ 2021-06-28 10:16                         ` Tvrtko Ursulin
  -1 siblings, 0 replies; 103+ messages in thread
From: Tvrtko Ursulin @ 2021-06-28 10:16 UTC (permalink / raw)
  To: Christian König, Nieto, David M, Alex Deucher
  Cc: Intel Graphics Development, Maling list - DRI developers



On 14/05/2021 16:10, Christian König wrote:
> Am 14.05.21 um 17:03 schrieb Tvrtko Ursulin:
>>
>> On 14/05/2021 15:56, Christian König wrote:
>>> Am 14.05.21 um 16:47 schrieb Tvrtko Ursulin:
>>>>
>>>> On 14/05/2021 14:53, Christian König wrote:
>>>>>>
>>>>>> David also said that you considered sysfs but were wary of 
>>>>>> exposing process info in there. To clarify, my patch is not 
>>>>>> exposing sysfs entry per process, but one per open drm fd.
>>>>>>
>>>>>
>>>>> Yes, we discussed this as well, but then rejected the approach.
>>>>>
>>>>> To have useful information related to the open drm fd you need to 
>>>>> relate that to the process(es) which have that file descriptor open. 
>>>>> Just tracking who opened it first like DRM does is pretty useless 
>>>>> on modern systems.
>>>>
>>>> We do update the pid/name for fds passed over unix sockets.
>>>
>>> Well I just double checked and that is not correct.
>>>
>>> Could be that i915 has some special code for that, but on my laptop I 
>>> only see the X server under the "clients" debugfs file.
>>
>> Yes we have special code in i915 for this. Part of this series we are 
>> discussing here.
> 
> Ah, yeah you should mention that. Could we please separate that into 
> common code instead? Cause I really see that as a bug in the current 
> handling independent of the discussion here.

What we do in i915 is update the pid and name when a task different to 
the one which opened the fd does a GEM context create ioctl.

Moving that to DRM core would be along the lines of doing the same check 
and update on every ioctl. Maybe allow the update to be one time only if 
that would work. Would this be desirable and acceptable? If so I can 
definitely sketch it out.

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 103+ messages in thread


* Re: [PATCH 0/7] Per client engine busyness
  2021-06-28 10:16                         ` [Intel-gfx] " Tvrtko Ursulin
@ 2021-06-28 14:37                           ` Daniel Vetter
  -1 siblings, 0 replies; 103+ messages in thread
From: Daniel Vetter @ 2021-06-28 14:37 UTC (permalink / raw)
  To: Tvrtko Ursulin
  Cc: Intel Graphics Development, Christian König,
	Maling list - DRI developers, Nieto, David M

On Mon, Jun 28, 2021 at 12:18 PM Tvrtko Ursulin
<tvrtko.ursulin@linux.intel.com> wrote:
>
>
>
> On 14/05/2021 16:10, Christian König wrote:
> > Am 14.05.21 um 17:03 schrieb Tvrtko Ursulin:
> >>
> >> On 14/05/2021 15:56, Christian König wrote:
> >>> Am 14.05.21 um 16:47 schrieb Tvrtko Ursulin:
> >>>>
> >>>> On 14/05/2021 14:53, Christian König wrote:
> >>>>>>
> >>>>>> David also said that you considered sysfs but were wary of
> >>>>>> exposing process info in there. To clarify, my patch is not
> >>>>>> exposing sysfs entry per process, but one per open drm fd.
> >>>>>>
> >>>>>
> >>>>> Yes, we discussed this as well, but then rejected the approach.
> >>>>>
> >>>>> To have useful information related to the open drm fd you need to
> >>>>> related that to process(es) which have that file descriptor open.
> >>>>> Just tracking who opened it first like DRM does is pretty useless
> >>>>> on modern systems.
> >>>>
> >>>> We do update the pid/name for fds passed over unix sockets.
> >>>
> >>> Well I just double checked and that is not correct.
> >>>
> >>> Could be that i915 has some special code for that, but on my laptop I
> >>> only see the X server under the "clients" debugfs file.
> >>
> >> Yes we have special code in i915 for this. Part of this series we are
> >> discussing here.
> >
> > Ah, yeah you should mention that. Could we please separate that into
> > common code instead? Cause I really see that as a bug in the current
> > handling independent of the discussion here.
>
> What we do in i915 is update the pid and name when a task different to
> the one which opened the fd does a GEM context create ioctl.
>
> Moving that to DRM core would be along the lines of doing the same check
> and update on every ioctl. Maybe allow the update to be one time only if
> that would work. Would this be desirable and acceptable? If so I can
> definitely sketch it out.

If we go with fdinfo for these it becomes clear who all owns the file,
since it's then a per-process thing. Not sure how much smarts we
should have for internal debugfs output. Maybe one-shot update on
first driver ioctl (since if you're on render nodes then X does the
drm auth dance, so "first ioctl" is wrong).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
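As a rough illustration of the fdinfo direction Daniel suggests: a per-client consumer would read `/proc/<pid>/fdinfo/<fd>` and pick out DRM key/value lines. The `drm-` prefixed key names below follow the later drm-usage-stats convention and are an assumption relative to this thread, which predates that interface; the parser itself is just splitting `key: value` lines:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extract the value of one "key:<whitespace>value" line from an fdinfo
 * blob into out (illustrative helper, not a kernel or igt API).
 * Returns 1 if the key was found, 0 otherwise. */
static int fdinfo_get(const char *blob, const char *key,
		      char *out, size_t outlen)
{
	size_t klen = strlen(key);
	const char *p = blob;

	while (p && *p) {
		const char *eol = strchr(p, '\n');
		size_t linelen = eol ? (size_t)(eol - p) : strlen(p);

		/* Match "key:" at the start of the line. */
		if (linelen > klen + 1 &&
		    strncmp(p, key, klen) == 0 && p[klen] == ':') {
			const char *v = p + klen + 1;

			while (v < p + linelen && (*v == ' ' || *v == '\t'))
				v++;	/* skip separator whitespace */
			linelen -= (size_t)(v - p);
			if (linelen >= outlen)
				linelen = outlen - 1;
			memcpy(out, v, linelen);
			out[linelen] = '\0';
			return 1;
		}
		p = eol ? eol + 1 : NULL;
	}
	return 0;
}
```

Keys without a `drm-` prefix (the generic `pos:`, `flags:` lines) would simply be skipped by a caller that only asks for `drm-*` keys; busyness deltas then come from sampling a `drm-engine-*` value twice and dividing by wall time.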

^ permalink raw reply	[flat|nested] 103+ messages in thread


end of thread, other threads:[~2021-06-28 14:37 UTC | newest]

Thread overview: 103+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-13 10:59 [PATCH 0/7] Per client engine busyness Tvrtko Ursulin
2021-05-13 10:59 ` [Intel-gfx] " Tvrtko Ursulin
2021-05-13 10:59 ` [PATCH 1/7] drm/i915: Expose list of clients in sysfs Tvrtko Ursulin
2021-05-13 10:59   ` [Intel-gfx] " Tvrtko Ursulin
2021-05-13 10:59 ` [PATCH 2/7] drm/i915: Update client name on context create Tvrtko Ursulin
2021-05-13 10:59   ` [Intel-gfx] " Tvrtko Ursulin
2021-05-13 10:59 ` [PATCH 3/7] drm/i915: Make GEM contexts track DRM clients Tvrtko Ursulin
2021-05-13 10:59   ` [Intel-gfx] " Tvrtko Ursulin
2021-05-13 10:59 ` [PATCH 4/7] drm/i915: Track runtime spent in closed and unreachable GEM contexts Tvrtko Ursulin
2021-05-13 10:59   ` [Intel-gfx] " Tvrtko Ursulin
2021-05-13 11:00 ` [PATCH 5/7] drm/i915: Track all user contexts per client Tvrtko Ursulin
2021-05-13 11:00   ` [Intel-gfx] " Tvrtko Ursulin
2021-05-13 11:00 ` [PATCH 6/7] drm/i915: Track context current active time Tvrtko Ursulin
2021-05-13 11:00   ` [Intel-gfx] " Tvrtko Ursulin
2021-05-13 11:00 ` [PATCH 7/7] drm/i915: Expose per-engine client busyness Tvrtko Ursulin
2021-05-13 11:00   ` [Intel-gfx] " Tvrtko Ursulin
2021-05-13 11:28 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for Per client engine busyness Patchwork
2021-05-13 11:30 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2021-05-13 11:59 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2021-05-13 15:48 ` [PATCH 0/7] " Alex Deucher
2021-05-13 15:48   ` [Intel-gfx] " Alex Deucher
2021-05-13 16:40   ` Tvrtko Ursulin
2021-05-13 16:40     ` [Intel-gfx] " Tvrtko Ursulin
2021-05-14  5:58     ` Alex Deucher
2021-05-14  5:58       ` [Intel-gfx] " Alex Deucher
2021-05-14  7:22       ` Nieto, David M
2021-05-14  7:22         ` [Intel-gfx] " Nieto, David M
2021-05-14  8:04         ` Christian König
2021-05-14  8:04           ` [Intel-gfx] " Christian König
2021-05-14 13:42           ` Tvrtko Ursulin
2021-05-14 13:42             ` [Intel-gfx] " Tvrtko Ursulin
2021-05-14 13:53             ` Christian König
2021-05-14 13:53               ` [Intel-gfx] " Christian König
2021-05-14 14:47               ` Tvrtko Ursulin
2021-05-14 14:47                 ` [Intel-gfx] " Tvrtko Ursulin
2021-05-14 14:56                 ` Christian König
2021-05-14 14:56                   ` [Intel-gfx] " Christian König
2021-05-14 15:03                   ` Tvrtko Ursulin
2021-05-14 15:03                     ` [Intel-gfx] " Tvrtko Ursulin
2021-05-14 15:10                     ` Christian König
2021-05-14 15:10                       ` [Intel-gfx] " Christian König
2021-05-17 14:30                       ` Daniel Vetter
2021-05-17 14:30                         ` [Intel-gfx] " Daniel Vetter
2021-05-17 14:39                         ` Nieto, David M
2021-05-17 14:39                           ` [Intel-gfx] " Nieto, David M
2021-05-17 16:00                           ` Tvrtko Ursulin
2021-05-17 16:00                             ` [Intel-gfx] " Tvrtko Ursulin
2021-05-17 18:02                             ` Nieto, David M
2021-05-17 18:02                               ` [Intel-gfx] " Nieto, David M
2021-05-17 18:16                               ` [Nouveau] " Nieto, David M
2021-05-17 18:16                                 ` [Intel-gfx] " Nieto, David M
2021-05-17 18:16                                 ` Nieto, David M
2021-05-17 19:03                                 ` [Nouveau] " Simon Ser
2021-05-17 19:03                                   ` [Intel-gfx] " Simon Ser
2021-05-17 19:03                                   ` Simon Ser
2021-05-18  9:08                                   ` [Nouveau] " Tvrtko Ursulin
2021-05-18  9:08                                     ` [Intel-gfx] " Tvrtko Ursulin
2021-05-18  9:08                                     ` Tvrtko Ursulin
2021-05-18  9:16                                     ` [Nouveau] " Daniel Stone
2021-05-18  9:16                                       ` [Intel-gfx] " Daniel Stone
2021-05-18  9:16                                       ` Daniel Stone
2021-05-18  9:40                                       ` [Nouveau] " Tvrtko Ursulin
2021-05-18  9:40                                         ` [Intel-gfx] " Tvrtko Ursulin
2021-05-18  9:40                                         ` Tvrtko Ursulin
2021-05-19 16:16                                         ` [Nouveau] " Tvrtko Ursulin
2021-05-19 16:16                                           ` [Intel-gfx] " Tvrtko Ursulin
2021-05-19 16:16                                           ` Tvrtko Ursulin
2021-05-19 18:23                                           ` [Nouveau] [Intel-gfx] " Daniel Vetter
2021-05-19 18:23                                             ` Daniel Vetter
2021-05-19 18:23                                             ` Daniel Vetter
2021-05-19 23:17                                             ` [Nouveau] " Nieto, David M
2021-05-19 23:17                                               ` Nieto, David M
2021-05-19 23:17                                               ` Nieto, David M
2021-05-20 14:11                                               ` [Nouveau] " Daniel Vetter
2021-05-20 14:11                                                 ` Daniel Vetter
2021-05-20 14:11                                                 ` Daniel Vetter
2021-05-20 14:12                                                 ` [Nouveau] " Christian König
2021-05-20 14:12                                                   ` Christian König
2021-05-20 14:12                                                   ` Christian König
2021-05-20 14:17                                                   ` [Nouveau] " arabek
2021-05-20 14:17                                                     ` [Intel-gfx] [Nouveau] " arabek
2021-05-20 14:17                                                     ` [Nouveau] [Intel-gfx] " arabek
2021-05-20  8:35                                             ` Tvrtko Ursulin
2021-05-20  8:35                                               ` Tvrtko Ursulin
2021-05-20  8:35                                               ` Tvrtko Ursulin
2021-05-24 10:48                                               ` [Nouveau] " Tvrtko Ursulin
2021-05-24 10:48                                                 ` Tvrtko Ursulin
2021-05-24 10:48                                                 ` Tvrtko Ursulin
2021-05-18  9:35                               ` Tvrtko Ursulin
2021-05-18  9:35                                 ` [Intel-gfx] " Tvrtko Ursulin
2021-05-18 12:06                                 ` Christian König
2021-05-18 12:06                                   ` [Intel-gfx] " Christian König
2021-05-17 19:16                         ` Christian König
2021-05-17 19:16                           ` [Intel-gfx] " Christian König
2021-06-28 10:16                       ` Tvrtko Ursulin
2021-06-28 10:16                         ` [Intel-gfx] " Tvrtko Ursulin
2021-06-28 14:37                         ` Daniel Vetter
2021-06-28 14:37                           ` [Intel-gfx] " Daniel Vetter
2021-05-15 10:40                     ` Maxime Schmitt
2021-05-17 16:13                       ` Tvrtko Ursulin
2021-05-17 14:20   ` Daniel Vetter
2021-05-17 14:20     ` [Intel-gfx] " Daniel Vetter
2021-05-13 16:38 ` [Intel-gfx] ✗ Fi.CI.IGT: failure for " Patchwork
