dri-devel.lists.freedesktop.org archive mirror
* [RFC 0/6] Non perf based Gen Graphics OA unit driver
@ 2015-09-29 14:39 Robert Bragg
  2015-09-29 14:39 ` [RFC 1/6] drm/i915: Add i915 perf infrastructure Robert Bragg
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Robert Bragg @ 2015-09-29 14:39 UTC (permalink / raw)
  To: intel-gfx
  Cc: Daniel Vetter, Chris Wilson, Sourab Gupta, Zhenyu Wang,
	Jani Nikula, David Airlie, Peter Zijlstra, Ingo Molnar,
	Kan Liang, Alexander Shishkin, Zheng Yan, Mark Rutland,
	Matt Fleming, dri-devel, linux-kernel, linux-api

After some recent progress enabling the Observation Architecture unit
for Gen8+, we can hopefully paint a fairly complete picture of the
requirements for supporting the unit from Haswell to Skylake and so
I'm looking again at the challenges in upstreaming this work.

Considering this, it looked like it could be worthwhile experimenting
with a non-perf based driver for the OA unit and I'm hoping to explain
why and how it went as well as request some feedback on whether we
should aim to move forward without perf.

Besides the patches forwarded here, a branch can be found for reference
here:

  https://github.com/rib/linux - wip/rib/oa-without-perf branch

I created corresponding branches for Mesa and GPU Top to test this here
(same branch names):

  https://github.com/rib/mesa
  https://github.com/rib/gputop

Here I've only included the patches up to an initial Haswell driver,
although the wip/rib/oa-without-perf branch on github includes support
for Gen8+. Please let me know if it would be helpful to forward more.

At this point I have two drivers at feature parity; one based on perf,
one not. Technically they're very similar and the patches are split to
hopefully be quite comparable. My latest perf-based work is under
wip/rib/oa-next branches in the above repos.


So, these are the concerns I currently have about upstreaming this work:

- We're bridging two complex architectures

    To review this work I think it will be relevant to have a good
    general familiarity with Gen graphics (e.g. thinking about the OA
    unit's interaction with the command streamer and execlist
    scheduling) as well as our userspace architecture and how we're
    consuming OA data within Mesa to implement the
    INTEL_performance_query extension.
    
    On the flip side here, it's necessary to understand the perf
    userspace interface (for most this is hidden by tools so the details
    aren't common knowledge) as well as the internal design, considering
    that the PMU we're looking at seems to break several current design
    assumptions. I can only claim a limited familiarity with perf's
    design, just as a result of this work.


- Limited documentation for the OA unit:

    Not unique to the OA unit but I think having a driver that extends
    outside of the graphics stack, into the core perf infrastructure
    probably requires more comprehensive HW + graphics stack
    documentation for non drm/i915 developers.  Earlier RFC discussions
    were hampered somewhat by limited documentation.  Improved
    documentation is always desirable, but of course it can also take a
    significant amount of time and effort while some key aspects
    (notably the PRMs) aren't directly under my control.
 

- The current OA PMU driver breaks some significant design assumptions.

    Existing perf pmus are used for profiling work on a cpu and we're
    introducing the idea of _IS_DEVICE pmus with different security
    implications, the need to fake cpu-related data (such as user/kernel
    registers) to fit with perf's current design, and adding _DEVICE
    records as a way to forward device-specific status records.

    The OA unit writes reports of counters into a circular buffer,
    without involvement from the CPU, making our PMU driver the first of
    a kind.
    
    Given the way we periodically forward data from the OA buffer to
    perf's buffer, these bursts of sample writes look to perf like we're
    sampling too fast and so it throttles us.

    Perf supports groups of counters and allows those to be read via
    transactions internally but transactions currently seem designed to
    be explicitly initiated from the cpu (say in response to a userspace
    read()) and while we could pull a report out of the OA buffer we
    can't trigger a report from the cpu on demand.
    
    Related to being report based; the OA counters are configured in HW
    as a set while perf generally expects counter configurations to be
    orthogonal. Although counters can be associated with a group leader
    as they are opened, there's no clear precedent for being able to
    provide group-wide configuration attributes and no obvious solution
    as yet that's expected to be acceptable to upstream and meets our
    userspace needs. We currently avoid using perf's grouping feature
    and forward OA reports to userspace via perf's 'raw' sample field.
    This suits our userspace well considering how coupled the counters
    are when dealing with normalizing. It would be inconvenient to split
    counters up into separate events, only to require userspace to
    recombine them. For Mesa it's also convenient to be forwarded raw,
    periodic reports for combining with the raw reports it captures
    using MI_REPORT_PERF_COUNT commands.

    Related to counter orthogonality; we can't time share the OA unit,
    while event scheduling is a central design idea within perf for
    allowing userspace to open + enable more events than can be
    configured in HW at any one time. The OA unit is not designed to
    allow re-configuration while in use. We can't reconfigure the OA
    unit without losing internal OA unit state which we can't access
    explicitly to save and restore. Reconfiguring the OA unit is also
    relatively slow, involving ~100 register writes. From userspace Mesa
    also depends on a stable OA configuration when emitting
    MI_REPORT_PERF_COUNT commands and importantly the OA unit can't be
    disabled while there are outstanding MI_RPC commands lest we hang
    the command streamer.


- We may currently be making some technical compromises for the sake of
  using perf.

    perf_event_open() requires events to either relate to a pid or a
    specific cpu core, while our device pmu relates to neither.  Events
    opened with a pid will be automatically enabled/disabled according
    to the scheduling of that process - so not appropriate for us. When
    an event is related to a cpu id, perf ensures pmu methods will be
    invoked via an inter-processor interrupt on that core. To avoid
    invasive changes our userspace opens OA perf events for a specific
    cpu. This is workable but it means the majority of the OA driver now
    runs in atomic context, including all OA report forwarding, which
    isn't really necessary in our case and seems to make our locking
    requirements somewhat complex as we handle the interaction with the
    rest of the i915 driver.


- I'm not confident our use case benefits much from building on perf:

    We aren't using existing perf based tooling with our PMU. Existing
    tools typically assume you're profiling work running on a cpu, e.g.
    expecting samples to be associated with instruction pointers and
    user/kernel registers and aiming to represent metrics in relation
    to application source code. We're forwarding fake register values
    and userspace needs to know how to decode the raw OA reports
    before anything can be reported to a user.
    
    With the buffering done by the OA unit I don't think we currently
    benefit from perf's mmapped circular buffer interface. We already
    have a decoupled producer and consumer and since we have to copy out
    of the OA buffer, it would work well for us to hide that copy in
    a simpler read() based interface.


- Logistically it might be more practical to contain this to the
  graphics stack.

    It seems fair to consider that if we can't see a very compelling
    benefit to building on perf, then containing this work to
    drivers/gpu/drm/i915 may simplify the review process as well as
    future maintenance and development.



About the initial non-perf driver:

Structurally it's very similar to the perf based implementation. The
userspace interface is inspired by perf and adds a
DRM_IOCTL_I915_PERF_OPEN ioctl that's conceptually comparable to
perf_event_open() returning an fd that userspace can poll() and read()
samples from. The fds also support I915_PERF_IOCTL_ENABLE/DISABLE
ioctls much like perf.

When opening an event, users specify an event type (currently enum based,
like perf's built-in event types) and a pointer to an event-specific
attributes structure which is extensible in the same way as struct
perf_event_attr. Metrics can optionally be opened for a specific
GPU context (comparable to passing a pid to perf_event_open()) and the
contents of samples can be controlled via a sample_flags member as with
perf.
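
As a rough illustration of that size-based extensibility, here is a
userspace-only sketch of the acceptance rule that i915_perf_copy_attr()
in patch 1 is meant to apply: the first u32 of the attributes is the
size userspace believes the struct has, shorter (older) structs are
zero-extended, and longer (newer) structs are only accepted if the
members this kernel doesn't know about are all zero. The struct names,
EXAMPLE_MAX_ATTR_SIZE and copy_versioned_attr() are hypothetical and
only exist for this example; the caller is assumed to provide at least
'size' readable bytes.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EXAMPLE_MAX_ATTR_SIZE 4096u	/* hypothetical upper bound */

/* Hypothetical "version 0" attributes. */
struct example_attr_v0 {
	uint32_t size;		/* size of the struct userspace passed */
	uint32_t format;
};

/* Hypothetical later revision that appended one member. */
struct example_attr_v1 {
	uint32_t size;
	uint32_t format;
	uint64_t period;
};

/*
 * Copy a size-prefixed attribute blob from 'src' into 'dst', which is
 * 'real_size' bytes long. 'v0_size' is the smallest layout ever shipped.
 * Returns 0 on success or -E2BIG if the blob can't be accepted.
 */
static int copy_versioned_attr(void *dst, size_t real_size, size_t v0_size,
			       const void *src)
{
	const unsigned char *bytes = src;
	uint32_t size;
	size_t i;

	memcpy(&size, src, sizeof(size));
	if (size < v0_size || size > EXAMPLE_MAX_ATTR_SIZE)
		return -E2BIG;

	/* Zero everything first so a short (older) struct is extended. */
	memset(dst, 0, real_size);

	if (size > real_size) {
		/* Newer userspace: unknown trailing members must be zero. */
		for (i = real_size; i < size; i++)
			if (bytes[i])
				return -E2BIG;
		size = (uint32_t)real_size;
	}

	memcpy(dst, src, size);
	return 0;
}
```

With this rule an old binary passing example_attr_v0 keeps working
against a kernel that knows example_attr_v1 (period simply reads as 0),
while a newer binary is only rejected if it actually sets something the
kernel can't honour.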

Userspace collects samples via read() which writes (only complete)
records to the user's given buffer. Records have a type + size
header equivalent to struct perf_event_header.
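
To make the record framing concrete, below is a small userspace sketch
of how a consumer might walk the bytes returned by one read(). It uses
a local copy of the type/misc/size header layout and assumes, as with
perf, that the size field counts the whole record including the header.
example_record_header, EXAMPLE_RECORD_SAMPLE and count_samples() are
names made up for this example, not part of the proposed uapi.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local stand-in for the header layout (type, misc, size). */
struct example_record_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* assumed: total record size, header included */
};

#define EXAMPLE_RECORD_SAMPLE 1	/* mirrors DRM_I915_PERF_RECORD_SAMPLE */

/*
 * Count the sample records found in 'len' bytes returned by read().
 * Walking stops at the first header that is truncated or whose size
 * field is implausible, rather than trusting it.
 */
static size_t count_samples(const unsigned char *buf, size_t len)
{
	size_t offset = 0;
	size_t samples = 0;

	while (len - offset >= sizeof(struct example_record_header)) {
		struct example_record_header hdr;

		/* memcpy avoids unaligned access into the byte buffer */
		memcpy(&hdr, buf + offset, sizeof(hdr));

		if (hdr.size < sizeof(hdr) || hdr.size > len - offset)
			break;

		if (hdr.type == EXAMPLE_RECORD_SAMPLE)
			samples++;

		/* Unknown record types are skipped using their size. */
		offset += hdr.size;
	}

	return samples;
}
```

Because read() only ever writes complete records, a well-behaved
consumer should never hit the truncated-header case; the checks are
there so that a bogus size can't walk the parser off the end of the
buffer.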

Updating Mesa and GPU Top to experiment with this was straightforward
given the similarity to the perf interface.  The main difference is that
it only supports forwarding metrics via read()s instead of an mmapped
circular buffer. As mentioned above, I think that suits this well, and
requires no additional copying of data. I think the userspace code has
ended up being a little simpler too.
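
For anyone less familiar with the shape of that copy, here is a
simplified userspace model of draining whole reports from a circular
buffer into a linear destination, which is roughly what a read()
implementation has to do with the OA buffer. It is not the driver code:
example_ring, RING_SIZE and ring_read_reports() are hypothetical, the
report size is treated as fixed, and the real driver also has to deal
with the hardware head/tail registers and copy_to_user() failures.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SIZE 64u	/* hypothetical; must be a power of two */

struct example_ring {
	unsigned char data[RING_SIZE];
	uint32_t head;	/* consumer offset, free running */
	uint32_t tail;	/* producer offset, free running */
};

/*
 * Copy as many whole reports as are available and will fit in 'dst',
 * handling the wrap at the end of the ring. Partial reports are never
 * copied. Returns the number of bytes written to 'dst'.
 */
static size_t ring_read_reports(struct example_ring *ring, size_t report_size,
				unsigned char *dst, size_t dst_len)
{
	size_t copied = 0;

	if (report_size == 0 || report_size > RING_SIZE)
		return 0;

	while (ring->tail - ring->head >= report_size &&
	       dst_len - copied >= report_size) {
		uint32_t start = ring->head & (RING_SIZE - 1);
		size_t first = RING_SIZE - start;

		if (first > report_size)
			first = report_size;

		/* Bytes up to the end of the ring, then any wrapped rest. */
		memcpy(dst + copied, ring->data + start, first);
		memcpy(dst + copied + first, ring->data, report_size - first);

		ring->head += (uint32_t)report_size;
		copied += report_size;
	}

	return copied;
}
```

The point is that this is the only copy on the path: data goes from the
buffer the hardware wrote straight into the caller's buffer, with no
intermediate perf-style ring in between.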

Overall the driver currently isn't much more code than with perf (~200
lines).

Personally, my gut feeling at the moment is that we should aim to move
forward independently of perf.

I'd really appreciate some feedback from others on this though.

Daniel and Chris; although I think it made sense at the outset to try
and use perf, in light of the above would you be open to a non-perf
based driver for the OA unit?

Peter; I wonder if you would tend to agree too that it could make sense
for us to go with our own interface here?


Kind Regards,
Robert

Robert Bragg (6):
  drm/i915: Add i915 perf infrastructure
  drm/i915: rename OACONTROL GEN7_OACONTROL
  drm/i915: Add static '3D' Haswell OA unit config
  drm/i915: Add i915 perf event for Haswell OA unit
  drm/i915: Add dev.i915.perf_event_paranoid sysctl option
  drm/i915: add oa_event_min_timer_exponent sysctl

 drivers/gpu/drm/i915/Makefile           |    4 +
 drivers/gpu/drm/i915/i915_cmd_parser.c  |    4 +-
 drivers/gpu/drm/i915/i915_dma.c         |    7 +
 drivers/gpu/drm/i915/i915_drv.h         |  136 ++++
 drivers/gpu/drm/i915/i915_gem_context.c |   23 +-
 drivers/gpu/drm/i915/i915_oa_hsw.c      |   98 +++
 drivers/gpu/drm/i915/i915_oa_hsw.h      |   36 +
 drivers/gpu/drm/i915/i915_perf.c        | 1201 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h         |  340 ++++++++-
 include/uapi/drm/i915_drm.h             |  125 ++++
 10 files changed, 1967 insertions(+), 7 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h
 create mode 100644 drivers/gpu/drm/i915/i915_perf.c

-- 
2.5.2


* [RFC 1/6] drm/i915: Add i915 perf infrastructure
  2015-09-29 14:39 [RFC 0/6] Non perf based Gen Graphics OA unit driver Robert Bragg
@ 2015-09-29 14:39 ` Robert Bragg
  2015-09-29 14:39 ` [RFC 2/6] drm/i915: rename OACONTROL GEN7_OACONTROL Robert Bragg
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Robert Bragg @ 2015-09-29 14:39 UTC (permalink / raw)
  To: intel-gfx
  Cc: Mark Rutland, Matt Fleming, David Airlie, dri-devel,
	linux-kernel, Peter Zijlstra, Sourab Gupta, linux-api, Zheng Yan,
	Daniel Vetter, Ingo Molnar, Alexander Shishkin

This adds a DRM_IOCTL_I915_PERF_OPEN ioctl comparable to perf_event_open
that opens a file descriptor for an event source.

Based on our initial experience aiming to use the core perf
infrastructure, this interface is inspired by perf, but focused on
exposing metrics about work running on Gen graphics instead of a CPU.

One notable difference is that it doesn't support mmapping a circular
buffer of samples into userspace. The currently planned use cases
require an internal buffering that forces at least one copy of data
which can be neatly hidden in a read() based interface.

No specific event types are supported yet so DRM_IOCTL_I915_PERF_OPEN can
currently only get as far as returning EINVAL for an unknown event type.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
---
 drivers/gpu/drm/i915/Makefile    |   3 +
 drivers/gpu/drm/i915/i915_dma.c  |   7 +
 drivers/gpu/drm/i915/i915_drv.h  |  74 +++++++
 drivers/gpu/drm/i915/i915_perf.c | 447 +++++++++++++++++++++++++++++++++++++++
 include/uapi/drm/i915_drm.h      |  62 ++++++
 5 files changed, 593 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_perf.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 44d290a..5485495 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -93,6 +93,9 @@ i915-y += dvo_ch7017.o \
 # virtual gpu code
 i915-y += i915_vgpu.o
 
+# perf code
+i915-y += i915_perf.o
+
 # legacy horrors
 i915-y += i915_dma.o
 
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 2193cc2..0424e8c 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -841,6 +841,11 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
 	mutex_init(&dev_priv->modeset_restore_lock);
 	mutex_init(&dev_priv->csr_lock);
 
+	/* Must at least be initialized before trying to pin any context
+	 * which i915_perf hooks into.
+	 */
+	i915_perf_init(dev);
+
 	intel_pm_setup(dev);
 
 	intel_display_crc_init(dev);
@@ -1090,6 +1095,7 @@ int i915_driver_unload(struct drm_device *dev)
 		return ret;
 	}
 
+	i915_perf_fini(dev);
 	intel_power_domains_fini(dev_priv);
 
 	intel_gpu_ips_teardown();
@@ -1280,6 +1286,7 @@ const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, i915_gem_context_getparam_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM, i915_gem_context_setparam_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_PERF_OPEN, i915_perf_open_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 };
 
 int i915_max_ioctl = ARRAY_SIZE(i915_ioctls);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e0f3f05..c16c9e5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1694,6 +1694,67 @@ struct i915_execbuffer_params {
 	struct drm_i915_gem_request     *request;
 };
 
+struct i915_perf_read_state {
+	int count;
+	ssize_t read;
+	char __user *buf;
+};
+
+struct i915_perf_event {
+	struct drm_i915_private *dev_priv;
+
+	struct list_head link;
+
+	u32 sample_flags;
+
+	struct intel_context *ctx;
+	bool enabled;
+
+	/* Enables the collection of HW events, either in response to
+	 * I915_PERF_IOCTL_ENABLE or implicitly called when event is
+	 * opened without I915_PERF_FLAG_DISABLED */
+	void (*enable)(struct i915_perf_event *event);
+
+	/* Disables the collection of HW events, either in response to
+	 * I915_PERF_IOCTL_DISABLE or implicitly called before
+	 * destroying the event. */
+	void (*disable)(struct i915_perf_event *event);
+
+	/* Return: true if any i915 perf records are ready to read()
+	 * for this event */
+	bool (*can_read)(struct i915_perf_event *event);
+
+	/* Call poll_wait, passing a wait queue that will be woken
+	 * once there is something ready to read() for the event */
+	void (*poll_wait)(struct i915_perf_event *event,
+			  struct file *file,
+			  poll_table *wait);
+
+	/* For handling a blocking read, wait until there is something
+	 * ready to read() for the event. E.g. wait on the same
+	 * wait queue that would be passed to poll_wait() until
+	 * ->can_read() returns true (if it's safe to call ->can_read()
+	 * without the i915 perf lock held). */
+	int (*wait_unlocked)(struct i915_perf_event *event);
+
+	/* Copy as many buffered i915 perf samples and records for
+	 * this event to userspace as will fit in the given buffer.
+	 *
+	 * Only write complete records.
+	 *
+	 * read_state->count is the length of read_state->buf
+	 *
+	 * Update read_state->read with the number of bytes written.
+	 */
+	void (*read)(struct i915_perf_event *event,
+		     struct i915_perf_read_state *read_state);
+
+	/* Cleanup any event specific resources.
+	 *
+	 * The event will always be disabled before this is called */
+	void (*destroy)(struct i915_perf_event *event);
+};
+
 struct drm_i915_private {
 	struct drm_device *dev;
 	struct kmem_cache *objects;
@@ -1928,6 +1989,12 @@ struct drm_i915_private {
 
 	struct i915_runtime_pm pm;
 
+	struct {
+		bool initialized;
+		struct mutex lock;
+		struct list_head events;
+	} perf;
+
 	/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
 	struct {
 		int (*execbuf_submit)(struct i915_execbuffer_params *params,
@@ -3130,6 +3197,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 				    struct drm_file *file_priv);
 
+int i915_perf_open_ioctl(struct drm_device *dev, void *data,
+			 struct drm_file *file);
+
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct drm_device *dev,
 					  struct i915_address_space *vm,
@@ -3239,6 +3309,10 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 		    u32 batch_len,
 		    bool is_master);
 
+/* i915_perf.c */
+extern void i915_perf_init(struct drm_device *dev);
+extern void i915_perf_fini(struct drm_device *dev);
+
 /* i915_suspend.c */
 extern int i915_save_state(struct drm_device *dev);
 extern int i915_restore_state(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
new file mode 100644
index 0000000..477e3e6
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -0,0 +1,447 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/perf_event.h>
+#include <linux/anon_inodes.h>
+#include <linux/sizes.h>
+
+#include "i915_drv.h"
+
+/**
+ * i915_perf_copy_attr() - copy specific event attributes from userspace
+ * @uattr:	The u64 __user attr of drm_i915_perf_open_param
+ * @attr:	Destination for copied attributes
+ * @v0_size:	The smallest, version 0 size of these attributes
+ * @real_size:	The latest size expected by this kernel version
+ *
+ * Specific events can define a custom attributes structure and for
+ * consistency should use this utility for reading the attributes from
+ * userspace.
+ *
+ * Note: although this verifies any unknown members beyond the expected
+ * struct size are zeroed it can't check for unused flags
+ *
+ * Return: 0 if successful, else an error code
+ */
+static int i915_perf_copy_attr(void __user *uattr,
+			       void *attr,
+			       u32 v0_size,
+			       u32 real_size)
+{
+	u32 size;
+	int ret;
+
+	if (!access_ok(VERIFY_WRITE, uattr, v0_size))
+		return -EFAULT;
+
+	/*
+	 * zero the full structure, so that a short copy will be nice.
+	 */
+	memset(attr, 0, real_size);
+
+	ret = get_user(size, (u32 __user *)uattr);
+	if (ret)
+		return ret;
+
+	if (size > PAGE_SIZE)   /* silly large */
+		goto err_size;
+
+	if (size < v0_size)
+		goto err_size;
+
+	/*
+	 * If we're handed a bigger struct than we know of,
+	 * ensure all the unknown bits are 0 - i.e. new
+	 * user-space does not rely on any kernel feature
+	 * extensions we don't know about yet.
+	 */
+
+	if (size > real_size) {
+		unsigned char __user *addr;
+		unsigned char __user *end;
+		unsigned char val;
+
+		addr = (void __user *)uattr + real_size;
+		end  = (void __user *)uattr + size;
+
+		for (; addr < end; addr++) {
+			ret = get_user(val, addr);
+			if (ret)
+				return ret;
+			if (val)
+				goto err_size;
+		}
+		size = real_size;
+	}
+
+	ret = copy_from_user(attr, uattr, size);
+	if (ret)
+		return -EFAULT;
+
+out:
+	return ret;
+
+err_size:
+	put_user(real_size, (u32 __user *)uattr);
+	ret = -E2BIG;
+	goto out;
+}
+
+static ssize_t i915_perf_read_locked(struct i915_perf_event *event,
+				     struct file *file,
+				     char __user *buf,
+				     size_t count,
+				     loff_t *ppos)
+{
+	struct drm_i915_private *dev_priv = event->dev_priv;
+	struct i915_perf_read_state state = { count, 0, buf };
+	int ret;
+
+	if (file->f_flags & O_NONBLOCK) {
+		if (!event->can_read(event))
+			return -EAGAIN;
+	} else {
+		mutex_unlock(&dev_priv->perf.lock);
+		ret = event->wait_unlocked(event);
+		mutex_lock(&dev_priv->perf.lock);
+
+		if (ret)
+			return ret;
+	}
+
+	event->read(event, &state);
+	if (state.read == 0)
+		return -ENOSPC;
+
+	return state.read;
+}
+
+static ssize_t i915_perf_read(struct file *file,
+			      char __user *buf,
+			      size_t count,
+			      loff_t *ppos)
+{
+	struct i915_perf_event *event = file->private_data;
+	struct drm_i915_private *dev_priv = event->dev_priv;
+	ssize_t ret;
+
+	mutex_lock(&dev_priv->perf.lock);
+	ret = i915_perf_read_locked(event, file, buf, count, ppos);
+	mutex_unlock(&dev_priv->perf.lock);
+
+	return ret;
+}
+
+static unsigned int i915_perf_poll_locked(struct i915_perf_event *event,
+					  struct file *file,
+					  poll_table *wait)
+{
+	unsigned int events = 0;
+
+	event->poll_wait(event, file, wait);
+
+	if (event->can_read(event))
+		events |= POLLIN;
+
+	return events;
+}
+
+static unsigned int i915_perf_poll(struct file *file, poll_table *wait)
+{
+	struct i915_perf_event *event = file->private_data;
+	struct drm_i915_private *dev_priv = event->dev_priv;
+	int ret;
+
+	mutex_lock(&dev_priv->perf.lock);
+	ret = i915_perf_poll_locked(event, file, wait);
+	mutex_unlock(&dev_priv->perf.lock);
+
+	return ret;
+}
+
+static void i915_perf_enable_locked(struct i915_perf_event *event)
+{
+	if (event->enabled)
+		return;
+
+	/* Allow event->enable() to refer to this */
+	event->enabled = true;
+
+	if (event->enable)
+		event->enable(event);
+}
+
+static void i915_perf_disable_locked(struct i915_perf_event *event)
+{
+	if (!event->enabled)
+		return;
+
+	/* Allow event->disable() to refer to this */
+	event->enabled = false;
+
+	if (event->disable)
+		event->disable(event);
+}
+
+static long i915_perf_ioctl_locked(struct i915_perf_event *event,
+				   unsigned int cmd,
+				   unsigned long arg)
+{
+	switch (cmd) {
+	case I915_PERF_IOCTL_ENABLE:
+		i915_perf_enable_locked(event);
+		return 0;
+	case I915_PERF_IOCTL_DISABLE:
+		i915_perf_disable_locked(event);
+		return 0;
+	}
+
+	return -EINVAL;
+}
+
+static long i915_perf_ioctl(struct file *file,
+			    unsigned int cmd,
+			    unsigned long arg)
+{
+	struct i915_perf_event *event = file->private_data;
+	struct drm_i915_private *dev_priv = event->dev_priv;
+	long ret;
+
+	mutex_lock(&dev_priv->perf.lock);
+	ret = i915_perf_ioctl_locked(event, cmd, arg);
+	mutex_unlock(&dev_priv->perf.lock);
+
+	return ret;
+}
+
+static void i915_perf_destroy_locked(struct i915_perf_event *event)
+{
+	struct drm_i915_private *dev_priv = event->dev_priv;
+
+	if (event->enabled)
+		i915_perf_disable_locked(event);
+
+	if (event->destroy)
+		event->destroy(event);
+
+	list_del(&event->link);
+
+	if (event->ctx) {
+		mutex_lock(&dev_priv->dev->struct_mutex);
+		i915_gem_context_unreference(event->ctx);
+		mutex_unlock(&dev_priv->dev->struct_mutex);
+	}
+
+	kfree(event);
+}
+
+static int i915_perf_release(struct inode *inode, struct file *file)
+{
+	struct i915_perf_event *event = file->private_data;
+	struct drm_i915_private *dev_priv = event->dev_priv;
+
+	mutex_lock(&dev_priv->perf.lock);
+	i915_perf_destroy_locked(event);
+	mutex_unlock(&dev_priv->perf.lock);
+
+	return 0;
+}
+
+
+static const struct file_operations fops = {
+	.owner		= THIS_MODULE,
+	.llseek		= no_llseek,
+	.release	= i915_perf_release,
+	.poll		= i915_perf_poll,
+	.read		= i915_perf_read,
+	.unlocked_ioctl	= i915_perf_ioctl,
+};
+
+static struct intel_context *
+lookup_context(struct drm_i915_private *dev_priv,
+	       struct file *user_filp,
+	       u32 ctx_user_handle)
+{
+	struct intel_context *ctx;
+
+	mutex_lock(&dev_priv->dev->struct_mutex);
+	list_for_each_entry(ctx, &dev_priv->context_list, link) {
+		struct drm_file *drm_file;
+
+		if (!ctx->file_priv)
+			continue;
+
+		drm_file = ctx->file_priv->file;
+
+		if (user_filp->private_data == drm_file &&
+		    ctx->user_handle == ctx_user_handle) {
+			i915_gem_context_reference(ctx);
+			mutex_unlock(&dev_priv->dev->struct_mutex);
+
+			return ctx;
+		}
+	}
+	mutex_unlock(&dev_priv->dev->struct_mutex);
+
+	return NULL;
+}
+
+int i915_perf_open_ioctl_locked(struct drm_device *dev, void *data,
+				struct drm_file *file)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_perf_open_param *param = data;
+	u32 known_open_flags = 0;
+	u64 known_sample_flags = 0;
+	struct intel_context *specific_ctx = NULL;
+	struct i915_perf_event *event = NULL;
+	unsigned long f_flags = 0;
+	int event_fd;
+	int ret = 0;
+
+	known_open_flags = I915_PERF_FLAG_FD_CLOEXEC |
+			   I915_PERF_FLAG_FD_NONBLOCK |
+			   I915_PERF_FLAG_SINGLE_CONTEXT |
+			   I915_PERF_FLAG_DISABLED;
+	if (param->flags & ~known_open_flags) {
+		DRM_ERROR("Unknown drm_i915_perf_open_param flag\n");
+		ret = -EINVAL;
+		goto err;
+	}
+
+	known_sample_flags = I915_PERF_SAMPLE_OA_REPORT |
+			     I915_PERF_SAMPLE_CTXID |
+			     I915_PERF_SAMPLE_TIMESTAMP;
+	if (param->sample_flags & ~known_sample_flags) {
+		DRM_ERROR("Unknown drm_i915_perf_open_param sample_flag\n");
+		ret = -EINVAL;
+		goto err;
+	}
+
+	if (param->flags & I915_PERF_FLAG_SINGLE_CONTEXT) {
+		u32 ctx_id = param->ctx_id;
+
+		specific_ctx = lookup_context(dev_priv, file->filp, ctx_id);
+		if (!specific_ctx) {
+			DRM_ERROR("Failed to look up context with ID %u for opening perf event\n", ctx_id);
+			ret = -EINVAL;
+			goto err;
+		}
+	}
+
+	if (!specific_ctx && !capable(CAP_SYS_ADMIN)) {
+		DRM_ERROR("Insufficient privileges to open perf event\n");
+		ret = -EACCES;
+		goto err_ctx;
+	}
+
+	event = kzalloc(sizeof(*event), GFP_KERNEL);
+	if (!event) {
+		ret = -ENOMEM;
+		goto err_ctx;
+	}
+
+	event->sample_flags = param->sample_flags;
+	event->dev_priv = dev_priv;
+	event->ctx = specific_ctx;
+
+	switch (param->type) {
+		/* TODO: Init according to specific type */
+	default:
+		DRM_ERROR("Unknown perf event type\n");
+		ret = -EINVAL;
+		goto err_alloc;
+	}
+
+	event->ctx = specific_ctx;
+	list_add(&event->link, &dev_priv->perf.events);
+
+	if (param->flags & I915_PERF_FLAG_FD_CLOEXEC)
+		f_flags |= O_CLOEXEC;
+	if (param->flags & I915_PERF_FLAG_FD_NONBLOCK)
+		f_flags |= O_NONBLOCK;
+
+	event_fd = anon_inode_getfd("[i915_perf]", &fops, event, f_flags);
+	if (event_fd < 0) {
+		ret = event_fd;
+		goto err_open;
+	}
+
+	param->fd = event_fd;
+
+	if (!(param->flags & I915_PERF_FLAG_DISABLED))
+		i915_perf_enable_locked(event);
+
+	return 0;
+
+err_open:
+	list_del(&event->link);
+	if (event->destroy)
+		event->destroy(event);
+err_alloc:
+	kfree(event);
+err_ctx:
+	if (specific_ctx) {
+		mutex_lock(&dev_priv->dev->struct_mutex);
+		i915_gem_context_unreference(specific_ctx);
+		mutex_unlock(&dev_priv->dev->struct_mutex);
+	}
+err:
+	param->fd = -1;
+
+	return ret;
+}
+
+int i915_perf_open_ioctl(struct drm_device *dev, void *data,
+			    struct drm_file *file)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	int ret;
+
+	mutex_lock(&dev_priv->perf.lock);
+	ret = i915_perf_open_ioctl_locked(dev, data, file);
+	mutex_unlock(&dev_priv->perf.lock);
+
+	return ret;
+}
+
+void i915_perf_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = to_i915(dev);
+
+	/* Currently no global event state to initialize */
+
+	dev_priv->perf.initialized = true;
+}
+
+void i915_perf_fini(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = to_i915(dev);
+
+	if (!dev_priv->perf.initialized)
+		return;
+
+	/* Currently nothing to clean up */
+
+	dev_priv->perf.initialized = false;
+}
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index dbd16a2..a84f71f 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -230,6 +230,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_GEM_USERPTR		0x33
 #define DRM_I915_GEM_CONTEXT_GETPARAM	0x34
 #define DRM_I915_GEM_CONTEXT_SETPARAM	0x35
+#define DRM_I915_PERF_OPEN		0x36
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
 #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
@@ -283,6 +284,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GEM_USERPTR			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_USERPTR, struct drm_i915_gem_userptr)
 #define DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_GETPARAM, struct drm_i915_gem_context_param)
 #define DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_SETPARAM, struct drm_i915_gem_context_param)
+#define DRM_IOCTL_I915_PERF_OPEN	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_PERF_OPEN, struct drm_i915_perf_open_param)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -1129,4 +1131,64 @@ struct drm_i915_gem_context_param {
 	__u64 value;
 };
 
+enum drm_i915_perf_event_type {
+	I915_PERF_EVENT_TYPE_MAX	/* non-ABI */
+};
+
+#define I915_PERF_FLAG_FD_CLOEXEC	(1<<0)
+#define I915_PERF_FLAG_FD_NONBLOCK	(1<<1)
+#define I915_PERF_FLAG_SINGLE_CONTEXT	(1<<2)
+#define I915_PERF_FLAG_DISABLED         (1<<3)
+
+#define I915_PERF_SAMPLE_OA_REPORT	(1<<0)
+#define I915_PERF_SAMPLE_CTXID		(1<<1)
+#define I915_PERF_SAMPLE_TIMESTAMP	(1<<2)
+
+struct drm_i915_perf_open_param {
+	/* Such as I915_PERF_OA_EVENT */
+	__u32 type;
+
+	/* CLOEXEC, NONBLOCK, SINGLE_CONTEXT, DISABLED... */
+	__u32 flags;
+
+	/* What to include in samples */
+	__u64 sample_flags;
+
+	/* A specific context to profile */
+	__u32 ctx_id;
+
+	/* Event specific attributes */
+	__u64 __user attr;
+
+	/* OUT */
+	__u32 fd;
+};
+
+#define I915_PERF_IOCTL_ENABLE	_IO('i', 0x0)
+#define I915_PERF_IOCTL_DISABLE	_IO('i', 0x1)
+
+/* Note: same as struct perf_event_header */
+struct drm_i915_perf_event_header {
+	__u32 type;
+	__u16 misc;
+	__u16 size;
+};
+
+enum drm_i915_perf_record_type {
+
+	/*
+	 * struct {
+	 *     struct drm_i915_perf_event_header header;
+	 *
+	 *     { u32 ctx_id; }	    && I915_PERF_SAMPLE_CTXID
+	 *     { u32 timestamp; }   && I915_PERF_SAMPLE_TIMESTAMP
+	 *     { u32 oa_report[]; } && I915_PERF_SAMPLE_OA_REPORT
+	 *
+	 * };
+	 */
+	DRM_I915_PERF_RECORD_SAMPLE = 1,
+
+	DRM_I915_PERF_RECORD_MAX /* non-ABI */
+};
+
 #endif /* _UAPI_I915_DRM_H_ */
-- 
2.5.2
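
As a rough illustration of the DRM_I915_PERF_RECORD_SAMPLE layout
documented in the header above, here is a minimal userspace sketch that
decodes one record from a buffer filled by read(). The struct and flag
definitions are local copies of what this RFC proposes, not taken from
an installed i915_drm.h, and oa_sample_view / oa_parse_record() are
hypothetical names used only for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copies of definitions proposed by this patch (assumption: they
 * are not available from an installed i915_drm.h). */
#define I915_PERF_SAMPLE_OA_REPORT	(1<<0)
#define I915_PERF_SAMPLE_CTXID		(1<<1)
#define I915_PERF_SAMPLE_TIMESTAMP	(1<<2)
#define DRM_I915_PERF_RECORD_SAMPLE	1

struct drm_i915_perf_event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

/* Hypothetical decoded view of one sample record. */
struct oa_sample_view {
	uint32_t ctx_id;
	uint32_t timestamp;
	const uint8_t *oa_report;	/* points into the caller's buffer */
	size_t oa_report_size;
};

/* Hypothetical helper: decode the record at buf. Returns the record
 * size so the caller can step to the next record, or 0 if the record
 * is truncated or malformed. */
static size_t oa_parse_record(const uint8_t *buf, size_t len,
			      uint64_t sample_flags,
			      struct oa_sample_view *out)
{
	struct drm_i915_perf_event_header hdr;
	size_t off = sizeof(hdr);

	if (len < sizeof(hdr))
		return 0;
	memcpy(&hdr, buf, sizeof(hdr));
	if (hdr.size < sizeof(hdr) || hdr.size > len)
		return 0;

	memset(out, 0, sizeof(*out));
	if (hdr.type != DRM_I915_PERF_RECORD_SAMPLE)
		return hdr.size;	/* not a sample: nothing to decode */

	/* Fields appear in this fixed order, each only if requested. */
	if (sample_flags & I915_PERF_SAMPLE_CTXID) {
		if (hdr.size - off < sizeof(out->ctx_id))
			return 0;
		memcpy(&out->ctx_id, buf + off, sizeof(out->ctx_id));
		off += sizeof(out->ctx_id);
	}
	if (sample_flags & I915_PERF_SAMPLE_TIMESTAMP) {
		if (hdr.size - off < sizeof(out->timestamp))
			return 0;
		memcpy(&out->timestamp, buf + off, sizeof(out->timestamp));
		off += sizeof(out->timestamp);
	}
	if (sample_flags & I915_PERF_SAMPLE_OA_REPORT) {
		/* The raw OA report fills the rest of the record. */
		out->oa_report = buf + off;
		out->oa_report_size = hdr.size - off;
	}
	return hdr.size;
}
```

A reader loop would call this repeatedly, advancing by the returned
size, and pass the same sample_flags it gave to the open ioctl.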

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC 2/6] drm/i915: rename OACONTROL GEN7_OACONTROL
  2015-09-29 14:39 [RFC 0/6] Non perf based Gen Graphics OA unit driver Robert Bragg
  2015-09-29 14:39 ` [RFC 1/6] drm/i915: Add i915 perf infrastructure Robert Bragg
@ 2015-09-29 14:39 ` Robert Bragg
  2015-09-29 14:39 ` [RFC 3/6] drm/i915: Add static '3D' Haswell OA unit config Robert Bragg
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Robert Bragg @ 2015-09-29 14:39 UTC (permalink / raw)
  To: intel-gfx
  Cc: Mark Rutland, Matt Fleming, David Airlie, dri-devel,
	linux-kernel, Peter Zijlstra, Sourab Gupta, linux-api, Zheng Yan,
	Daniel Vetter, Ingo Molnar, Alexander Shishkin

OACONTROL changes quite a bit for gen8, with some bits split out into a
per-context OACTXCONTROL register, so rename the existing definition to
GEN7_OACONTROL to make clear which generation it describes.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 4 ++--
 drivers/gpu/drm/i915/i915_reg.h        | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 237ff68..d769436 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -439,7 +439,7 @@ static const struct drm_i915_reg_descriptor gen7_render_regs[] = {
 	REG64(CL_PRIMITIVES_COUNT),
 	REG64(PS_INVOCATION_COUNT),
 	REG64(PS_DEPTH_COUNT),
-	REG32(OACONTROL), /* Only allowed for LRI and SRM. See below. */
+	REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. */
 	REG64(MI_PREDICATE_SRC0),
 	REG64(MI_PREDICATE_SRC1),
 	REG32(GEN7_3DPRIM_END_OFFSET),
@@ -1020,7 +1020,7 @@ static bool check_cmd(const struct intel_engine_cs *ring,
 			 * to the register. Hence, limit OACONTROL writes to
 			 * only MI_LOAD_REGISTER_IMM commands.
 			 */
-			if (reg_addr == OACONTROL) {
+			if (reg_addr == GEN7_OACONTROL) {
 				if (desc->cmd.value == MI_LOAD_REGISTER_MEM(1)) {
 					DRM_DEBUG_DRIVER("CMD: Rejected LRM to OACONTROL\n");
 					return false;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 1fa0554..2e488e8 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -536,7 +536,7 @@
 #define GEN7_3DPRIM_START_INSTANCE      0x243C
 #define GEN7_3DPRIM_BASE_VERTEX         0x2440
 
-#define OACONTROL 0x2360
+#define GEN7_OACONTROL 0x2360
 
 #define _GEN7_PIPEA_DE_LOAD_SL	0x70068
 #define _GEN7_PIPEB_DE_LOAD_SL	0x71068
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC 3/6] drm/i915: Add static '3D' Haswell OA unit config
  2015-09-29 14:39 [RFC 0/6] Non perf based Gen Graphics OA unit driver Robert Bragg
  2015-09-29 14:39 ` [RFC 1/6] drm/i915: Add i915 perf infrastructure Robert Bragg
  2015-09-29 14:39 ` [RFC 2/6] drm/i915: rename OACONTROL GEN7_OACONTROL Robert Bragg
@ 2015-09-29 14:39 ` Robert Bragg
  2015-09-29 14:39 ` [RFC 4/6] drm/i915: Add i915 perf event for Haswell OA unit Robert Bragg
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Robert Bragg @ 2015-09-29 14:39 UTC (permalink / raw)
  To: intel-gfx
  Cc: Mark Rutland, Matt Fleming, David Airlie, dri-devel,
	linux-kernel, Peter Zijlstra, Sourab Gupta, linux-api, Zheng Yan,
	Daniel Vetter, Ingo Molnar, Alexander Shishkin

Adds a static OA unit, MUX + B Counter configuration for basic '3D'
metrics on Haswell. This is autogenerated from an internal XML
description of metric sets.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
---
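For reviewers unfamiliar with how these tables are meant to be
consumed: each entry is an { addr, value } pair that is written to the
hardware in order. The sketch below assumes a mock write callback in
place of real MMIO access; apply_oa_regs(), mmio_write_fn, write_log
and ARRAY_LEN() are hypothetical names for this example, and only the
four B counter entries are copied from the patch.

```c
#include <stddef.h>
#include <stdint.h>

struct i915_oa_reg {
	uint32_t addr;
	uint32_t value;
};

/* B counter entries copied from i915_oa_3d_b_counter_config_hsw[]. */
static const struct i915_oa_reg b_counter_config[] = {
	{ 0x2724, 0x00800000 },
	{ 0x2720, 0x00000000 },
	{ 0x2714, 0x00800000 },
	{ 0x2710, 0x00000000 },
};

/* Derive the entry count from the array itself. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for a register write. */
typedef void (*mmio_write_fn)(void *ctx, uint32_t addr, uint32_t value);

/* Write every { addr, value } pair in table order. */
static void apply_oa_regs(const struct i915_oa_reg *regs, size_t n_regs,
			  mmio_write_fn write, void *ctx)
{
	size_t i;

	for (i = 0; i < n_regs; i++)
		write(ctx, regs[i].addr, regs[i].value);
}

/* Mock "hardware": remembers how many writes happened and the last one. */
struct write_log {
	size_t count;
	uint32_t last_addr;
	uint32_t last_value;
};

static void log_write(void *ctx, uint32_t addr, uint32_t value)
{
	struct write_log *log = ctx;

	log->count++;
	log->last_addr = addr;
	log->last_value = value;
}
```

Calling apply_oa_regs(b_counter_config, ARRAY_LEN(b_counter_config),
log_write, &log) replays the table against the mock. Whether the
generated files should keep separate _len constants or derive the
length this way is a style choice left to the generator.
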
 drivers/gpu/drm/i915/Makefile      |  3 +-
 drivers/gpu/drm/i915/i915_drv.h    |  5 ++
 drivers/gpu/drm/i915/i915_oa_hsw.c | 98 ++++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_oa_hsw.h | 36 ++++++++++++++
 4 files changed, 141 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 5485495..5b1c688 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -94,7 +94,8 @@ i915-y += dvo_ch7017.o \
 i915-y += i915_vgpu.o
 
 # perf code
-i915-y += i915_perf.o
+i915-y += i915_perf.o \
+	  i915_oa_hsw.o
 
 # legacy horrors
 i915-y += i915_dma.o
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c16c9e5..0cb36d9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1694,6 +1694,11 @@ struct i915_execbuffer_params {
 	struct drm_i915_gem_request     *request;
 };
 
+struct i915_oa_reg {
+	u32 addr;
+	u32 value;
+};
+
 struct i915_perf_read_state {
 	int count;
 	ssize_t read;
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c
new file mode 100644
index 0000000..187bade
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -0,0 +1,98 @@
+/*
+ * Autogenerated file, DO NOT EDIT manually!
+ *
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "i915_drv.h"
+
+const struct i915_oa_reg i915_oa_3d_b_counter_config_hsw[] = {
+	{ 0x2724, 0x00800000 },
+	{ 0x2720, 0x00000000 },
+	{ 0x2714, 0x00800000 },
+	{ 0x2710, 0x00000000 },
+};
+const int i915_oa_3d_b_counter_config_hsw_len = 4;
+
+const struct i915_oa_reg i915_oa_3d_mux_config_hsw[] = {
+	{ 0x253A4, 0x01600000 },
+	{ 0x25440, 0x00100000 },
+	{ 0x25128, 0x00000000 },
+	{ 0x2691C, 0x00000800 },
+	{ 0x26AA0, 0x01500000 },
+	{ 0x26B9C, 0x00006000 },
+	{ 0x2791C, 0x00000800 },
+	{ 0x27AA0, 0x01500000 },
+	{ 0x27B9C, 0x00006000 },
+	{ 0x2641C, 0x00000400 },
+	{ 0x25380, 0x00000010 },
+	{ 0x2538C, 0x00000000 },
+	{ 0x25384, 0x0800AAAA },
+	{ 0x25400, 0x00000004 },
+	{ 0x2540C, 0x06029000 },
+	{ 0x25410, 0x00000002 },
+	{ 0x25404, 0x5C30FFFF },
+	{ 0x25100, 0x00000016 },
+	{ 0x25110, 0x00000400 },
+	{ 0x25104, 0x00000000 },
+	{ 0x26804, 0x00001211 },
+	{ 0x26884, 0x00000100 },
+	{ 0x26900, 0x00000002 },
+	{ 0x26908, 0x00700000 },
+	{ 0x26904, 0x00000000 },
+	{ 0x26984, 0x00001022 },
+	{ 0x26A04, 0x00000011 },
+	{ 0x26A80, 0x00000006 },
+	{ 0x26A88, 0x00000C02 },
+	{ 0x26A84, 0x00000000 },
+	{ 0x26B04, 0x00001000 },
+	{ 0x26B80, 0x00000002 },
+	{ 0x26B8C, 0x00000007 },
+	{ 0x26B84, 0x00000000 },
+	{ 0x27804, 0x00004844 },
+	{ 0x27884, 0x00000400 },
+	{ 0x27900, 0x00000002 },
+	{ 0x27908, 0x0E000000 },
+	{ 0x27904, 0x00000000 },
+	{ 0x27984, 0x00004088 },
+	{ 0x27A04, 0x00000044 },
+	{ 0x27A80, 0x00000006 },
+	{ 0x27A88, 0x00018040 },
+	{ 0x27A84, 0x00000000 },
+	{ 0x27B04, 0x00004000 },
+	{ 0x27B80, 0x00000002 },
+	{ 0x27B8C, 0x000000E0 },
+	{ 0x27B84, 0x00000000 },
+	{ 0x26104, 0x00002222 },
+	{ 0x26184, 0x0C006666 },
+	{ 0x26284, 0x04000000 },
+	{ 0x26304, 0x04000000 },
+	{ 0x26400, 0x00000002 },
+	{ 0x26410, 0x000000A0 },
+	{ 0x26404, 0x00000000 },
+	{ 0x25420, 0x04108020 },
+	{ 0x25424, 0x1284A420 },
+	{ 0x2541C, 0x00000000 },
+	{ 0x25428, 0x00042049 },
+};
+const int i915_oa_3d_mux_config_hsw_len = 59;
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.h b/drivers/gpu/drm/i915/i915_oa_hsw.h
new file mode 100644
index 0000000..e170e4d
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.h
@@ -0,0 +1,36 @@
+/*
+ * Autogenerated file, DO NOT EDIT manually!
+ *
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#ifndef __I915_OA_HSW_H__
+#define __I915_OA_HSW_H__
+
+/* HSW Render Metrics Basic Gen7.5 */
+extern const struct i915_oa_reg i915_oa_3d_b_counter_config_hsw[];
+extern const int i915_oa_3d_b_counter_config_hsw_len;
+extern const struct i915_oa_reg i915_oa_3d_mux_config_hsw[];
+extern const int i915_oa_3d_mux_config_hsw_len;
+
+#endif
-- 
2.5.2

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [RFC 4/6] drm/i915: Add i915 perf event for Haswell OA unit
  2015-09-29 14:39 [RFC 0/6] Non perf based Gen Graphics OA unit driver Robert Bragg
                   ` (2 preceding siblings ...)
  2015-09-29 14:39 ` [RFC 3/6] drm/i915: Add static '3D' Haswell OA unit config Robert Bragg
@ 2015-09-29 14:39 ` Robert Bragg
  2015-09-29 14:55   ` [Intel-gfx] " kbuild test robot
  2015-09-29 14:39 ` [RFC 6/6] drm/i915: add oa_event_min_timer_exponent sysctl Robert Bragg
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 18+ messages in thread
From: Robert Bragg @ 2015-09-29 14:39 UTC (permalink / raw)
  To: intel-gfx
  Cc: Mark Rutland, Matt Fleming, David Airlie, dri-devel,
	linux-kernel, Peter Zijlstra, Sourab Gupta, linux-api, Zheng Yan,
	Daniel Vetter, Ingo Molnar, Alexander Shishkin

Gen graphics hardware can be set up to periodically write snapshots of
performance counters into a circular buffer via its Observation
Architecture and this patch exposes that capability to userspace via the
i915 perf interface.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
---
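A note on the timer exponent used for periodic sampling: the comment in
i915_oa_event_init() gives the period as 80ns * 2^(exponent + 1). The
sketch below works through that arithmetic; the helper names are
hypothetical, the 80ns base is taken from that comment, and the
saturation behaviour for very large exponents is my own assumption to
keep the shift well defined.

```c
#include <stdint.h>

#define OA_EXPONENT_MAX	0x3f
#define NSEC_PER_SEC	1000000000ull

/* Period in nanoseconds for a given OA timer exponent, following
 * 80ns * 2^(exponent + 1). Saturates instead of overflowing 64 bits
 * (80 needs 7 bits, so shifts up to 57 are safe). */
static uint64_t oa_exponent_to_period_ns(uint32_t exponent)
{
	if (exponent + 1 > 57)
		return UINT64_MAX;
	return 80ull << (exponent + 1);
}

/* Smallest exponent whose sampling rate does not exceed max_hz, or
 * OA_EXPONENT_MAX if none is found first. */
static uint32_t oa_min_exponent_for_rate(uint64_t max_hz)
{
	uint32_t exponent;

	for (exponent = 0; exponent < OA_EXPONENT_MAX; exponent++) {
		uint64_t period = oa_exponent_to_period_ns(exponent);

		if (NSEC_PER_SEC / period <= max_hz)
			break;
	}
	return exponent;
}
```

With a 100000Hz limit this yields an exponent of 6 (a 10240ns period),
which matches the unprivileged threshold checked in this patch.
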
 drivers/gpu/drm/i915/i915_drv.h         |  57 +++
 drivers/gpu/drm/i915/i915_gem_context.c |  23 +-
 drivers/gpu/drm/i915/i915_perf.c        | 697 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_reg.h         | 338 ++++++++++++++++
 include/uapi/drm/i915_drm.h             |  63 +++
 5 files changed, 1171 insertions(+), 7 deletions(-)
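
On the head/tail arithmetic: OA_TAKEN() relies on the buffer size being
a power of two, so the unsigned subtraction can be masked to get the
number of bytes pending even after the tail has wrapped past the end of
the buffer. A minimal standalone sketch follows; oa_complete_reports()
is a hypothetical helper, and the macro is restated here with
parenthesised arguments.

```c
#include <stdint.h>

/* Must be a power of two for the mask below to work. */
#define OA_BUFFER_SIZE		(16u * 1024 * 1024)
#define OA_TAKEN(tail, head)	(((tail) - (head)) & (OA_BUFFER_SIZE - 1))

/* Number of complete reports between head and tail. The tail moves in
 * 64 byte steps, so a partially written report may be pending; integer
 * division drops it. */
static uint32_t oa_complete_reports(uint32_t head, uint32_t tail,
				    uint32_t report_size)
{
	return OA_TAKEN(tail, head) / report_size;
}
```

For example, with head at 0xFFFFC0 and tail wrapped around to 0x40,
128 bytes are pending: two 64 byte reports, but not yet one complete
256 byte report.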

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0cb36d9..d6db816 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1694,6 +1694,11 @@ struct i915_execbuffer_params {
 	struct drm_i915_gem_request     *request;
 };
 
+struct i915_oa_format {
+	u32 format;
+	int size;
+};
+
 struct i915_oa_reg {
 	u32 addr;
 	u32 value;
@@ -1760,6 +1765,20 @@ struct i915_perf_event {
 	void (*destroy)(struct i915_perf_event *event);
 };
 
+struct i915_oa_ops {
+	void (*init_oa_buffer)(struct drm_i915_private *dev_priv);
+	void (*enable_metric_set)(struct drm_i915_private *dev_priv);
+	void (*disable_metric_set)(struct drm_i915_private *dev_priv);
+	void (*oa_enable)(struct drm_i915_private *dev_priv);
+	void (*oa_disable)(struct drm_i915_private *dev_priv);
+	void (*update_oacontrol)(struct drm_i915_private *dev_priv);
+	void (*update_specific_hw_ctx_id)(struct drm_i915_private *dev_priv,
+					  u32 ctx_id);
+	void (*read)(struct i915_perf_event *event,
+		     struct i915_perf_read_state *read_state);
+	bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv);
+};
+
 struct drm_i915_private {
 	struct drm_device *dev;
 	struct kmem_cache *objects;
@@ -1996,7 +2015,43 @@ struct drm_i915_private {
 
 	struct {
 		bool initialized;
+
 		struct mutex lock;
+
+		struct ctl_table_header *sysctl_header;
+
+		struct {
+			struct i915_perf_event *exclusive_event;
+
+			u32 specific_ctx_id;
+
+			struct hrtimer poll_check_timer;
+			wait_queue_head_t poll_wq;
+
+			bool periodic;
+			u32 period_exponent;
+
+			u32 metrics_set;
+
+			const struct i915_oa_reg *mux_regs;
+			int mux_regs_len;
+			const struct i915_oa_reg *b_counter_regs;
+			int b_counter_regs_len;
+
+			struct {
+				struct drm_i915_gem_object *obj;
+				u32 gtt_offset;
+				u8 *addr;
+				u32 head;
+				u32 tail;
+				int format;
+				int format_size;
+			} oa_buffer;
+
+			struct i915_oa_ops ops;
+			const struct i915_oa_format *oa_formats;
+		} oa;
+
 		struct list_head events;
 	} perf;
 
@@ -3204,6 +3259,8 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 
 int i915_perf_open_ioctl(struct drm_device *dev, void *data,
 			 struct drm_file *file);
+void i915_oa_context_pin_notify(struct drm_i915_private *dev_priv,
+				struct intel_context *context);
 
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct drm_device *dev,
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 8e893b3..3c4419c 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -133,6 +133,23 @@ static int get_context_size(struct drm_device *dev)
 	return ret;
 }
 
+static int i915_gem_context_pin_state(struct drm_device *dev,
+				      struct intel_context *ctx)
+{
+	int ret;
+
+	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
+
+	ret = i915_gem_obj_ggtt_pin(ctx->legacy_hw_ctx.rcs_state,
+				    get_context_alignment(dev), 0);
+	if (ret)
+		return ret;
+
+	i915_oa_context_pin_notify(dev->dev_private, ctx);
+
+	return 0;
+}
+
 void i915_gem_context_free(struct kref *ctx_ref)
 {
 	struct intel_context *ctx = container_of(ctx_ref, typeof(*ctx), ref);
@@ -258,8 +275,7 @@ i915_gem_create_context(struct drm_device *dev,
 		 * be available. To avoid this we always pin the default
 		 * context.
 		 */
-		ret = i915_gem_obj_ggtt_pin(ctx->legacy_hw_ctx.rcs_state,
-					    get_context_alignment(dev), 0);
+		ret = i915_gem_context_pin_state(dev, ctx);
 		if (ret) {
 			DRM_DEBUG_DRIVER("Couldn't pin %d\n", ret);
 			goto err_destroy;
@@ -634,8 +650,7 @@ static int do_switch(struct drm_i915_gem_request *req)
 
 	/* Trying to pin first makes error handling easier. */
 	if (ring == &dev_priv->ring[RCS]) {
-		ret = i915_gem_obj_ggtt_pin(to->legacy_hw_ctx.rcs_state,
-					    get_context_alignment(ring->dev), 0);
+		ret = i915_gem_context_pin_state(ring->dev, to);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 477e3e6..bc1c4d1 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -26,6 +26,31 @@
 #include <linux/sizes.h>
 
 #include "i915_drv.h"
+#include "intel_ringbuffer.h"
+#include "intel_lrc.h"
+#include "i915_oa_hsw.h"
+
+/* Must be a power of two */
+#define OA_BUFFER_SIZE	     SZ_16M
+#define OA_TAKEN(tail, head) (((tail) - (head)) & (OA_BUFFER_SIZE - 1))
+
+/* frequency for forwarding samples from OA to perf buffer */
+#define POLL_FREQUENCY 200
+#define POLL_PERIOD max_t(u64, 10000, NSEC_PER_SEC / POLL_FREQUENCY)
+
+#define OA_EXPONENT_MAX 0x3f
+
+static struct i915_oa_format hsw_oa_formats[I915_OA_FORMAT_MAX] = {
+	[I915_OA_FORMAT_A13]	    = { 0, 64 },
+	[I915_OA_FORMAT_A29]	    = { 1, 128 },
+	[I915_OA_FORMAT_A13_B8_C8]  = { 2, 128 },
+	/* A29_B8_C8 Disallowed as 192 bytes doesn't factor into buffer size */
+	[I915_OA_FORMAT_B4_C8]	    = { 4, 64 },
+	[I915_OA_FORMAT_A45_B8_C8]  = { 5, 256 },
+	[I915_OA_FORMAT_B4_C8_A16]  = { 6, 128 },
+	[I915_OA_FORMAT_C4_B8]	    = { 7, 64 },
+};
+
 
 /**
  * i915_perf_copy_attr() - copy specific event attributes from userspace
@@ -107,6 +132,634 @@ err_size:
 	goto out;
 }
 
+
+static bool gen7_oa_buffer_is_empty(struct drm_i915_private *dev_priv)
+{
+	u32 oastatus2 = I915_READ(GEN7_OASTATUS2);
+	u32 oastatus1 = I915_READ(GEN7_OASTATUS1);
+	u32 head = oastatus2 & GEN7_OASTATUS2_HEAD_MASK;
+	u32 tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK;
+
+	return OA_TAKEN(tail, head) == 0;
+}
+
+static bool append_oa_status(struct i915_perf_event *event,
+			     struct i915_perf_read_state *read_state,
+			     enum drm_i915_perf_record_type type)
+{
+	struct drm_i915_perf_event_header header = { type, 0, sizeof(header) };
+
+	if ((read_state->count - read_state->read) < header.size)
+		return false;
+
+	copy_to_user(read_state->buf, &header, sizeof(header));
+
+	read_state->buf += sizeof(header);
+	read_state->read += header.size;
+
+	return true;
+}
+
+static bool append_oa_sample(struct i915_perf_event *event,
+			     struct i915_perf_read_state *read_state,
+			     const u8 *report)
+{
+	struct drm_i915_private *dev_priv = event->dev_priv;
+	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
+	struct drm_i915_perf_event_header header;
+	u32 sample_flags = event->sample_flags;
+	u32 dummy_ctx_id = 0;
+	u32 dummy_timestamp = 0;
+
+	header.type = DRM_I915_PERF_RECORD_SAMPLE;
+	header.misc = 0;
+	header.size = sizeof(header);
+
+
+	/* XXX: could pre-compute this when opening the event... */
+
+	if (sample_flags & I915_PERF_SAMPLE_CTXID)
+		header.size += 4;
+
+	if (sample_flags & I915_PERF_SAMPLE_TIMESTAMP)
+		header.size += 4;
+
+	if (sample_flags & I915_PERF_SAMPLE_OA_REPORT)
+		header.size += report_size;
+
+
+	if ((read_state->count - read_state->read) < header.size)
+		return false;
+
+
+	copy_to_user(read_state->buf, &header, sizeof(header));
+	read_state->buf += sizeof(header);
+
+	if (sample_flags & I915_PERF_SAMPLE_CTXID) {
+#warning "fixme: extract context ID from OA reports"
+		copy_to_user(read_state->buf, &dummy_ctx_id, 4);
+		read_state->buf += 4;
+	}
+
+	if (sample_flags & I915_PERF_SAMPLE_TIMESTAMP) {
+#warning "fixme: extract timestamp from OA reports"
+		copy_to_user(read_state->buf, &dummy_timestamp, 4);
+		read_state->buf += 4;
+	}
+
+	if (sample_flags & I915_PERF_SAMPLE_OA_REPORT) {
+		copy_to_user(read_state->buf, report, report_size);
+		read_state->buf += report_size;
+	}
+
+
+	read_state->read += header.size;
+
+	return true;
+}
+
+static u32 gen7_append_oa_reports(struct i915_perf_event *event,
+				  struct i915_perf_read_state *read_state,
+				  u32 head,
+				  u32 tail)
+{
+	struct drm_i915_private *dev_priv = event->dev_priv;
+	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
+	u8 *oa_buf_base = dev_priv->perf.oa.oa_buffer.addr;
+	u32 mask = (OA_BUFFER_SIZE - 1);
+	u8 *report;
+	u32 taken;
+
+	head -= dev_priv->perf.oa.oa_buffer.gtt_offset;
+	tail -= dev_priv->perf.oa.oa_buffer.gtt_offset;
+
+	/* Note: the gpu doesn't wrap the tail according to the OA buffer size
+	 * so when we need to make sure our head/tail values are in-bounds we
+	 * use the above mask.
+	 */
+
+	while ((taken = OA_TAKEN(tail, head))) {
+		/* The tail increases in 64 byte increments, not in
+		 * format_size steps. */
+		if (taken < report_size)
+			break;
+
+		report = oa_buf_base + (head & mask);
+
+		if (dev_priv->perf.oa.exclusive_event->enabled) {
+			if (!append_oa_sample(event, read_state, report))
+				break;
+		}
+
+		/* If append_oa_sample() returns false we shouldn't progress
+		 * head so we update it afterwards... */
+		head += report_size;
+	}
+
+	return dev_priv->perf.oa.oa_buffer.gtt_offset + head;
+}
+
+static void gen7_oa_read(struct i915_perf_event *event,
+			 struct i915_perf_read_state *read_state)
+{
+	struct drm_i915_private *dev_priv = event->dev_priv;
+	u32 oastatus2;
+	u32 oastatus1;
+	u32 head;
+	u32 tail;
+
+	WARN_ON(!dev_priv->perf.oa.oa_buffer.addr);
+
+	oastatus2 = I915_READ(GEN7_OASTATUS2);
+	oastatus1 = I915_READ(GEN7_OASTATUS1);
+
+	head = oastatus2 & GEN7_OASTATUS2_HEAD_MASK;
+	tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK;
+
+	if (unlikely(oastatus1 & (GEN7_OASTATUS1_OABUFFER_OVERFLOW |
+				  GEN7_OASTATUS1_REPORT_LOST))) {
+
+		if (oastatus1 & GEN7_OASTATUS1_OABUFFER_OVERFLOW) {
+			if (append_oa_status(event, read_state,
+					     DRM_I915_PERF_RECORD_OA_BUFFER_OVERFLOW))
+				oastatus1 &= ~GEN7_OASTATUS1_OABUFFER_OVERFLOW;
+		}
+
+		if (oastatus1 & GEN7_OASTATUS1_REPORT_LOST) {
+			if (append_oa_status(event, read_state,
+					     DRM_I915_PERF_RECORD_OA_REPORT_LOST))
+				oastatus1 &= ~GEN7_OASTATUS1_REPORT_LOST;
+		}
+
+		I915_WRITE(GEN7_OASTATUS1, oastatus1);
+	}
+
+	head = gen7_append_oa_reports(event, read_state, head, tail);
+
+	I915_WRITE(GEN7_OASTATUS2, (head & GEN7_OASTATUS2_HEAD_MASK) |
+				    OA_MEM_SELECT_GGTT);
+}
+
+static bool i915_oa_can_read(struct i915_perf_event *event)
+{
+	struct drm_i915_private *dev_priv = event->dev_priv;
+
+	return !dev_priv->perf.oa.ops.oa_buffer_is_empty(dev_priv);
+}
+
+static int i915_oa_wait_unlocked(struct i915_perf_event *event)
+{
+	struct drm_i915_private *dev_priv = event->dev_priv;
+
+	/* Note: the oa_buffer_is_empty() condition is ok to run unlocked as it
+	 * just performs mmio reads of the OA buffer head + tail pointers. We
+	 * assume we're handling some operation (such as a read()) that implies
+	 * the event can't be destroyed until it completes, which ensures the
+	 * device + OA buffer can't disappear.
+	 */
+	return wait_event_interruptible(dev_priv->perf.oa.poll_wq,
+					!dev_priv->perf.oa.ops.oa_buffer_is_empty(dev_priv));
+}
+
+static void i915_oa_poll_wait(struct i915_perf_event *event,
+			      struct file *file,
+			      poll_table *wait)
+{
+	struct drm_i915_private *dev_priv = event->dev_priv;
+
+	poll_wait(file, &dev_priv->perf.oa.poll_wq, wait);
+}
+
+static void i915_oa_read(struct i915_perf_event *event,
+			 struct i915_perf_read_state *read_state)
+{
+	struct drm_i915_private *dev_priv = event->dev_priv;
+
+	dev_priv->perf.oa.ops.read(event, read_state);
+}
+
+static void
+free_oa_buffer(struct drm_i915_private *i915)
+{
+	mutex_lock(&i915->dev->struct_mutex);
+
+	vunmap(i915->perf.oa.oa_buffer.addr);
+	i915_gem_object_ggtt_unpin(i915->perf.oa.oa_buffer.obj);
+	drm_gem_object_unreference(&i915->perf.oa.oa_buffer.obj->base);
+
+	i915->perf.oa.oa_buffer.obj = NULL;
+	i915->perf.oa.oa_buffer.gtt_offset = 0;
+	i915->perf.oa.oa_buffer.addr = NULL;
+
+	mutex_unlock(&i915->dev->struct_mutex);
+}
+
+static void i915_oa_event_destroy(struct i915_perf_event *event)
+{
+	struct drm_i915_private *dev_priv = event->dev_priv;
+
+	BUG_ON(event != dev_priv->perf.oa.exclusive_event);
+
+	dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
+
+	free_oa_buffer(dev_priv);
+
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+	intel_runtime_pm_put(dev_priv);
+
+	dev_priv->perf.oa.exclusive_event = NULL;
+}
+
+static void *vmap_oa_buffer(struct drm_i915_gem_object *obj)
+{
+	int i;
+	void *addr = NULL;
+	struct sg_page_iter sg_iter;
+	struct page **pages;
+
+	pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
+	if (pages == NULL) {
+		DRM_DEBUG_DRIVER("Failed to get space for pages\n");
+		goto finish;
+	}
+
+	i = 0;
+	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
+		pages[i] = sg_page_iter_page(&sg_iter);
+		i++;
+	}
+
+	addr = vmap(pages, i, 0, PAGE_KERNEL);
+	if (addr == NULL) {
+		DRM_DEBUG_DRIVER("Failed to vmap pages\n");
+		goto finish;
+	}
+
+finish:
+	if (pages)
+		drm_free_large(pages);
+	return addr;
+}
+
+static void gen7_init_oa_buffer(struct drm_i915_private *dev_priv)
+{
+	/* Pre-DevBDW: OABUFFER must be set with counters off,
+	 * before OASTATUS1, but after OASTATUS2 */
+	I915_WRITE(GEN7_OASTATUS2, dev_priv->perf.oa.oa_buffer.gtt_offset |
+		   OA_MEM_SELECT_GGTT); /* head */
+	I915_WRITE(GEN7_OABUFFER, dev_priv->perf.oa.oa_buffer.gtt_offset);
+	I915_WRITE(GEN7_OASTATUS1, dev_priv->perf.oa.oa_buffer.gtt_offset |
+		   OABUFFER_SIZE_16M); /* tail */
+}
+
+static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
+{
+	struct drm_i915_gem_object *bo;
+	int ret;
+
+	BUG_ON(dev_priv->perf.oa.oa_buffer.obj);
+
+	ret = i915_mutex_lock_interruptible(dev_priv->dev);
+	if (ret)
+		return ret;
+
+	bo = i915_gem_alloc_object(dev_priv->dev, OA_BUFFER_SIZE);
+	if (bo == NULL) {
+		DRM_ERROR("Failed to allocate OA buffer\n");
+		ret = -ENOMEM;
+		goto unlock;
+	}
+	dev_priv->perf.oa.oa_buffer.obj = bo;
+
+	ret = i915_gem_object_set_cache_level(bo, I915_CACHE_LLC);
+	if (ret)
+		goto err_unref;
+
+	/* PreHSW required 512K alignment, HSW requires 16M */
+	ret = i915_gem_obj_ggtt_pin(bo, SZ_16M, 0);
+	if (ret)
+		goto err_unref;
+
+	dev_priv->perf.oa.oa_buffer.gtt_offset = i915_gem_obj_ggtt_offset(bo);
+	dev_priv->perf.oa.oa_buffer.addr = vmap_oa_buffer(bo);
+
+	dev_priv->perf.oa.ops.init_oa_buffer(dev_priv);
+
+	DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr = %p\n",
+			 dev_priv->perf.oa.oa_buffer.gtt_offset,
+			 dev_priv->perf.oa.oa_buffer.addr);
+
+	goto unlock;
+
+err_unref:
+	drm_gem_object_unreference(&bo->base);
+
+unlock:
+	mutex_unlock(&dev_priv->dev->struct_mutex);
+	return ret;
+}
+
+static void config_oa_regs(struct drm_i915_private *dev_priv,
+			   const struct i915_oa_reg *regs,
+			   int n_regs)
+{
+	int i;
+
+	for (i = 0; i < n_regs; i++) {
+		const struct i915_oa_reg *reg = regs + i;
+
+		I915_WRITE(reg->addr, reg->value);
+	}
+}
+
+static void hsw_enable_metric_set(struct drm_i915_private *dev_priv)
+{
+	dev_priv->perf.oa.mux_regs = NULL;
+	dev_priv->perf.oa.mux_regs_len = 0;
+	dev_priv->perf.oa.b_counter_regs = NULL;
+	dev_priv->perf.oa.b_counter_regs_len = 0;
+
+	I915_WRITE(GDT_CHICKEN_BITS, GT_NOA_ENABLE);
+
+	/* PRM:
+	 *
+	 * OA unit is using "crclk" for its functionality. When trunk
+	 * level clock gating takes place, OA clock would be gated,
+	 * unable to count the events from non-render clock domain.
+	 * Render clock gating must be disabled when OA is enabled to
+	 * count the events from non-render domain. Unit level clock
+	 * gating for RCS should also be disabled.
+	 */
+	I915_WRITE(GEN7_MISCCPCTL, (I915_READ(GEN7_MISCCPCTL) &
+				    ~GEN7_DOP_CLOCK_GATE_ENABLE));
+	I915_WRITE(GEN6_UCGCTL1, (I915_READ(GEN6_UCGCTL1) |
+				  GEN6_CSUNIT_CLOCK_GATE_DISABLE));
+
+	switch (dev_priv->perf.oa.metrics_set) {
+	case I915_OA_METRICS_SET_3D:
+		config_oa_regs(dev_priv, i915_oa_3d_mux_config_hsw,
+			       i915_oa_3d_mux_config_hsw_len);
+		config_oa_regs(dev_priv, i915_oa_3d_b_counter_config_hsw,
+			       i915_oa_3d_b_counter_config_hsw_len);
+		break;
+	default:
+		BUG();
+	}
+}
+
+static void hsw_disable_metric_set(struct drm_i915_private *dev_priv)
+{
+	I915_WRITE(GEN6_UCGCTL1, (I915_READ(GEN6_UCGCTL1) &
+				  ~GEN6_CSUNIT_CLOCK_GATE_DISABLE));
+	I915_WRITE(GEN7_MISCCPCTL, (I915_READ(GEN7_MISCCPCTL) |
+				    GEN7_DOP_CLOCK_GATE_ENABLE));
+
+	I915_WRITE(GDT_CHICKEN_BITS, (I915_READ(GDT_CHICKEN_BITS) &
+				      ~GT_NOA_ENABLE));
+}
+
+static void gen7_update_oacontrol(struct drm_i915_private *dev_priv)
+{
+	if (dev_priv->perf.oa.exclusive_event->enabled) {
+		unsigned long ctx_id = 0;
+		bool pinning_ok = false;
+
+		if (dev_priv->perf.oa.exclusive_event->ctx &&
+		    dev_priv->perf.oa.specific_ctx_id) {
+			ctx_id = dev_priv->perf.oa.specific_ctx_id;
+			pinning_ok = true;
+		}
+
+		if (dev_priv->perf.oa.exclusive_event->ctx == NULL ||
+		    pinning_ok) {
+			bool periodic = dev_priv->perf.oa.periodic;
+			u32 period_exponent = dev_priv->perf.oa.period_exponent;
+			u32 report_format = dev_priv->perf.oa.oa_buffer.format;
+
+			I915_WRITE(GEN7_OACONTROL,
+				   (ctx_id & GEN7_OACONTROL_CTX_MASK) |
+				   (period_exponent <<
+				    GEN7_OACONTROL_TIMER_PERIOD_SHIFT) |
+				   (periodic ?
+				    GEN7_OACONTROL_TIMER_ENABLE : 0) |
+				   (report_format <<
+				    GEN7_OACONTROL_FORMAT_SHIFT) |
+				   (ctx_id ?
+				    GEN7_OACONTROL_PER_CTX_ENABLE : 0) |
+				   GEN7_OACONTROL_ENABLE);
+			return;
+		}
+	}
+
+	I915_WRITE(GEN7_OACONTROL, 0);
+}
+
+static void gen7_oa_enable(struct drm_i915_private *dev_priv)
+{
+	u32 oastatus1, tail;
+
+	gen7_update_oacontrol(dev_priv);
+
+	/* Reset the head ptr so we don't forward reports from before now. */
+	oastatus1 = I915_READ(GEN7_OASTATUS1);
+	tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK;
+	I915_WRITE(GEN7_OASTATUS2, (tail & GEN7_OASTATUS2_HEAD_MASK) |
+				    OA_MEM_SELECT_GGTT);
+}
+
+static void i915_oa_event_enable(struct i915_perf_event *event)
+{
+	struct drm_i915_private *dev_priv = event->dev_priv;
+
+	dev_priv->perf.oa.ops.oa_enable(dev_priv);
+
+	if (dev_priv->perf.oa.periodic)
+		hrtimer_start(&dev_priv->perf.oa.poll_check_timer,
+			      ns_to_ktime(POLL_PERIOD),
+			      HRTIMER_MODE_REL_PINNED);
+}
+
+static void gen7_oa_disable(struct drm_i915_private *dev_priv)
+{
+	I915_WRITE(GEN7_OACONTROL, 0);
+}
+
+static void i915_oa_event_disable(struct i915_perf_event *event)
+{
+	struct drm_i915_private *dev_priv = event->dev_priv;
+
+	dev_priv->perf.oa.ops.oa_disable(dev_priv);
+
+	if (dev_priv->perf.oa.periodic)
+		hrtimer_cancel(&dev_priv->perf.oa.poll_check_timer);
+}
+
+static int i915_oa_event_init(struct i915_perf_event *event,
+			      struct drm_i915_perf_open_param *param)
+{
+	struct drm_i915_private *dev_priv = event->dev_priv;
+	struct drm_i915_perf_oa_attr oa_attr;
+	u32 known_flags = 0;
+	int format_size;
+	int ret;
+
+	BUG_ON(param->type != I915_PERF_OA_EVENT);
+
+	if (!dev_priv->perf.oa.ops.init_oa_buffer) {
+		DRM_ERROR("OA unit not supported\n");
+		return -ENODEV;
+	}
+
+	/* To avoid the complexity of having to accurately filter
+	 * counter reports and marshal to the appropriate client
+	 * we currently only allow exclusive access */
+	if (dev_priv->perf.oa.exclusive_event) {
+		DRM_ERROR("OA unit already in use\n");
+		return -EBUSY;
+	}
+
+	ret = i915_perf_copy_attr(to_user_ptr(param->attr),
+					      &oa_attr,
+					      I915_OA_ATTR_SIZE_VER0,
+					      sizeof(oa_attr));
+	if (ret)
+		return ret;
+
+	known_flags = I915_OA_FLAG_PERIODIC;
+	if (oa_attr.flags & ~known_flags) {
+		DRM_ERROR("Unknown drm_i915_perf_oa_attr flag\n");
+		return -EINVAL;
+	}
+
+	if (oa_attr.oa_format >= I915_OA_FORMAT_MAX) {
+		DRM_ERROR("Invalid OA report format\n");
+		return -EINVAL;
+	}
+
+	format_size = dev_priv->perf.oa.oa_formats[oa_attr.oa_format].size;
+	if (!format_size) {
+		DRM_ERROR("Invalid OA report format\n");
+		return -EINVAL;
+	}
+
+	dev_priv->perf.oa.oa_buffer.format_size = format_size;
+
+	dev_priv->perf.oa.oa_buffer.format =
+		dev_priv->perf.oa.oa_formats[oa_attr.oa_format].format;
+
+	if (IS_HASWELL(dev_priv->dev)) {
+		if (oa_attr.metrics_set <= 0 ||
+		    oa_attr.metrics_set > I915_OA_METRICS_SET_MAX) {
+			DRM_ERROR("Metric set not available\n");
+			return -EINVAL;
+		}
+	} else {
+		BUG(); /* checked above */
+		return -ENODEV;
+	}
+
+	dev_priv->perf.oa.metrics_set = oa_attr.metrics_set;
+
+	dev_priv->perf.oa.periodic = !!(oa_attr.flags & I915_OA_FLAG_PERIODIC);
+
+	/* NB: The exponent represents a period as follows:
+	 *
+	 *   80ns * 2^(period_exponent + 1)
+	 */
+	if (dev_priv->perf.oa.periodic) {
+		u64 period_exponent = oa_attr.oa_timer_exponent;
+
+		if (period_exponent > OA_EXPONENT_MAX)
+			return -EINVAL;
+
+		/* Theoretically we can program the OA unit to sample every
+		 * 160ns but don't allow that by default unless root...
+		 *
+		 * Referring to perf's kernel.perf_event_max_sample_rate for
+		 * a precedent (100000 by default); with an OA exponent of
+		 * 6 we get a period of 10.240 microseconds -just under
+		 * 100000Hz
+		 */
+		if (period_exponent < 6 && !capable(CAP_SYS_ADMIN)) {
+			DRM_ERROR("Sampling period too high without root privileges\n");
+			return -EACCES;
+		}
+
+		dev_priv->perf.oa.period_exponent = period_exponent;
+	} else if (oa_attr.oa_timer_exponent) {
+		DRM_ERROR("Sampling exponent specified without requesting periodic sampling\n");
+		return -EINVAL;
+	}
+
+	ret = alloc_oa_buffer(dev_priv);
+	if (ret)
+		return ret;
+
+	dev_priv->perf.oa.exclusive_event = event;
+
+	/* PRM - observability performance counters:
+	 *
+	 *   OACONTROL, performance counter enable, note:
+	 *
+	 *   "When this bit is set, in order to have coherent counts,
+	 *   RC6 power state and trunk clock gating must be disabled.
+	 *   This can be achieved by programming MMIO registers as
+	 *   0xA094=0 and 0xA090[31]=1"
+	 *
+	 *   In our case we expect that taking pm + FORCEWAKE
+	 *   references will effectively disable RC6.
+	 */
+	intel_runtime_pm_get(dev_priv);
+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+	dev_priv->perf.oa.ops.enable_metric_set(dev_priv);
+
+	event->destroy = i915_oa_event_destroy;
+	event->enable = i915_oa_event_enable;
+	event->disable = i915_oa_event_disable;
+	event->can_read = i915_oa_can_read;
+	event->wait_unlocked = i915_oa_wait_unlocked;
+	event->poll_wait = i915_oa_poll_wait;
+	event->read = i915_oa_read;
+
+	return 0;
+}
+
+static void gen7_update_specific_hw_ctx_id(struct drm_i915_private *dev_priv,
+					   u32 ctx_id)
+{
+	dev_priv->perf.oa.specific_ctx_id = ctx_id;
+	gen7_update_oacontrol(dev_priv);
+}
+
+static void i915_oa_context_pin_notify_locked(struct drm_i915_private *dev_priv,
+					      struct intel_context *context)
+{
+	if (i915.enable_execlists ||
+	    dev_priv->perf.oa.ops.update_specific_hw_ctx_id == NULL)
+		return;
+
+	if (dev_priv->perf.oa.exclusive_event &&
+	    dev_priv->perf.oa.exclusive_event->ctx == context) {
+		struct drm_i915_gem_object *obj =
+			context->legacy_hw_ctx.rcs_state;
+		u32 ctx_id = i915_gem_obj_ggtt_offset(obj);
+
+		dev_priv->perf.oa.ops.update_specific_hw_ctx_id(dev_priv, ctx_id);
+	}
+}
+
+void i915_oa_context_pin_notify(struct drm_i915_private *dev_priv,
+				struct intel_context *context)
+{
+	if (!dev_priv->perf.initialized)
+		return;
+
+	mutex_lock(&dev_priv->perf.lock);
+	i915_oa_context_pin_notify_locked(dev_priv, context);
+	mutex_unlock(&dev_priv->perf.lock);
+}
+
 static ssize_t i915_perf_read_locked(struct i915_perf_event *event,
 				     struct file *file,
 				     char __user *buf,
@@ -152,6 +805,20 @@ static ssize_t i915_perf_read(struct file *file,
 	return ret;
 }
 
+static enum hrtimer_restart poll_check_timer_cb(struct hrtimer *hrtimer)
+{
+	struct drm_i915_private *dev_priv =
+		container_of(hrtimer, typeof(*dev_priv),
+			     perf.oa.poll_check_timer);
+
+	if (!dev_priv->perf.oa.ops.oa_buffer_is_empty(dev_priv))
+		wake_up(&dev_priv->perf.oa.poll_wq);
+
+	hrtimer_forward_now(hrtimer, ns_to_ktime(POLL_PERIOD));
+
+	return HRTIMER_RESTART;
+}
+
 static unsigned int i915_perf_poll_locked(struct i915_perf_event *event,
 					  struct file *file,
 					  poll_table *wait)
@@ -366,7 +1033,11 @@ int i915_perf_open_ioctl_locked(struct drm_device *dev, void *data,
 	event->ctx = specific_ctx;
 
 	switch (param->type) {
-		/* TODO: Init according to specific type */
+	case I915_PERF_OA_EVENT:
+		ret = i915_oa_event_init(event, param);
+		if (ret)
+			goto err_alloc;
+		break;
 	default:
 		DRM_ERROR("Unknown perf event type\n");
 		ret = -EINVAL;
@@ -429,7 +1100,27 @@ void i915_perf_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
 
-	/* Currently no global event state to initialize */
+	if (!IS_HASWELL(dev))
+		return;
+
+	hrtimer_init(&dev_priv->perf.oa.poll_check_timer,
+		     CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	dev_priv->perf.oa.poll_check_timer.function = poll_check_timer_cb;
+	init_waitqueue_head(&dev_priv->perf.oa.poll_wq);
+
+	INIT_LIST_HEAD(&dev_priv->perf.events);
+	mutex_init(&dev_priv->perf.lock);
+
+	dev_priv->perf.oa.ops.init_oa_buffer = gen7_init_oa_buffer;
+	dev_priv->perf.oa.ops.enable_metric_set = hsw_enable_metric_set;
+	dev_priv->perf.oa.ops.disable_metric_set = hsw_disable_metric_set;
+	dev_priv->perf.oa.ops.oa_enable = gen7_oa_enable;
+	dev_priv->perf.oa.ops.oa_disable = gen7_oa_disable;
+	dev_priv->perf.oa.ops.update_specific_hw_ctx_id = gen7_update_specific_hw_ctx_id;
+	dev_priv->perf.oa.ops.read = gen7_oa_read;
+	dev_priv->perf.oa.ops.oa_buffer_is_empty = gen7_oa_buffer_is_empty;
+
+	dev_priv->perf.oa.oa_formats = hsw_oa_formats;
 
 	dev_priv->perf.initialized = true;
 }
@@ -441,7 +1132,7 @@ void i915_perf_fini(struct drm_device *dev)
 	if (!dev_priv->perf.initialized)
 		return;
 
-	/* Currently nothing to clean up */
+	dev_priv->perf.oa.ops.init_oa_buffer = NULL;
 
 	dev_priv->perf.initialized = false;
 }
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 2e488e8..0736358 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -537,6 +537,343 @@
 #define GEN7_3DPRIM_BASE_VERTEX         0x2440
 
 #define GEN7_OACONTROL 0x2360
+#define  GEN7_OACONTROL_CTX_MASK	    0xFFFFF000
+#define  GEN7_OACONTROL_TIMER_PERIOD_MASK   0x3F
+#define  GEN7_OACONTROL_TIMER_PERIOD_SHIFT  6
+#define  GEN7_OACONTROL_TIMER_ENABLE	    (1<<5)
+#define  GEN7_OACONTROL_FORMAT_A13	    (0<<2)
+#define  GEN7_OACONTROL_FORMAT_A29	    (1<<2)
+#define  GEN7_OACONTROL_FORMAT_A13_B8_C8    (2<<2)
+#define  GEN7_OACONTROL_FORMAT_A29_B8_C8    (3<<2)
+#define  GEN7_OACONTROL_FORMAT_B4_C8	    (4<<2)
+#define  GEN7_OACONTROL_FORMAT_A45_B8_C8    (5<<2)
+#define  GEN7_OACONTROL_FORMAT_B4_C8_A16    (6<<2)
+#define  GEN7_OACONTROL_FORMAT_C4_B8	    (7<<2)
+#define  GEN7_OACONTROL_FORMAT_SHIFT	    2
+#define  GEN7_OACONTROL_PER_CTX_ENABLE	    (1<<1)
+#define  GEN7_OACONTROL_ENABLE		    (1<<0)
+
+#define GEN8_OACTXID 0x2364
+
+#define GEN8_OACONTROL 0x2B00
+#define  GEN8_OA_REPORT_FORMAT_A12	    (0<<2)
+#define  GEN8_OA_REPORT_FORMAT_A12_B8_C8    (2<<2)
+#define  GEN8_OA_REPORT_FORMAT_A36_B8_C8    (5<<2)
+#define  GEN8_OA_REPORT_FORMAT_C4_B8	    (7<<2)
+#define  GEN8_OA_REPORT_FORMAT_SHIFT	    2
+#define  GEN8_OA_SPECIFIC_CONTEXT_ENABLE    (1<<1)
+#define  GEN8_OA_COUNTER_ENABLE             (1<<0)
+
+#define GEN8_OACTXCONTROL 0x2360
+#define  GEN8_OA_TIMER_PERIOD_MASK	    0x3F
+#define  GEN8_OA_TIMER_PERIOD_SHIFT	    2
+#define  GEN8_OA_TIMER_ENABLE		    (1<<1)
+#define  GEN8_OA_COUNTER_RESUME		    (1<<0)
+
+#define GEN7_OABUFFER 0x23B0 /* R/W */
+#define  GEN7_OABUFFER_OVERRUN_DISABLE	    (1<<3)
+#define  GEN7_OABUFFER_EDGE_TRIGGER	    (1<<2)
+#define  GEN7_OABUFFER_STOP_RESUME_ENABLE   (1<<1)
+#define  GEN7_OABUFFER_RESUME		    (1<<0)
+
+#define GEN8_OABUFFER 0x2b14
+
+#define GEN7_OASTATUS1 0x2364
+#define  GEN7_OASTATUS1_TAIL_MASK	    0xffffffc0
+#define  GEN7_OASTATUS1_COUNTER_OVERFLOW    (1<<2)
+#define  GEN7_OASTATUS1_OABUFFER_OVERFLOW   (1<<1)
+#define  GEN7_OASTATUS1_REPORT_LOST	    (1<<0)
+
+#define GEN7_OASTATUS2 0x2368
+#define GEN7_OASTATUS2_HEAD_MASK    0xffffffc0
+
+#define GEN8_OASTATUS 0x2b08
+#define  GEN8_OASTATUS_OVERRUN_STATUS	    (1<<3)
+#define  GEN8_OASTATUS_COUNTER_OVERFLOW     (1<<2)
+#define  GEN8_OASTATUS_OABUFFER_OVERFLOW    (1<<1)
+#define  GEN8_OASTATUS_REPORT_LOST	    (1<<0)
+
+#define GEN8_OAHEADPTR 0x2B0C
+#define GEN8_OATAILPTR 0x2B10
+
+#define OABUFFER_SIZE_128K  (0<<3)
+#define OABUFFER_SIZE_256K  (1<<3)
+#define OABUFFER_SIZE_512K  (2<<3)
+#define OABUFFER_SIZE_1M    (3<<3)
+#define OABUFFER_SIZE_2M    (4<<3)
+#define OABUFFER_SIZE_4M    (5<<3)
+#define OABUFFER_SIZE_8M    (6<<3)
+#define OABUFFER_SIZE_16M   (7<<3)
+
+#define OA_MEM_SELECT_GGTT  (1<<0)
+
+#define EU_PERF_CNTL0	    0xe458
+
+#define GDT_CHICKEN_BITS    0x9840
+#define GT_NOA_ENABLE	    0x00000080
+
+/*
+ * OA Boolean state
+ */
+
+#define OAREPORTTRIG1 0x2740
+#define OAREPORTTRIG1_THRESHOLD_MASK 0xffff
+#define OAREPORTTRIG1_EDGE_LEVEL_TRIGER_SELECT_MASK 0xffff0000 /* 0=level */
+
+#define OAREPORTTRIG2 0x2744
+#define OAREPORTTRIG2_INVERT_A_0  (1<<0)
+#define OAREPORTTRIG2_INVERT_A_1  (1<<1)
+#define OAREPORTTRIG2_INVERT_A_2  (1<<2)
+#define OAREPORTTRIG2_INVERT_A_3  (1<<3)
+#define OAREPORTTRIG2_INVERT_A_4  (1<<4)
+#define OAREPORTTRIG2_INVERT_A_5  (1<<5)
+#define OAREPORTTRIG2_INVERT_A_6  (1<<6)
+#define OAREPORTTRIG2_INVERT_A_7  (1<<7)
+#define OAREPORTTRIG2_INVERT_A_8  (1<<8)
+#define OAREPORTTRIG2_INVERT_A_9  (1<<9)
+#define OAREPORTTRIG2_INVERT_A_10 (1<<10)
+#define OAREPORTTRIG2_INVERT_A_11 (1<<11)
+#define OAREPORTTRIG2_INVERT_A_12 (1<<12)
+#define OAREPORTTRIG2_INVERT_A_13 (1<<13)
+#define OAREPORTTRIG2_INVERT_A_14 (1<<14)
+#define OAREPORTTRIG2_INVERT_A_15 (1<<15)
+#define OAREPORTTRIG2_INVERT_B_0  (1<<16)
+#define OAREPORTTRIG2_INVERT_B_1  (1<<17)
+#define OAREPORTTRIG2_INVERT_B_2  (1<<18)
+#define OAREPORTTRIG2_INVERT_B_3  (1<<19)
+#define OAREPORTTRIG2_INVERT_C_0  (1<<20)
+#define OAREPORTTRIG2_INVERT_C_1  (1<<21)
+#define OAREPORTTRIG2_INVERT_D_0  (1<<22)
+#define OAREPORTTRIG2_THRESHOLD_ENABLE	    (1<<23)
+#define OAREPORTTRIG2_REPORT_TRIGGER_ENABLE (1<<31)
+
+#define OAREPORTTRIG3 0x2748
+#define OAREPORTTRIG3_NOA_SELECT_MASK	    0xf
+#define OAREPORTTRIG3_NOA_SELECT_8_SHIFT    0
+#define OAREPORTTRIG3_NOA_SELECT_9_SHIFT    4
+#define OAREPORTTRIG3_NOA_SELECT_10_SHIFT   8
+#define OAREPORTTRIG3_NOA_SELECT_11_SHIFT   12
+#define OAREPORTTRIG3_NOA_SELECT_12_SHIFT   16
+#define OAREPORTTRIG3_NOA_SELECT_13_SHIFT   20
+#define OAREPORTTRIG3_NOA_SELECT_14_SHIFT   24
+#define OAREPORTTRIG3_NOA_SELECT_15_SHIFT   28
+
+#define OAREPORTTRIG4 0x274c
+#define OAREPORTTRIG4_NOA_SELECT_MASK	    0xf
+#define OAREPORTTRIG4_NOA_SELECT_0_SHIFT    0
+#define OAREPORTTRIG4_NOA_SELECT_1_SHIFT    4
+#define OAREPORTTRIG4_NOA_SELECT_2_SHIFT    8
+#define OAREPORTTRIG4_NOA_SELECT_3_SHIFT    12
+#define OAREPORTTRIG4_NOA_SELECT_4_SHIFT    16
+#define OAREPORTTRIG4_NOA_SELECT_5_SHIFT    20
+#define OAREPORTTRIG4_NOA_SELECT_6_SHIFT    24
+#define OAREPORTTRIG4_NOA_SELECT_7_SHIFT    28
+
+#define OAREPORTTRIG5 0x2750
+#define OAREPORTTRIG5_THRESHOLD_MASK 0xffff
+#define OAREPORTTRIG5_EDGE_LEVEL_TRIGER_SELECT_MASK 0xffff0000 /* 0=level */
+
+#define OAREPORTTRIG6 0x2754
+#define OAREPORTTRIG6_INVERT_A_0  (1<<0)
+#define OAREPORTTRIG6_INVERT_A_1  (1<<1)
+#define OAREPORTTRIG6_INVERT_A_2  (1<<2)
+#define OAREPORTTRIG6_INVERT_A_3  (1<<3)
+#define OAREPORTTRIG6_INVERT_A_4  (1<<4)
+#define OAREPORTTRIG6_INVERT_A_5  (1<<5)
+#define OAREPORTTRIG6_INVERT_A_6  (1<<6)
+#define OAREPORTTRIG6_INVERT_A_7  (1<<7)
+#define OAREPORTTRIG6_INVERT_A_8  (1<<8)
+#define OAREPORTTRIG6_INVERT_A_9  (1<<9)
+#define OAREPORTTRIG6_INVERT_A_10 (1<<10)
+#define OAREPORTTRIG6_INVERT_A_11 (1<<11)
+#define OAREPORTTRIG6_INVERT_A_12 (1<<12)
+#define OAREPORTTRIG6_INVERT_A_13 (1<<13)
+#define OAREPORTTRIG6_INVERT_A_14 (1<<14)
+#define OAREPORTTRIG6_INVERT_A_15 (1<<15)
+#define OAREPORTTRIG6_INVERT_B_0  (1<<16)
+#define OAREPORTTRIG6_INVERT_B_1  (1<<17)
+#define OAREPORTTRIG6_INVERT_B_2  (1<<18)
+#define OAREPORTTRIG6_INVERT_B_3  (1<<19)
+#define OAREPORTTRIG6_INVERT_C_0  (1<<20)
+#define OAREPORTTRIG6_INVERT_C_1  (1<<21)
+#define OAREPORTTRIG6_INVERT_D_0  (1<<22)
+#define OAREPORTTRIG6_THRESHOLD_ENABLE	    (1<<23)
+#define OAREPORTTRIG6_REPORT_TRIGGER_ENABLE (1<<31)
+
+#define OAREPORTTRIG7 0x2758
+#define OAREPORTTRIG7_NOA_SELECT_MASK	    0xf
+#define OAREPORTTRIG7_NOA_SELECT_8_SHIFT    0
+#define OAREPORTTRIG7_NOA_SELECT_9_SHIFT    4
+#define OAREPORTTRIG7_NOA_SELECT_10_SHIFT   8
+#define OAREPORTTRIG7_NOA_SELECT_11_SHIFT   12
+#define OAREPORTTRIG7_NOA_SELECT_12_SHIFT   16
+#define OAREPORTTRIG7_NOA_SELECT_13_SHIFT   20
+#define OAREPORTTRIG7_NOA_SELECT_14_SHIFT   24
+#define OAREPORTTRIG7_NOA_SELECT_15_SHIFT   28
+
+#define OAREPORTTRIG8 0x275c
+#define OAREPORTTRIG8_NOA_SELECT_MASK	    0xf
+#define OAREPORTTRIG8_NOA_SELECT_0_SHIFT    0
+#define OAREPORTTRIG8_NOA_SELECT_1_SHIFT    4
+#define OAREPORTTRIG8_NOA_SELECT_2_SHIFT    8
+#define OAREPORTTRIG8_NOA_SELECT_3_SHIFT    12
+#define OAREPORTTRIG8_NOA_SELECT_4_SHIFT    16
+#define OAREPORTTRIG8_NOA_SELECT_5_SHIFT    20
+#define OAREPORTTRIG8_NOA_SELECT_6_SHIFT    24
+#define OAREPORTTRIG8_NOA_SELECT_7_SHIFT    28
+
+#define OASTARTTRIG1 0x2710
+#define OASTARTTRIG1_THRESHOLD_COUNT_MASK_MBZ 0xffff0000
+#define OASTARTTRIG1_THRESHOLD_MASK	      0xffff
+
+#define OASTARTTRIG2 0x2714
+#define OASTARTTRIG2_INVERT_A_0 (1<<0)
+#define OASTARTTRIG2_INVERT_A_1 (1<<1)
+#define OASTARTTRIG2_INVERT_A_2 (1<<2)
+#define OASTARTTRIG2_INVERT_A_3 (1<<3)
+#define OASTARTTRIG2_INVERT_A_4 (1<<4)
+#define OASTARTTRIG2_INVERT_A_5 (1<<5)
+#define OASTARTTRIG2_INVERT_A_6 (1<<6)
+#define OASTARTTRIG2_INVERT_A_7 (1<<7)
+#define OASTARTTRIG2_INVERT_A_8 (1<<8)
+#define OASTARTTRIG2_INVERT_A_9 (1<<9)
+#define OASTARTTRIG2_INVERT_A_10 (1<<10)
+#define OASTARTTRIG2_INVERT_A_11 (1<<11)
+#define OASTARTTRIG2_INVERT_A_12 (1<<12)
+#define OASTARTTRIG2_INVERT_A_13 (1<<13)
+#define OASTARTTRIG2_INVERT_A_14 (1<<14)
+#define OASTARTTRIG2_INVERT_A_15 (1<<15)
+#define OASTARTTRIG2_INVERT_B_0 (1<<16)
+#define OASTARTTRIG2_INVERT_B_1 (1<<17)
+#define OASTARTTRIG2_INVERT_B_2 (1<<18)
+#define OASTARTTRIG2_INVERT_B_3 (1<<19)
+#define OASTARTTRIG2_INVERT_C_0 (1<<20)
+#define OASTARTTRIG2_INVERT_C_1 (1<<21)
+#define OASTARTTRIG2_INVERT_D_0 (1<<22)
+#define OASTARTTRIG2_THRESHOLD_ENABLE	    (1<<23)
+#define OASTARTTRIG2_START_TRIG_FLAG_MBZ    (1<<24)
+#define OASTARTTRIG2_EVENT_SELECT_0  (1<<28)
+#define OASTARTTRIG2_EVENT_SELECT_1  (1<<29)
+#define OASTARTTRIG2_EVENT_SELECT_2  (1<<30)
+#define OASTARTTRIG2_EVENT_SELECT_3  (1<<31)
+
+#define OASTARTTRIG3 0x2718
+#define OASTARTTRIG3_NOA_SELECT_MASK	   0xf
+#define OASTARTTRIG3_NOA_SELECT_8_SHIFT    0
+#define OASTARTTRIG3_NOA_SELECT_9_SHIFT    4
+#define OASTARTTRIG3_NOA_SELECT_10_SHIFT   8
+#define OASTARTTRIG3_NOA_SELECT_11_SHIFT   12
+#define OASTARTTRIG3_NOA_SELECT_12_SHIFT   16
+#define OASTARTTRIG3_NOA_SELECT_13_SHIFT   20
+#define OASTARTTRIG3_NOA_SELECT_14_SHIFT   24
+#define OASTARTTRIG3_NOA_SELECT_15_SHIFT   28
+
+#define OASTARTTRIG4 0x271c
+#define OASTARTTRIG4_NOA_SELECT_MASK	    0xf
+#define OASTARTTRIG4_NOA_SELECT_0_SHIFT    0
+#define OASTARTTRIG4_NOA_SELECT_1_SHIFT    4
+#define OASTARTTRIG4_NOA_SELECT_2_SHIFT    8
+#define OASTARTTRIG4_NOA_SELECT_3_SHIFT    12
+#define OASTARTTRIG4_NOA_SELECT_4_SHIFT    16
+#define OASTARTTRIG4_NOA_SELECT_5_SHIFT    20
+#define OASTARTTRIG4_NOA_SELECT_6_SHIFT    24
+#define OASTARTTRIG4_NOA_SELECT_7_SHIFT    28
+
+#define OASTARTTRIG5 0x2720
+#define OASTARTTRIG5_THRESHOLD_COUNT_MASK_MBZ 0xffff0000
+#define OASTARTTRIG5_THRESHOLD_MASK	      0xffff
+
+#define OASTARTTRIG6 0x2724
+#define OASTARTTRIG6_INVERT_A_0 (1<<0)
+#define OASTARTTRIG6_INVERT_A_1 (1<<1)
+#define OASTARTTRIG6_INVERT_A_2 (1<<2)
+#define OASTARTTRIG6_INVERT_A_3 (1<<3)
+#define OASTARTTRIG6_INVERT_A_4 (1<<4)
+#define OASTARTTRIG6_INVERT_A_5 (1<<5)
+#define OASTARTTRIG6_INVERT_A_6 (1<<6)
+#define OASTARTTRIG6_INVERT_A_7 (1<<7)
+#define OASTARTTRIG6_INVERT_A_8 (1<<8)
+#define OASTARTTRIG6_INVERT_A_9 (1<<9)
+#define OASTARTTRIG6_INVERT_A_10 (1<<10)
+#define OASTARTTRIG6_INVERT_A_11 (1<<11)
+#define OASTARTTRIG6_INVERT_A_12 (1<<12)
+#define OASTARTTRIG6_INVERT_A_13 (1<<13)
+#define OASTARTTRIG6_INVERT_A_14 (1<<14)
+#define OASTARTTRIG6_INVERT_A_15 (1<<15)
+#define OASTARTTRIG6_INVERT_B_0 (1<<16)
+#define OASTARTTRIG6_INVERT_B_1 (1<<17)
+#define OASTARTTRIG6_INVERT_B_2 (1<<18)
+#define OASTARTTRIG6_INVERT_B_3 (1<<19)
+#define OASTARTTRIG6_INVERT_C_0 (1<<20)
+#define OASTARTTRIG6_INVERT_C_1 (1<<21)
+#define OASTARTTRIG6_INVERT_D_0 (1<<22)
+#define OASTARTTRIG6_THRESHOLD_ENABLE	    (1<<23)
+#define OASTARTTRIG6_START_TRIG_FLAG_MBZ    (1<<24)
+#define OASTARTTRIG6_EVENT_SELECT_4  (1<<28)
+#define OASTARTTRIG6_EVENT_SELECT_5  (1<<29)
+#define OASTARTTRIG6_EVENT_SELECT_6  (1<<30)
+#define OASTARTTRIG6_EVENT_SELECT_7  (1<<31)
+
+#define OASTARTTRIG7 0x2728
+#define OASTARTTRIG7_NOA_SELECT_MASK	   0xf
+#define OASTARTTRIG7_NOA_SELECT_8_SHIFT    0
+#define OASTARTTRIG7_NOA_SELECT_9_SHIFT    4
+#define OASTARTTRIG7_NOA_SELECT_10_SHIFT   8
+#define OASTARTTRIG7_NOA_SELECT_11_SHIFT   12
+#define OASTARTTRIG7_NOA_SELECT_12_SHIFT   16
+#define OASTARTTRIG7_NOA_SELECT_13_SHIFT   20
+#define OASTARTTRIG7_NOA_SELECT_14_SHIFT   24
+#define OASTARTTRIG7_NOA_SELECT_15_SHIFT   28
+
+#define OASTARTTRIG8 0x272c
+#define OASTARTTRIG8_NOA_SELECT_MASK	   0xf
+#define OASTARTTRIG8_NOA_SELECT_0_SHIFT    0
+#define OASTARTTRIG8_NOA_SELECT_1_SHIFT    4
+#define OASTARTTRIG8_NOA_SELECT_2_SHIFT    8
+#define OASTARTTRIG8_NOA_SELECT_3_SHIFT    12
+#define OASTARTTRIG8_NOA_SELECT_4_SHIFT    16
+#define OASTARTTRIG8_NOA_SELECT_5_SHIFT    20
+#define OASTARTTRIG8_NOA_SELECT_6_SHIFT    24
+#define OASTARTTRIG8_NOA_SELECT_7_SHIFT    28
+
+/* CECX_0 */
+#define OACEC_COMPARE_LESS_OR_EQUAL	6
+#define OACEC_COMPARE_NOT_EQUAL		5
+#define OACEC_COMPARE_LESS_THAN		4
+#define OACEC_COMPARE_GREATER_OR_EQUAL	3
+#define OACEC_COMPARE_EQUAL		2
+#define OACEC_COMPARE_GREATER_THAN	1
+#define OACEC_COMPARE_ANY_EQUAL		0
+
+#define OACEC_COMPARE_VALUE_MASK    0xffff
+#define OACEC_COMPARE_VALUE_SHIFT   3
+
+#define OACEC_SELECT_NOA	(0<<19)
+#define OACEC_SELECT_PREV	(1<<19)
+#define OACEC_SELECT_BOOLEAN	(2<<19)
+
+/* CECX_1 */
+#define OACEC_MASK_MASK		    0xffff
+#define OACEC_CONSIDERATIONS_MASK   0xffff
+#define OACEC_CONSIDERATIONS_SHIFT  16
+
+#define OACEC0_0 0x2770
+#define OACEC0_1 0x2774
+#define OACEC1_0 0x2778
+#define OACEC1_1 0x277c
+#define OACEC2_0 0x2780
+#define OACEC2_1 0x2784
+#define OACEC3_0 0x2788
+#define OACEC3_1 0x278c
+#define OACEC4_0 0x2790
+#define OACEC4_1 0x2794
+#define OACEC5_0 0x2798
+#define OACEC5_1 0x279c
+#define OACEC6_0 0x27a0
+#define OACEC6_1 0x27a4
+#define OACEC7_0 0x27a8
+#define OACEC7_1 0x27ac
+
 
 #define _GEN7_PIPEA_DE_LOAD_SL	0x70068
 #define _GEN7_PIPEB_DE_LOAD_SL	0x71068
@@ -6668,6 +7005,7 @@ enum skl_disp_power_wells {
 # define GEN6_RCCUNIT_CLOCK_GATE_DISABLE		(1 << 11)
 
 #define GEN6_UCGCTL3				0x9408
+# define GEN6_OACSUNIT_CLOCK_GATE_DISABLE		(1 << 20)
 
 #define GEN7_UCGCTL4				0x940c
 #define  GEN7_L3BANK2X_CLOCK_GATE_DISABLE	(1<<25)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index a84f71f..af5dfd4 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -58,6 +58,29 @@
 #define I915_ERROR_UEVENT		"ERROR"
 #define I915_RESET_UEVENT		"RESET"
 
+/*
+ * perf events configuration exposed by i915 through
+ * /sys/bus/event_sources/drivers/i915_oa
+ */
+
+enum drm_i915_oa_format {
+	I915_OA_FORMAT_A13	    = 0,
+	I915_OA_FORMAT_A29	    = 1,
+	I915_OA_FORMAT_A13_B8_C8    = 2,
+	I915_OA_FORMAT_B4_C8	    = 4,
+	I915_OA_FORMAT_A45_B8_C8    = 5,
+	I915_OA_FORMAT_B4_C8_A16    = 6,
+	I915_OA_FORMAT_C4_B8	    = 7,
+
+	I915_OA_FORMAT_MAX	    /* non-ABI */
+};
+
+enum drm_i915_oa_set {
+	I915_OA_METRICS_SET_3D			= 1,
+
+	I915_OA_METRICS_SET_MAX			/* non-ABI */
+};
+
 /* Each region is a minimum of 16k, and there are at most 255 of them.
  */
 #define I915_NR_TEX_REGIONS 255	/* table size 2k - maximum due to use
@@ -1132,9 +1155,35 @@ struct drm_i915_gem_context_param {
 };
 
 enum drm_i915_perf_event_type {
+	I915_PERF_OA_EVENT = 1,
+
 	I915_PERF_EVENT_TYPE_MAX	/* non-ABI */
 };
 
+
+#define I915_OA_FLAG_PERIODIC		(1<<0)
+
+struct drm_i915_perf_oa_attr {
+	__u32 size;
+
+	__u32 flags;
+
+	__u32 metrics_set;
+	__u32 oa_format;
+	__u32 oa_timer_exponent;
+};
+
+/* Note: same versioning scheme as struct perf_event_attr
+ *
+ * Userspace specified size defines ABI version and kernel
+ * zero extends to size of latest version. If userspace
+ * gives a larger structure than the kernel expects then
+ * kernel asserts that all unknown fields are zero.
+ */
+#define I915_OA_ATTR_SIZE_VER0		20 /* sizeof first published struct */
+
+
+
 #define I915_PERF_FLAG_FD_CLOEXEC	(1<<0)
 #define I915_PERF_FLAG_FD_NONBLOCK	(1<<1)
 #define I915_PERF_FLAG_SINGLE_CONTEXT	(1<<2)
@@ -1188,6 +1237,20 @@ enum drm_i915_perf_record_type {
 	 */
 	DRM_I915_PERF_RECORD_SAMPLE = 1,
 
+	/*
+	 * Indicates that one or more OA reports were not written
+	 * by the hardware.
+	 */
+	DRM_I915_PERF_RECORD_OA_REPORT_LOST = 2,
+
+	/*
+	 * Indicates that the internal circular buffer that Gen
+	 * graphics writes OA reports into has filled, which may
+	 * either mean that old reports could be overwritten or
+	 * subsequent reports lost until the buffer is cleared.
+	 */
+	DRM_I915_PERF_RECORD_OA_BUFFER_OVERFLOW = 3,
+
 	DRM_I915_PERF_RECORD_MAX /* non-ABI */
 };
 
-- 
2.5.2
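
For readers following the GEN7_OACONTROL field definitions added to
i915_reg.h above, here is a minimal user-space sketch of how those
fields could be packed into one register value. The helper
gen7_update_oacontrol() is not shown in this part of the thread, so
oacontrol_sketch() and its arguments are hypothetical and only
illustrate the bit layout implied by the defines; it is not the
driver's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Field definitions copied from the i915_reg.h hunk in this patch. */
#define GEN7_OACONTROL_CTX_MASK            0xFFFFF000u
#define GEN7_OACONTROL_TIMER_PERIOD_MASK   0x3Fu
#define GEN7_OACONTROL_TIMER_PERIOD_SHIFT  6
#define GEN7_OACONTROL_TIMER_ENABLE        (1u << 5)
#define GEN7_OACONTROL_FORMAT_SHIFT        2
#define GEN7_OACONTROL_PER_CTX_ENABLE      (1u << 1)
#define GEN7_OACONTROL_ENABLE              (1u << 0)

/* Hypothetical helper: compose an OACONTROL value from its fields.
 * A ctx_id of 0 is taken to mean "no specific context" (assumption). */
static uint32_t oacontrol_sketch(uint32_t ctx_id, unsigned int format,
				 int periodic, unsigned int exponent)
{
	uint32_t val = GEN7_OACONTROL_ENABLE;

	assert(format <= 7);	/* three format bits at [4:2] */
	assert(exponent <= GEN7_OACONTROL_TIMER_PERIOD_MASK);

	val |= (uint32_t)format << GEN7_OACONTROL_FORMAT_SHIFT;

	if (periodic)
		val |= GEN7_OACONTROL_TIMER_ENABLE |
		       ((uint32_t)exponent << GEN7_OACONTROL_TIMER_PERIOD_SHIFT);

	if (ctx_id)
		val |= (ctx_id & GEN7_OACONTROL_CTX_MASK) |
		       GEN7_OACONTROL_PER_CTX_ENABLE;

	return val;
}
```

The timer exponent occupies bits [11:6] and the context ID bits
[31:12], so the two fields never overlap.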

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx
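
The size-based versioning described in the uapi comment above ("same
versioning scheme as struct perf_event_attr") can be modelled in user
space roughly as follows. This is only a sketch under stated
assumptions: copy_attr_sketch() and struct oa_attr_sketch are
hypothetical stand-ins, the error codes are this sketch's choice, and
i915_perf_copy_attr() itself is not shown in this part of the thread.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

#define ATTR_SIZE_VER0 20u	/* I915_OA_ATTR_SIZE_VER0 in the patch */

/* Mirrors the layout of struct drm_i915_perf_oa_attr (5 x u32). */
struct oa_attr_sketch {
	uint32_t size;
	uint32_t flags;
	uint32_t metrics_set;
	uint32_t oa_format;
	uint32_t oa_timer_exponent;
};

/* Hypothetical model: uattr points at a buffer whose first u32 is the
 * size userspace claims, and which holds at least that many bytes. */
static int copy_attr_sketch(const void *uattr, struct oa_attr_sketch *dst)
{
	const unsigned char *bytes = uattr;
	uint32_t known = sizeof(*dst);
	uint32_t size, i;

	assert(uattr != NULL && dst != NULL);

	memcpy(&size, bytes, sizeof(size));

	/* Older than the first published ABI: reject. */
	if (size < ATTR_SIZE_VER0)
		return -EINVAL;

	/* Newer than we know: all unknown trailing bytes must be zero. */
	for (i = known; i < size; i++)
		if (bytes[i] != 0)
			return -E2BIG;

	/* Zero-extend, then copy the part both sides understand. */
	memset(dst, 0, sizeof(*dst));
	memcpy(dst, bytes, size < known ? size : known);

	return 0;
}
```

With only one published version the zero-extension step is a no-op,
but it is what would let a future, larger struct accept older
callers.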

* [RFC 5/6] drm/i915: Add dev.i915.perf_event_paranoid sysctl option
       [not found] ` <1443537549-6905-1-git-send-email-robert-St23OQVBDYPNLxjTenLetw@public.gmane.org>
@ 2015-09-29 14:39   ` Robert Bragg
  2015-09-30  8:30   ` [RFC 0/6] Non perf based Gen Graphics OA unit driver Chris Wilson
  1 sibling, 0 replies; 18+ messages in thread
From: Robert Bragg @ 2015-09-29 14:39 UTC (permalink / raw)
  To: intel-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW
  Cc: Daniel Vetter, Chris Wilson, Sourab Gupta, Zhenyu Wang,
	Jani Nikula, David Airlie, Peter Zijlstra, Ingo Molnar,
	Kan Liang, Alexander Shishkin, Zheng Yan, Mark Rutland,
	Matt Fleming, dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

Consistent with the kernel.perf_event_paranoid sysctl option that can
allow non-root users to access system wide cpu metrics, this can
optionally allow non-root users to access system wide OA counter metrics
from Gen graphics hardware.
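
The resulting access decision can be summarised with a small
user-space model. This is a sketch only: oa_open_allowed() is a
hypothetical name, and the three booleans stand in for the
specific_ctx pointer, the sysctl value and capable(CAP_SYS_ADMIN)
respectively.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the check in i915_perf_open_ioctl_locked():
 * only system-wide events (no specific context) are restricted, and
 * only while the paranoid knob is set and the caller is not admin. */
static bool oa_open_allowed(bool has_specific_ctx, bool paranoid,
			    bool is_admin)
{
	if (!has_specific_ctx && paranoid && !is_admin)
		return false;

	return true;
}

/* Usage example: walk the cases this patch cares about. */
static void oa_open_allowed_examples(void)
{
	/* Default (paranoid=1): unprivileged system-wide open is denied */
	assert(!oa_open_allowed(false, true, false));
	/* ... but root may still open system-wide events */
	assert(oa_open_allowed(false, true, true));
	/* With dev.i915.perf_event_paranoid=0 anyone may */
	assert(oa_open_allowed(false, false, false));
	/* This check never restricts single-context events */
	assert(oa_open_allowed(true, true, false));
}
```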

Signed-off-by: Robert Bragg <robert-St23OQVBDYPNLxjTenLetw@public.gmane.org>
---
 drivers/gpu/drm/i915/i915_perf.c | 46 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 45 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index bc1c4d1..ab82857 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -38,6 +38,8 @@
 #define POLL_FREQUENCY 200
 #define POLL_PERIOD max_t(u64, 10000, NSEC_PER_SEC / POLL_FREQUENCY)
 
+static u32 i915_perf_event_paranoid = true;
+
 #define OA_EXPONENT_MAX 0x3f
 
 static struct i915_oa_format hsw_oa_formats[I915_OA_FORMAT_MAX] = {
@@ -1016,7 +1018,13 @@ int i915_perf_open_ioctl_locked(struct drm_device *dev, void *data,
 		}
 	}
 
-	if (!specific_ctx && !capable(CAP_SYS_ADMIN)) {
+	/* Similar to perf's kernel.perf_event_paranoid sysctl option
+	 * we check a dev.i915.perf_event_paranoid sysctl option
+	 * to determine if it's ok to access system wide OA counters
+	 * without CAP_SYS_ADMIN privileges.
+	 */
+	if (!specific_ctx &&
+	    i915_perf_event_paranoid && !capable(CAP_SYS_ADMIN)) {
 		DRM_ERROR("Insufficient privileges to open perf event\n");
 		ret = -EACCES;
 		goto err_ctx;
@@ -1096,6 +1104,38 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
 	return ret;
 }
 
+
+static struct ctl_table oa_table[] = {
+	{
+	 .procname = "perf_event_paranoid",
+	 .data = &i915_perf_event_paranoid,
+	 .maxlen = sizeof(i915_perf_event_paranoid),
+	 .mode = 0644,
+	 .proc_handler = proc_dointvec,
+	 },
+	{}
+};
+
+static struct ctl_table i915_root[] = {
+	{
+	 .procname = "i915",
+	 .maxlen = 0,
+	 .mode = 0555,
+	 .child = oa_table,
+	 },
+	{}
+};
+
+static struct ctl_table dev_root[] = {
+	{
+	 .procname = "dev",
+	 .maxlen = 0,
+	 .mode = 0555,
+	 .child = i915_root,
+	 },
+	{}
+};
+
 void i915_perf_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
@@ -1103,6 +1143,8 @@ void i915_perf_init(struct drm_device *dev)
 	if (!IS_HASWELL(dev))
 		return;
 
+	dev_priv->perf.sysctl_header = register_sysctl_table(dev_root);
+
 	hrtimer_init(&dev_priv->perf.oa.poll_check_timer,
 		     CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	dev_priv->perf.oa.poll_check_timer.function = poll_check_timer_cb;
@@ -1132,6 +1174,8 @@ void i915_perf_fini(struct drm_device *dev)
 	if (!dev_priv->perf.initialized)
 		return;
 
+	unregister_sysctl_table(dev_priv->perf.sysctl_header);
+
 	dev_priv->perf.oa.ops.init_oa_buffer = NULL;
 
 	dev_priv->perf.initialized = false;
-- 
2.5.2

* [RFC 6/6] drm/i915: add oa_event_min_timer_exponent sysctl
  2015-09-29 14:39 [RFC 0/6] Non perf based Gen Graphics OA unit driver Robert Bragg
                   ` (3 preceding siblings ...)
  2015-09-29 14:39 ` [RFC 4/6] drm/i915: Add i915 perf event for Haswell OA unit Robert Bragg
@ 2015-09-29 14:39 ` Robert Bragg
  2015-09-30  3:23 ` [RFC 0/6] Non perf based Gen Graphics OA unit driver Zhenyu Wang
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 18+ messages in thread
From: Robert Bragg @ 2015-09-29 14:39 UTC (permalink / raw)
  To: intel-gfx
  Cc: Mark Rutland, Matt Fleming, David Airlie, dri-devel,
	linux-kernel, Peter Zijlstra, Sourab Gupta, linux-api, Zheng Yan,
	Daniel Vetter, Ingo Molnar, Alexander Shishkin

The minimal sampling period is now configurable via a
dev.i915.oa_event_min_timer_exponent sysctl parameter.

Following the precedent set by perf, the default is the minimum exponent
that won't (on its own) exceed the kernel.perf_event_max_sample_rate
default of 100000 samples/s.
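
As a worked check of that default, the period formula quoted in the
patch, 80ns * 2^(exponent + 1), can be evaluated in a few lines of
user-space C. The helper names below are hypothetical and the code is
only a sketch of the arithmetic, not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

#define OA_EXPONENT_MAX   0x3f		/* from the patch */
#define OA_BASE_PERIOD_NS 80ULL		/* 80ns tick from the patch comment */
#define NSEC_PER_SEC      1000000000ULL

/* period = 80ns * 2^(exponent + 1) */
static uint64_t oa_exponent_to_period_ns(unsigned int exponent)
{
	/* Sketch limit: 80 * 2^57 is the largest value that still
	 * fits in 64 bits, so larger exponents are not handled here. */
	assert(exponent <= 56);

	return OA_BASE_PERIOD_NS << (exponent + 1);
}

/* Smallest exponent whose sampling rate does not exceed max_rate_hz. */
static unsigned int oa_min_exponent_for_rate(uint64_t max_rate_hz)
{
	unsigned int exponent;

	assert(max_rate_hz > 0 && max_rate_hz <= NSEC_PER_SEC);

	for (exponent = 0; exponent < OA_EXPONENT_MAX; exponent++) {
		/* rate <= max_rate  <=>  period * max_rate >= 1 second */
		if (oa_exponent_to_period_ns(exponent) * max_rate_hz >=
		    NSEC_PER_SEC)
			break;
	}

	return exponent;
}
```

An exponent of 5 gives a 5120ns period (about 195kHz), which is over
the limit, while 6 gives 10240ns (about 97.7kHz), hence the default
of 6.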

Signed-off-by: Robert Bragg <robert@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_perf.c | 37 ++++++++++++++++++++++++++++---------
 1 file changed, 28 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index ab82857..5ef7d92 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -42,6 +42,23 @@ static u32 i915_perf_event_paranoid = true;
 
 #define OA_EXPONENT_MAX 0x3f
 
+/* for sysctl proc_dointvec_minmax of i915_oa_event_min_timer_exponent */
+static int zero;
+static int oa_exponent_max = OA_EXPONENT_MAX;
+
+/* Theoretically we can program the OA unit to sample every 160ns but don't
+ * allow that by default unless root...
+ *
+ * The period is derived from the exponent as:
+ *
+ *   period = 80ns * 2^(exponent + 1)
+ *
+ * Referring to perf's kernel.perf_event_max_sample_rate for a precedent
+ * (100000 by default); with an OA exponent of 6 we get a period of 10.240
+ * microseconds - just under 100000Hz
+ */
+static u32 i915_oa_event_min_timer_exponent = 6;
+
 static struct i915_oa_format hsw_oa_formats[I915_OA_FORMAT_MAX] = {
 	[I915_OA_FORMAT_A13]	    = { 0, 64 },
 	[I915_OA_FORMAT_A29]	    = { 1, 128 },
@@ -674,15 +691,8 @@ static int i915_oa_event_init(struct i915_perf_event *event,
 		if (period_exponent > OA_EXPONENT_MAX)
 			return -EINVAL;
 
-		/* Theoretically we can program the OA unit to sample every
-		 * 160ns but don't allow that by default unless root...
-		 *
-		 * Referring to perf's kernel.perf_event_max_sample_rate for
-		 * a precedent (100000 by default); with an OA exponent of
-		 * 6 we get a period of 10.240 microseconds -just under
-		 * 100000Hz
-		 */
-		if (period_exponent < 6 && !capable(CAP_SYS_ADMIN)) {
+		if (period_exponent < i915_oa_event_min_timer_exponent &&
+		    !capable(CAP_SYS_ADMIN)) {
 			DRM_ERROR("Sampling period too high without root privileges\n");
 			return -EACCES;
 		}
@@ -1113,6 +1123,15 @@ static struct ctl_table oa_table[] = {
 	 .mode = 0644,
 	 .proc_handler = proc_dointvec,
 	 },
+	{
+	 .procname = "oa_event_min_timer_exponent",
+	 .data = &i915_oa_event_min_timer_exponent,
+	 .maxlen = sizeof(i915_oa_event_min_timer_exponent),
+	 .mode = 0644,
+	 .proc_handler = proc_dointvec_minmax,
+	 .extra1 = &zero,
+	 .extra2 = &oa_exponent_max,
+	 },
 	{}
 };
 
-- 
2.5.2

* Re: [Intel-gfx] [RFC 4/6] drm/i915: Add i915 perf event for Haswell OA unit
  2015-09-29 14:39 ` [RFC 4/6] drm/i915: Add i915 perf event for Haswell OA unit Robert Bragg
@ 2015-09-29 14:55   ` kbuild test robot
  2015-09-29 15:18     ` Peter Zijlstra
  0 siblings, 1 reply; 18+ messages in thread
From: kbuild test robot @ 2015-09-29 14:55 UTC (permalink / raw)
  To: Robert Bragg
  Cc: Mark Rutland, Matt Fleming, linux-api, intel-gfx, linux-kernel,
	dri-devel, Peter Zijlstra, Sourab Gupta, kbuild-all, Zheng Yan,
	Daniel Vetter, Ingo Molnar, Alexander Shishkin

[-- Attachment #1: Type: text/plain, Size: 4189 bytes --]

Hi Robert,

[auto build test results on v4.3-rc3 -- if it's an inappropriate base, please ignore]

config: i386-defconfig (attached as .config)
reproduce:
  git checkout a1d59679ae8f3e7e7659e9723ae3fc69af2532e6
  # save the attached .config to linux build tree
  make ARCH=i386 

All warnings (new ones prefixed by >>):

   drivers/gpu/drm/i915/i915_perf.c: In function 'append_oa_sample':
>> drivers/gpu/drm/i915/i915_perf.c:199:2: warning: #warning "fixme: extract context ID from OA reports" [-Wcpp]
    #warning "fixme: extract context ID from OA reports"
     ^
>> drivers/gpu/drm/i915/i915_perf.c:205:2: warning: #warning "fixme: extract timestamp from OA reports" [-Wcpp]
    #warning "fixme: extract timestamp from OA reports"
     ^
   drivers/gpu/drm/i915/i915_perf.c: In function 'append_oa_status':
>> drivers/gpu/drm/i915/i915_perf.c:155:2: warning: ignoring return value of 'copy_to_user', declared with attribute warn_unused_result [-Wunused-result]
     copy_to_user(read_state->buf, &header, sizeof(header));
     ^
   drivers/gpu/drm/i915/i915_perf.c: In function 'append_oa_sample':
   drivers/gpu/drm/i915/i915_perf.c:195:2: warning: ignoring return value of 'copy_to_user', declared with attribute warn_unused_result [-Wunused-result]
     copy_to_user(read_state->buf, &header, sizeof(header));
     ^
   drivers/gpu/drm/i915/i915_perf.c:200:3: warning: ignoring return value of 'copy_to_user', declared with attribute warn_unused_result [-Wunused-result]
      copy_to_user(read_state->buf, &dummy_ctx_id, 4);
      ^
   drivers/gpu/drm/i915/i915_perf.c:206:3: warning: ignoring return value of 'copy_to_user', declared with attribute warn_unused_result [-Wunused-result]
      copy_to_user(read_state->buf, &dummy_timestamp, 4);
      ^
   drivers/gpu/drm/i915/i915_perf.c:211:3: warning: ignoring return value of 'copy_to_user', declared with attribute warn_unused_result [-Wunused-result]
      copy_to_user(read_state->buf, report, report_size);
      ^

vim +199 drivers/gpu/drm/i915/i915_perf.c

   149	{
   150		struct drm_i915_perf_event_header header = { type, 0, sizeof(header) };
   151	
   152		if ((read_state->count - read_state->read) < header.size)
   153			return false;
   154	
 > 155		copy_to_user(read_state->buf, &header, sizeof(header));
   156	
   157		read_state->buf += sizeof(header);
   158		read_state->read += header.size;
   159	
   160		return true;
   161	}
   162	
   163	static bool append_oa_sample(struct i915_perf_event *event,
   164				     struct i915_perf_read_state *read_state,
   165				     const u8 *report)
   166	{
   167		struct drm_i915_private *dev_priv = event->dev_priv;
   168		int report_size = dev_priv->perf.oa.oa_buffer.format_size;
   169		struct drm_i915_perf_event_header header;
   170		u32 sample_flags = event->sample_flags;
   171		u32 dummy_ctx_id = 0;
   172		u32 dummy_timestamp = 0;
   173	
   174		header.type = DRM_I915_PERF_RECORD_SAMPLE;
   175		header.misc = 0;
   176		header.size = sizeof(header);
   177	
   178	
   179		/* XXX: could pre-compute this when opening the event... */
   180	
   181		if (sample_flags & I915_PERF_SAMPLE_CTXID)
   182			header.size += 4;
   183	
   184		if (sample_flags & I915_PERF_SAMPLE_TIMESTAMP)
   185			header.size += 4;
   186	
   187		if (sample_flags & I915_PERF_SAMPLE_OA_REPORT)
   188			header.size += report_size;
   189	
   190	
   191		if ((read_state->count - read_state->read) < header.size)
   192			return false;
   193	
   194	
   195		copy_to_user(read_state->buf, &header, sizeof(header));
   196		read_state->buf += sizeof(header);
   197	
   198		if (sample_flags & I915_PERF_SAMPLE_CTXID) {
 > 199	#warning "fixme: extract context ID from OA reports"
   200			copy_to_user(read_state->buf, &dummy_ctx_id, 4);
   201			read_state->buf += 4;
   202		}
   203	
   204		if (sample_flags & I915_PERF_SAMPLE_TIMESTAMP) {
 > 205	#warning "fixme: extract timestamp from OA reports"
   206			copy_to_user(read_state->buf, &dummy_timestamp, 4);
   207			read_state->buf += 4;
   208		}
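
The -Wunused-result warnings above come from ignoring the return value of
copy_to_user(), which returns the number of bytes that could not be copied
(0 on success). Below is a minimal sketch of one way to handle that, written
as a userspace mock so it can be compiled outside the kernel. The names
mock_copy_to_user(), mock_fault, struct mock_event_header and
struct mock_read_state are hypothetical, the header field widths are an
assumption modelled on struct perf_event_header, and the int return
convention is a design choice for illustration rather than what the patch
does (the patch returns bool).

```c
/*
 * Userspace mock, NOT the driver code: copy_to_user() is stubbed so the
 * error-handling pattern can be compiled and exercised outside the kernel.
 * In the kernel, copy_to_user() returns the number of bytes it could NOT
 * copy (0 on success); callers conventionally return -EFAULT on failure.
 */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical test hook: when set, the stub pretends the copy faulted. */
static int mock_fault;

static unsigned long mock_copy_to_user(void *dst, const void *src,
				       unsigned long n)
{
	if (mock_fault)
		return n;	/* nothing was copied */
	memcpy(dst, src, n);
	return 0;
}

/* Field widths are an assumption modelled on struct perf_event_header. */
struct mock_event_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;
};

struct mock_read_state {
	char *buf;	/* next write position in the destination buffer */
	size_t count;	/* total space offered by the caller */
	size_t read;	/* bytes written so far */
};

/*
 * Returns 0 on success, -ENOSPC if the record does not fit and -EFAULT if
 * the copy failed. The read state is only advanced on success.
 */
static int mock_append_status(struct mock_read_state *rs, uint32_t type)
{
	struct mock_event_header header = { type, 0, sizeof(header) };

	assert(rs && rs->buf);

	if (rs->count - rs->read < header.size)
		return -ENOSPC;

	if (mock_copy_to_user(rs->buf, &header, sizeof(header)))
		return -EFAULT;

	rs->buf += sizeof(header);
	rs->read += header.size;
	return 0;
}
```

A real fix would presumably also propagate the error up through the read()
path; that plumbing is not shown here.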

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/octet-stream, Size: 23761 bytes --]

[-- Attachment #3: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Intel-gfx] [RFC 4/6] drm/i915: Add i915 perf event for Haswell OA unit
  2015-09-29 14:55   ` [Intel-gfx] " kbuild test robot
@ 2015-09-29 15:18     ` Peter Zijlstra
  2015-09-29 23:19       ` [kbuild-all] " Fengguang Wu
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2015-09-29 15:18 UTC (permalink / raw)
  To: kbuild test robot
  Cc: Mark Rutland, Matt Fleming, linux-api, intel-gfx, linux-kernel,
	dri-devel, Alexander Shishkin, Sourab Gupta, kbuild-all,
	Zheng Yan, Daniel Vetter, Ingo Molnar, Robert Bragg

On Tue, Sep 29, 2015 at 10:55:39PM +0800, kbuild test robot wrote:
> Hi Robert,
> 
> [auto build test results on v4.3-rc3 -- if it's inappropriate base, please ignore]
> 
> config: i386-defconfig (attached as .config)
> reproduce:
>   git checkout a1d59679ae8f3e7e7659e9723ae3fc69af2532e6
>   # save the attached .config to linux build tree
>   make ARCH=i386 
> 
> All warnings (new ones prefixed by >>):
> 

@Wu, hehe, another series pattern to match ;-)


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [kbuild-all] [RFC 4/6] drm/i915: Add i915 perf event for Haswell OA unit
  2015-09-29 15:18     ` Peter Zijlstra
@ 2015-09-29 23:19       ` Fengguang Wu
  0 siblings, 0 replies; 18+ messages in thread
From: Fengguang Wu @ 2015-09-29 23:19 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mark Rutland, Matt Fleming, David Airlie, linux-api, intel-gfx,
	linux-kernel, dri-devel, Alexander Shishkin, Sourab Gupta,
	kbuild-all, Zheng Yan, Daniel Vetter, Ingo Molnar

On Tue, Sep 29, 2015 at 05:18:45PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 29, 2015 at 10:55:39PM +0800, kbuild test robot wrote:
> > Hi Robert,
> > 
> > [auto build test results on v4.3-rc3 -- if it's inappropriate base, please ignore]
> > 
> > config: i386-defconfig (attached as .config)
> > reproduce:
> >   git checkout a1d59679ae8f3e7e7659e9723ae3fc69af2532e6
> >   # save the attached .config to linux build tree
> >   make ARCH=i386 
> > 
> > All warnings (new ones prefixed by >>):
> > 
> 
> @Wu, hehe, another series pattern to match ;-)

Thanks! I'm now matching ^([...])? ?[... ii/NN] as patch series. :-)
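
The pattern above is shorthand. As a sketch only, here is one possible
reading of it as a POSIX extended regular expression: an optional leading
"[list-tag]" followed by a bracketed tag that ends in "ii/NN". The macro
SERIES_SUBJECT_RE and the helper is_series_subject() are hypothetical names,
and the exact expression is an assumption, not the one the robot uses.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Optional "[tag]" prefix, then "[... ii/NN]". A ']' outside a bracket
 * expression is an ordinary character in POSIX ERE, so it is not escaped.
 */
#define SERIES_SUBJECT_RE "^(\\[[^]]*] ?)?\\[[^]]*[0-9]+/[0-9]+]"

/* Returns true if the subject looks like one patch of a numbered series. */
static bool is_series_subject(const char *subject)
{
	regex_t re;
	bool match;

	assert(subject != NULL);

	if (regcomp(&re, SERIES_SUBJECT_RE, REG_EXTENDED | REG_NOSUB) != 0)
		return false;

	match = regexec(&re, subject, 0, NULL, 0) == 0;
	regfree(&re);
	return match;
}
```

With this reading, "[Intel-gfx] [RFC 4/6] ..." and "[RFC 0/6] ..." both
match, while a plain "[PATCH] ..." subject does not.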

Regards,
Fengguang
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 0/6] Non perf based Gen Graphics OA unit driver
  2015-09-29 14:39 [RFC 0/6] Non perf based Gen Graphics OA unit driver Robert Bragg
                   ` (4 preceding siblings ...)
  2015-09-29 14:39 ` [RFC 6/6] drm/i915: add oa_event_min_timer_exponent sysctl Robert Bragg
@ 2015-09-30  3:23 ` Zhenyu Wang
       [not found] ` <1443537549-6905-1-git-send-email-robert-St23OQVBDYPNLxjTenLetw@public.gmane.org>
  2015-10-16  9:43 ` Peter Zijlstra
  7 siblings, 0 replies; 18+ messages in thread
From: Zhenyu Wang @ 2015-09-30  3:23 UTC (permalink / raw)
  To: Robert Bragg
  Cc: Mark Rutland, Matt Fleming, dri-devel, David Airlie, intel-gfx,
	linux-kernel, Peter Zijlstra, Sourab Gupta, linux-api, Zheng Yan,
	Daniel Vetter, Ingo Molnar, Alexander Shishkin


[-- Attachment #1.1: Type: text/plain, Size: 989 bytes --]

On 2015.09.29 15:39:03 +0100, Robert Bragg wrote:
> 
> - Logistically it might be more practical to contain this to the
>   graphics stack.
> 
>     It seems fair to consider that if we can't see a very compelling
>     benefit to building on perf, then containing this work to
>     drivers/gpu/drm/i915 may simplify the review process as well as
>     future maintenance and development.
> 

I think that even though we all initially wanted to go with perf, it later
became apparent that we might need to tie this more closely to the i915
driver. Also, thinking about enabling global profiling for all graphics
clients, extending or enabling it within an i915-specific interface seems
more feasible than trying to create another PMU driver, as the previous
implementation attempted, to suit the need for different gfx perf data
definitions.

Robert, thanks for sending this and elaborating on it.

-- 
Open Source Technology Center, Intel ltd.

$gpg --keyserver wwwkeys.pgp.net --recv-keys 4D781827

[-- Attachment #1.2: Digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 0/6] Non perf based Gen Graphics OA unit driver
       [not found] ` <1443537549-6905-1-git-send-email-robert-St23OQVBDYPNLxjTenLetw@public.gmane.org>
  2015-09-29 14:39   ` [RFC 5/6] drm/i915: Add dev.i915.perf_event_paranoid sysctl option Robert Bragg
@ 2015-09-30  8:30   ` Chris Wilson
  2015-09-30 13:36     ` Robert Bragg
  1 sibling, 1 reply; 18+ messages in thread
From: Chris Wilson @ 2015-09-30  8:30 UTC (permalink / raw)
  To: Robert Bragg
  Cc: intel-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Daniel Vetter,
	Sourab Gupta, Zhenyu Wang, Jani Nikula, David Airlie,
	Peter Zijlstra, Ingo Molnar, Kan Liang, Alexander Shishkin,
	Zheng Yan, Mark Rutland, Matt Fleming,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On Tue, Sep 29, 2015 at 03:39:03PM +0100, Robert Bragg wrote:
> Updating Mesa and GPU Top to experiment with this was straightforward
> given the similarity to the perf interface.  The main difference is that
> it only supports forwarding metrics via read()s instead of an mmaped
> circular buffer. As mentioned above, I think that suits this well, and
> requires no additional copying of data. I think the userspace code has
> ended up being a little simpler too.

Did you try updating the existing perf based overlay?

> Overall the driver currently isn't much more code than with perf (~200
> lines).
> 
> Personally my gut feeling a.t.m, is that we should aim to move forward
> independent from perf.
> 
> I'd really appreciate some feedback from others on this though.
> 
> Daniel and Chris; although I think it made sense at the outset to try
> and use perf, in light of the above would you be open to a non-perf
> based driver for the OA unit?

No. I strongly dislike that there will be multiple incompatible perf
interfaces and strongly like the coupling with other profiling that
comes with perf - i.e. we very much want to simultaneously sample CPU
and GPU workloads along with other devices, that information is much
more useful to me for the purposes of scheduling work and maximising
concurrency than optimising shaders.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 0/6] Non perf based Gen Graphics OA unit driver
  2015-09-30  8:30   ` [RFC 0/6] Non perf based Gen Graphics OA unit driver Chris Wilson
@ 2015-09-30 13:36     ` Robert Bragg
  0 siblings, 0 replies; 18+ messages in thread
From: Robert Bragg @ 2015-09-30 13:36 UTC (permalink / raw)
  To: Chris Wilson, Robert Bragg, intel-gfx, Daniel Vetter,
	Sourab Gupta, Zhenyu Wang, Jani Nikula, David Airlie,
	Peter Zijlstra, Ingo Molnar, Kan Liang, Alexander Shishkin,
	Zheng Yan, Mark Rutland, Matt Fleming, dri-devel, linux-kernel,
	linux-api


[-- Attachment #1.1: Type: text/plain, Size: 3503 bytes --]

On Wed, Sep 30, 2015 at 9:30 AM, Chris Wilson <chris@chris-wilson.co.uk>
wrote:

> On Tue, Sep 29, 2015 at 03:39:03PM +0100, Robert Bragg wrote:
> > Updating Mesa and GPU Top to experiment with this was straightforward
> > given the similarity to the perf interface.  The main difference is that
> > it only supports forwarding metrics via read()s instead of an mmaped
> > circular buffer. As mentioned above, I think that suits this well, and
> > requires no additional copying of data. I think the userspace code has
> > ended up being a little simpler too.
>
> Did you try updating the existing perf based overlay?
>

I don't recall the overlay attempting to read OA counters, but potentially
it could be quite nice to add support - sorry I hadn't considered that so
far.

I don't believe being perf based or not will affect the effort to do this
though. The perf based driver doesn't handle OA counter normalization in
the kernel so userspace needs to be able to handle that - which is probably
the bigger effort.

Something to note here about your early pmu driver, is that it was notably
for counters that were explicitly sampled from the cpu using a hrtimer via
mmio. I think they were a better fit for the existing perf design than the
OA unit, primarily because they were explicitly read from the cpu and each
counter was very independent.


>
> > Overall the driver currently isn't much more code than with perf (~200
> > lines).
> >
> > Personally my gut feeling a.t.m, is that we should aim to move forward
> > independent from perf.
> >
> > I'd really appreciate some feedback from others on this though.
> >
> > Daniel and Chris; although I think it made sense at the outset to try
> > and use perf, in light of the above would you be open to a non-perf
> > based driver for the OA unit?
>
> No. I strongly dislike that there will be multiple incompatible perf
> interfaces and strongly like the coupling with other profiling that
> comes with perf - i.e. we very much want to simultaneously sample CPU
> and GPU workloads along with other devices, that information is much
> more useful to me for the purposes of scheduling work and maximising
> concurrency than optimising shaders.
>

In this case I don't think there's inherently any more compatibility that
comes from using perf or not - no existing userspace will Just Work™ with
the perf based OA driver.

I think some of the cases you're referring to may be ok to expose via the
existing perf infrastructure, but I'm currently enabling the OA unit which
poses some unique difficulties I've tried to explain.

A guiding differentiator may be whether or not the counter is orthogonal
(in terms of configuration and normalization) and explicitly readable from
the cpu, as to whether the existing perf pmu infrastructure is a good fit.

'i915 perf' shows my lack of imagination in naming this, and maybe another
name could imply a more limited scope. I.e. on a case by case basis, when
looking to expose new counters we can still evaluate whether it makes
sense to expose them via the existing perf infrastructure or this.

- Robert


> -Chris
>
> --
> Chris Wilson, Intel Open Source Technology Centre
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 0/6] Non perf based Gen Graphics OA unit driver
  2015-09-29 14:39 [RFC 0/6] Non perf based Gen Graphics OA unit driver Robert Bragg
                   ` (6 preceding siblings ...)
       [not found] ` <1443537549-6905-1-git-send-email-robert-St23OQVBDYPNLxjTenLetw@public.gmane.org>
@ 2015-10-16  9:43 ` Peter Zijlstra
  2015-10-16 10:02   ` Ingo Molnar
  2015-10-20 20:16   ` Robert Bragg
  7 siblings, 2 replies; 18+ messages in thread
From: Peter Zijlstra @ 2015-10-16  9:43 UTC (permalink / raw)
  To: Robert Bragg
  Cc: Mark Rutland, Matt Fleming, dri-devel, David Airlie, intel-gfx,
	linux-kernel, Alexander Shishkin, Sourab Gupta, linux-api,
	Zheng Yan, Daniel Vetter, Ingo Molnar

On Tue, Sep 29, 2015 at 03:39:03PM +0100, Robert Bragg wrote:
> - We're bridging two complex architectures
> 
>     To review this work I think it will be relevant to have a good
>     general familiarity with Gen graphics (e.g. thinking about the OA
>     unit's interaction with the command streamer and execlist
>     scheduling) as well as our userspace architecture and how we're
>     consuming OA data within Mesa to implement the
>     INTEL_performance_query extension.
>
>     On the flip side here, it's necessary to understand the perf
>     userspace interface (for most this is hidden by tools so the details
>     aren't common knowledge) as well as the internal design, considering
>     that the PMU we're looking at seems to break several current design
>     assumptions. I can only claim a limited familiarity with perf's
>     design, just as a result of this work.

Right; but a little effort and patience on both sides should get us
there I think. At worst we'll both learn something new ;-)

> - The current OA PMU driver breaks some significant design assumptions.
> 
>     Existing perf pmus are used for profiling work on a cpu and we're
>     introducing the idea of _IS_DEVICE pmus with different security
>     implications, the need to fake cpu-related data (such as user/kernel
>     registers) to fit with perf's current design, and adding _DEVICE
>     records as a way to forward device-specific status records.

There are more devices with counters on than GPUs, so I think it might
make sense to look at extending perf to better deal with this.

>     The OA unit writes reports of counters into a circular buffer,
>     without involvement from the CPU, making our PMU driver the first of
>     a kind.

Agreed, this is somewhat 'odd' from where we are today.

>     Perf supports groups of counters and allows those to be read via
>     transactions internally but transactions currently seem designed to
>     be explicitly initiated from the cpu (say in response to a userspace
>     read()) and while we could pull a report out of the OA buffer we
>     can't trigger a report from the cpu on demand.
>
>     Related to being report based; the OA counters are configured in HW
>     as a set while perf generally expects counter configurations to be
>     orthogonal. Although counters can be associated with a group leader
>     as they are opened, there's no clear precedent for being able to
>     provide group-wide configuration attributes and no obvious solution
>     as yet that's expected to be acceptable to upstream and meets our
>     userspace needs.

I'm not entirely sure what you mean with group-wide configuration
attributes; could you elaborate?

>     We currently avoid using perf's grouping feature
>     and forward OA reports to userspace via perf's 'raw' sample field.
>     This suits our userspace well considering how coupled the counters
>     are when dealing with normalizing. It would be inconvenient to split
>     counters up into separate events, only to require userspace to
>     recombine them. 

So IF you were using a group, a single read from the leader can return
you a vector of all values (PERF_FORMAT_GROUP), this avoids having to
do that recombine.

Another option would be to view the arrival of an OA vector in the
datastream as an 'event' and generate a PERF_RECORD_READ in the perf
buffer (which again can use the GROUP vector format).
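
As a rough illustration of the GROUP vector format: when a group leader is
opened with read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID, a read()
returns a u64 count followed by that many { value, id } pairs, as documented
in perf_event_open(2). The sketch below only parses such a buffer; the names
struct group_value and parse_group_read() are hypothetical, and it assumes
neither PERF_FORMAT_TOTAL_TIME_ENABLED nor PERF_FORMAT_TOTAL_TIME_RUNNING is
set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One counter of the group, as laid out with PERF_FORMAT_ID set. */
struct group_value {
	uint64_t value;
	uint64_t id;
};

/*
 * Parse a buffer filled by read() on a group leader.
 * Assumed layout: u64 nr; then nr pairs of { u64 value; u64 id; }.
 * Returns the number of entries stored in out, or -1 if the buffer is
 * truncated or out is too small.
 */
static int parse_group_read(const uint64_t *buf, size_t len_bytes,
			    struct group_value *out, size_t max_out)
{
	size_t words = len_bytes / sizeof(uint64_t);
	uint64_t nr;
	size_t i;

	assert(buf != NULL && out != NULL);

	if (words < 1)
		return -1;

	nr = buf[0];
	if (nr > max_out || nr > (words - 1) / 2)
		return -1;

	for (i = 0; i < nr; i++) {
		out[i].value = buf[1 + 2 * i];
		out[i].id = buf[2 + 2 * i];
	}
	return (int)nr;
}
```

Userspace gets all counters of the group from a single read, so nothing has
to be recombined from separate events.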

>     Related to counter orthogonality; we can't time share the OA unit,
>     while event scheduling is a central design idea within perf for
>     allowing userspace to open + enable more events than can be
>     configured in HW at any one time.

So we have other PMUs that cannot do this; Gen OA would not be unique in
this. Intel PT for example only allows a single active event.

That said; earlier today I saw:

  https://www.youtube.com/watch?v=9J3BQcAeHpI&list=PLe6I3NKr-I4J2oLGXhGOeBMEjh8h10jT3&index=7

where exactly this feature was mentioned as not fitting well into the
existing GPU performance interfaces (GL_AMD_performance_monitor /
GL_INTEL_performance_query).

So there is hardware (Nvidia) out there that does support this. Also
mentioned was that this hardware has global and local counters, where
the local ones are specific to a rendering context. That is not unlike
the per-cpu / per-task stuff perf does.

>     The OA unit is not designed to
>     allow re-configuration while in use. We can't reconfigure the OA
>     unit without losing internal OA unit state which we can't access
>     explicitly to save and restore. Reconfiguring the OA unit is also
>     relatively slow, involving ~100 register writes. From userspace Mesa
>     also depends on a stable OA configuration when emitting
>     MI_REPORT_PERF_COUNT commands and importantly the OA unit can't be
>     disabled while there are outstanding MI_RPC commands lest we hang
>     the command streamer.

Right; see the PERF_PMU_CAP_EXCLUSIVE stuff.

> - We may be making some technical compromises a.t.m for the sake of
>   using perf.
> 
>     perf_event_open() requires events to either relate to a pid or a
>     specific cpu core, while our device pmu relates to neither.  Events
>     opened with a pid will be automatically enabled/disabled according
>     to the scheduling of that process - so not appropriate for us.

Right; the traditional cpu/pid mapping doesn't work well for devices;
but maybe, with some work, we can create something like that
global/local render context from it; although I've no clue what form
that would need at this time.

>     When
>     an event is related to a cpu id, perf ensures pmu methods will be
>     invoked via an inter-processor interrupt on that core. To avoid
>     invasive changes our userspace opens OA perf events for a specific
>     cpu. 

Some of that might still make sense in the sense that GPUs are subject
to the NUMA topology of machines. I would think you would want most
such things to be done on the node the device is attached to.

Granted, this might not be a concern for Intel graphics, but it might be
relevant for some of the discrete GPUs.

> - I'm not confident our use case benefits much from building on perf:
> 
>     We aren't using existing perf based tooling with our PMU. Existing
>     tools typically assume you're profiling work running on a cpu, e.g.
>     expecting samples to be associated with instruction pointers and
>     user/kernel registers and aiming to represent metrics in relation
>     to application source code. We're forwarding fake register values
>     and userspace needs to know how to decode the raw OA reports
>     before anything can be reported to a user.
>     
>     With the buffering done by the OA unit I don't think we currently
>     benefit from perf's mmapped circular buffer interface. We already
>     have a decoupled producer and consumer and since we have to copy out
>     of the OA buffer, it would work well for us to hide that copy in
>     a simpler read() based interface.
> 
> 
> - Logistically it might be more practical to contain this to the
>   graphics stack.
> 
>     It seems fair to consider that if we can't see a very compelling
>     benefit to building on perf, then containing this work to
>     drivers/gpu/drm/i915 may simplify the review process as well as
>     future maintenance and development.

> Peter; I wonder if you would tend to agree too that it could make sense
> for us to go with our own interface here?

Sorry this took so long; this wanted a well considered response and
those tend to get delayed in light of 'urgent' stuff.

While I can certainly see the pain points and why you would rather not
deal with them, I think it would make Linux a better place if we could
manage to come up with a generic interface that would work for 'all'
GPUs (and possibly more devices).

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 0/6] Non perf based Gen Graphics OA unit driver
  2015-10-16  9:43 ` Peter Zijlstra
@ 2015-10-16 10:02   ` Ingo Molnar
  2015-10-16 10:33     ` Peter Zijlstra
  2015-10-20 20:16   ` Robert Bragg
  1 sibling, 1 reply; 18+ messages in thread
From: Ingo Molnar @ 2015-10-16 10:02 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mark Rutland, Matt Fleming, dri-devel, David Airlie, intel-gfx,
	linux-kernel, Alexander Shishkin, Sourab Gupta, linux-api,
	Zheng Yan, Daniel Vetter


* Peter Zijlstra <peterz@infradead.org> wrote:

> > - We may be making some technical compromises a.t.m for the sake of
> >   using perf.
> > 
> >     perf_event_open() requires events to either relate to a pid or a
> >     specific cpu core, while our device pmu relates to neither.  Events
> >     opened with a pid will be automatically enabled/disabled according
> >     to the scheduling of that process - so not appropriate for us.
> 
> Right; the traditional cpu/pid mapping doesn't work well for devices;
> but maybe, with some work, we can create something like that
> global/local render context from it; although I've no clue what form
> that would need at this time.

Could someone please help with some very basic questions, such as what the
hardware model of the 'OA' unit is? How are OA registers set up, how are
their values made accessible to the host side, etc.

I see some references to 'OA' registers in:

  https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-bdw-vol03-gpu_overview_1.pdf

and I tried to find a more high level description in:

   https://01.org/linuxgraphics/documentation/hardware-specification-prms/2014-2015-intel-processors-based-broadwell-platform

but couldn't find it. (Maybe it's my fault!)

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 0/6] Non perf based Gen Graphics OA unit driver
  2015-10-16 10:02   ` Ingo Molnar
@ 2015-10-16 10:33     ` Peter Zijlstra
       [not found]       ` <20151016103345.GS3816-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
  0 siblings, 1 reply; 18+ messages in thread
From: Peter Zijlstra @ 2015-10-16 10:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Mark Rutland, Matt Fleming, dri-devel, Kan Liang, intel-gfx,
	linux-kernel, Alexander Shishkin, Sourab Gupta, linux-api,
	Zheng Yan, Daniel Vetter, Robert Bragg

On Fri, Oct 16, 2015 at 12:02:28PM +0200, Ingo Molnar wrote:
> 
> * Peter Zijlstra <peterz@infradead.org> wrote:
> 
> > > - We may be making some technical compromises a.t.m for the sake of
> > >   using perf.
> > > 
> > >     perf_event_open() requires events to either relate to a pid or a
> > >     specific cpu core, while our device pmu relates to neither.  Events
> > >     opened with a pid will be automatically enabled/disabled according
> > >     to the scheduling of that process - so not appropriate for us.
> > 
> > Right; the traditional cpu/pid mapping doesn't work well for devices;
> > but maybe, with some work, we can create something like that
> > global/local render context from it; although I've no clue what form
> > that would need at this time.
> 
> Could someone please help with some very basic questions, such as what the
> hardware model of the 'OA' unit is? How are OA registers set up, how are
> their values made accessible to the host side, etc.

Robert linked to:

  https://01.org/sites/default/files/documentation/observability_performance_counters_haswell.pdf

in a previous posting. It has some info, but full documentation is, as
per the initial post, 'pending'.

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 0/6] Non perf based Gen Graphics OA unit driver
       [not found]       ` <20151016103345.GS3816-ndre7Fmf5hadTX5a5knrm8zTDFooKrT+cvkQGrU6aU0@public.gmane.org>
@ 2015-10-16 12:08         ` Robert Bragg
  0 siblings, 0 replies; 18+ messages in thread
From: Robert Bragg @ 2015-10-16 12:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, intel-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Daniel Vetter, Chris Wilson, Sourab Gupta, Zhenyu Wang,
	Jani Nikula, David Airlie, Kan Liang, Alexander Shishkin,
	Zheng Yan, Mark Rutland, Matt Fleming,
	dri-devel-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	linux-api-u79uwXL29TY76Z2rM5mHXA

On Fri, Oct 16, 2015 at 11:33 AM, Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:
> On Fri, Oct 16, 2015 at 12:02:28PM +0200, Ingo Molnar wrote:
>>
>> * Peter Zijlstra <peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org> wrote:
>>
>> > > - We may be making some technical compromises a.t.m for the sake of
>> > >   using perf.
>> > >
>> > >     perf_event_open() requires events to either relate to a pid or a
>> > >     specific cpu core, while our device pmu relates to neither.  Events
>> > >     opened with a pid will be automatically enabled/disabled according
>> > >     to the scheduling of that process - so not appropriate for us.
>> >
>> > Right; the traditional cpu/pid mapping doesn't work well for devices;
>> > but maybe, with some work, we can create something like that
>> > global/local render context from it; although I've no clue what form
>> > that would need at this time.
>>
>> Could someone please help with some very basic questions, such as what the
>> hardware model of the 'OA' unit is? How are OA registers set up, how are
>> their values made accessible to the host side, etc.
>
> Robert linked to:
>
>   https://01.org/sites/default/files/documentation/observability_performance_counters_haswell.pdf
>
> in a previous posting. It has some info, but full documentation is, as
> per the initial post, 'pending'.

There is now also some Broadwell documentation here:

https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-bdw-vol14-observability.pdf

Unfortunately, a mistake made when generating the PDF unintentionally
stripped out a lot of information, so it's not very helpful a.t.m. I've
let the documentation team know about some of the issues, but I'm not
sure when it may be updated.

I tried to fill in the gaps in some of our earlier conversations, so
maybe also go over those for more details too.

Otherwise the best reference is probably my code currently, either the
RFC patches I sent most recently which at least cover up to Haswell,
or the wip/rib/oa-next branch here: https://github.com/rib/linux. The
latest perf based driver is currently in the archive/rib/oa-core-perf
branch for reference too.

- Robert

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [RFC 0/6] Non perf based Gen Graphics OA unit driver
  2015-10-16  9:43 ` Peter Zijlstra
  2015-10-16 10:02   ` Ingo Molnar
@ 2015-10-20 20:16   ` Robert Bragg
  1 sibling, 0 replies; 18+ messages in thread
From: Robert Bragg @ 2015-10-20 20:16 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mark Rutland, Matt Fleming, dri-devel, David Airlie, intel-gfx,
	linux-kernel, Alexander Shishkin, Sourab Gupta, linux-api,
	Zheng Yan, Daniel Vetter, Ingo Molnar

On Fri, Oct 16, 2015 at 10:43 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Sep 29, 2015 at 03:39:03PM +0100, Robert Bragg wrote:
>> - We're bridging two complex architectures
>>
>>     To review this work I think it will be relevant to have a good
>>     general familiarity with Gen graphics (e.g. thinking about the OA
>>     unit's interaction with the command streamer and execlist
>>     scheduling) as well as our userspace architecture and how we're
>>     consuming OA data within Mesa to implement the
>>     INTEL_performance_query extension.
>>
>>     On the flip side here, it's necessary to understand the perf
>>     userspace interface (for most this is hidden by tools so the details
>>     aren't common knowledge) as well as the internal design, considering
>>     that the PMU we're looking at seems to break several current design
>>     assumptions. I can only claim a limited familiarity with perf's
>>     design, just as a result of this work.
>
> Right; but a little effort and patience on both sides should get us
> there I think. At worst we'll both learn something new ;-)

I suppose I'm also concerned that time is an important factor. When it
comes to the OA metrics, we already have userspace tools that could be
more widely used by developers once we have an upstream interface.
Today perf isn't very well suited to our OA unit use case, and
although we may be able to change that - and I can try to help with
that - at this point I think I'd prefer not to block moving forward in
the mean time with the alternative i915 interface.

Although code-wise it didn't require any big changes to events/core to
get an initial perf based driver working for our use case, we have
raised a number of quite significant design questions and arguably cut
some corners, which could take a long time to resolve properly. I also
tend to think it's an open question at this stage whether it would
really be in everyone's interest to take perf in this direction
without a clear sense of the benefits it brings in comparison to the
complexity it may add.

It's also a bit awkward that I had already started to move ahead with
this idea of upstreaming a non-perf based driver for the OA unit after
asking Daniel Vetter about this on IRC. There are some knock-on
effects here too; Sourab Gupta is looking at building on this OA
driver and has now started adapting his work for this non-perf
approach.

>
>> - The current OA PMU driver breaks some significant design assumptions.
>>
>>     Existing perf pmus are used for profiling work on a cpu and we're
>>     introducing the idea of _IS_DEVICE pmus with different security
>>     implications, the need to fake cpu-related data (such as user/kernel
>>     registers) to fit with perf's current design, and adding _DEVICE
>>     records as a way to forward device-specific status records.
>
> There are more devices with counters on than GPUs, so I think it might
> make sense to look at extending perf to better deal with this.

I wonder if it could be good to look at exposing some of the mmio
accessible Gen graphics counters before tackling a more complex case
like the OA unit. We have a number of counters that could be
interesting to sample periodically via a hrtimer, that require no
configuration, are global (so no need to specify a gpu context) but as
they relate to the GPU an _IS_DEVICE pmu would still be appropriate.
Some of these seem like they could be better suited to being exposed
via perf than OA unit counters so they might be a helpful stepping
stone.

>
>>     The OA unit writes reports of counters into a circular buffer,
>>     without involvement from the CPU, making our PMU driver the first of
>>     a kind.
>
> Agreed, this is somewhat 'odd' from where we are today.
>
>>     Perf supports groups of counters and allows those to be read via
>>     transactions internally but transactions currently seem designed to
>>     be explicitly initiated from the cpu (say in response to a userspace
>>     read()) and while we could pull a report out of the OA buffer we
>>     can't trigger a report from the cpu on demand.
>>
>>     Related to being report based; the OA counters are configured in HW
>>     as a set while perf generally expects counter configurations to be
>>     orthogonal. Although counters can be associated with a group leader
>>     as they are opened, there's no clear precedent for being able to
>>     provide group-wide configuration attributes and no obvious solution
>>     as yet that's expected to be acceptable to upstream and meets our
>>     userspace needs.
>
> I'm not entirely sure what you mean with group-wide configuration
> attributes; could you elaborate?

Here I'm thinking of configuration details that conceptually relate to
a set of OA unit counters, not individual events/counters:

- The choice of 'metric set' which represents a MUX configuration +
boolean logic configuration for a set of counters that will be
included in the reports written by the OA unit.

- The OA unit exponent for periodic sampling applies to the whole group.

- The choice of report layout which the OA unit writes all the counters in.

- The choice to profile a single context or system-wide applies to the
group, as well as the specification of a file descriptor + context ID
in the single-context case.
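
To make the list above concrete, here is a minimal sketch in C of what
such a group-wide configuration could look like. Every name in it is
hypothetical and not taken from the patches. The period helper assumes
the sampling period is base_ns * 2^(exponent + 1), with base_ns = 80
matching a 12.5MHz timestamp base on Haswell; treat that relation as an
assumption to check against the hardware documentation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: settings that apply to the whole set of OA counters,
 * rather than to any individual counter. */
struct oa_group_config {
	uint32_t metric_set;      /* selects MUX + boolean logic config */
	uint32_t report_format;   /* layout of each report the unit writes */
	uint32_t period_exponent; /* periodic sampling exponent */
	bool     single_context;  /* false means system-wide */
	int      drm_fd;          /* only used when single_context is set */
	uint32_t ctx_id;          /* context handle, unique per drm_fd */
};

/* Assumed relation: period = base_ns * 2^(exponent + 1).
 * Returns 0 for exponents too large to shift safely; this does not
 * guard against the product overflowing for large base_ns. */
static uint64_t oa_period_ns(uint32_t exponent, uint64_t base_ns)
{
	if (exponent >= 63)
		return 0;
	return base_ns << (exponent + 1);
}
```

None of these fields has an obvious home in a per-event perf attribute,
which is the difficulty being described.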


>
>>     We currently avoid using perf's grouping feature
>>     and forward OA reports to userspace via perf's 'raw' sample field.
>>     This suits our userspace well considering how coupled the counters
>>     are when dealing with normalizing. It would be inconvenient to split
>>     counters up into separate events, only to require userspace to
>>     recombine them.
>
> So IF you were using a group, a single read from the leader can return
> you a vector of all values (PERF_FORMAT_GROUP), this avoids having to
> do that recombine.

Although recombining isn't necessary with _FORMAT_GROUP, this vector
layout could be similarly inconvenient for userspace...

afaict we couldn't avoid also requesting the _ID to be included in the
vector, so instead of the 32/40 bits per counter we would now have
16 bytes per counter, which would seem to lose some of the benefit of a
compact HW layout that minimizes our use of memory bandwidth.
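
As a rough illustration of the size difference, the sketch below
compares a PERF_FORMAT_GROUP | PERF_FORMAT_ID style read (a u64 count
followed by a { value, id } pair per counter) with tightly packed
counters. The helper names are made up, and the packed figure ignores
any header or timestamp fields a real OA report carries.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry of a group read when the ID is requested alongside the value. */
struct group_entry {
	uint64_t value;
	uint64_t id;
};

/* Bytes for a group read of nr counters: a u64 count plus nr entries. */
static size_t group_read_size(size_t nr)
{
	return sizeof(uint64_t) + nr * sizeof(struct group_entry);
}

/* Bytes for nr counters packed at `bits` bits each (e.g. 32 or 40). */
static size_t packed_size(size_t nr, unsigned bits)
{
	return (nr * bits + 7) / 8;
}
```

For eight 40bit counters that is 136 bytes against 40.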

Userspace has to be aware that we're placing 32bit or 40bit values
within a 64bit field and the values will overflow as such.

As userspace receives reports it calculates deltas between sequential
reports to accumulate 64bit counters. For each event in the vector it
needs to look up the index of the accumulated value, and a lookup based
on a u64 event id for each counter isn't as direct as with a rigid
report layout. (I don't know what assumptions can be made about the
vector ordering from one sample to the next, but maybe a fixed mapping
could be made after the first sample.) When calculating the delta
userspace needs to be careful not to treat the vector values as 64bit
but as 32/40bit to account for overflow.
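
A minimal sketch of the delta handling described above, assuming a
counter wraps at most once between two sequential reports. The function
names are illustrative only and don't come from Mesa.

```c
#include <stdint.h>

/* Difference between two raw snapshots of a counter that is only
 * `bits` wide (32 or 40 here), correct across a single wrap. */
static uint64_t oa_counter_delta(uint64_t prev, uint64_t cur, unsigned bits)
{
	uint64_t mask = (bits >= 64) ? UINT64_MAX
				     : ((UINT64_C(1) << bits) - 1);

	/* Unsigned subtraction wraps modulo 2^64; masking reduces the
	 * result to modulo 2^bits, which is the counter's real width. */
	return (cur - prev) & mask;
}

/* Fold a new snapshot into a 64bit running total. */
static void oa_accumulate(uint64_t *total, uint64_t *last,
			  uint64_t cur, unsigned bits)
{
	*total += oa_counter_delta(*last, cur, bits);
	*last = cur;
}
```

Treating the same values as full 64bit quantities would turn a wrap into
a huge bogus delta, which is why the width has to be known to userspace.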

Mesa will still have to be able to handle the raw OA report layouts to
process reports collected via the command stream using
MI_REPORT_PERF_COUNT commands, so any alternative at least represents
some amount of extra layout handling code and needing to handle
different combinations of raw or vector layouts when calculating a
single delta.

>
> Another option would be to view the arrival of an OA vector in the
> datastream as an 'event' and generate a PERF_RECORD_READ in the perf
> buffer (which again can use the GROUP vector format).

Tbh I couldn't really figure out what PERF_RECORD_READ is intended for
- the userspace code I found currently ignores these.

The extensible sample design seems more appropriate, but it could be
good if samples were extensible by device drivers as an alternative to
packing data into the raw field. Sourab who's been building on my base
OA driver is exposing more data as part of samples, including a
context ID, and we were extending what we included in the raw field
(we wouldn't get this extra data via _RECORD_READ).

Conceptually the aim was to expose event specific sample flags, to
extend samples (mirroring the existing design for pre-defined sample
flags, except driver extensible). We just used the raw field as the
most convenient extension point to start with.

Extending the raw field has some difficulties though because
events/core currently only lets a pmu give a single raw data pointer
plus len which will be copied into the ring buffer, so to include more
than the OA report we'd have to copy the report into an intermediate
larger buffer. I'd been considering allowing a vector of data+len
values to be specified for copying the raw data.
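
The data+len vector idea could be sketched like this in plain C. This
is not existing events/core code; struct raw_frag and raw_frags_copy()
are hypothetical names, and a real version would write into the perf
ring buffer rather than a flat destination.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fragment descriptor: one piece of raw sample data. */
struct raw_frag {
	const void *data;
	size_t len;
};

/* Copy the fragments back to back into dst. Returns the number of
 * bytes written, or 0 if they don't all fit (dst may then hold a
 * partial copy and should be discarded). */
static size_t raw_frags_copy(void *dst, size_t dst_len,
			     const struct raw_frag *frags, size_t nr)
{
	size_t off = 0;

	for (size_t i = 0; i < nr; i++) {
		if (frags[i].len > dst_len - off)
			return 0;
		memcpy((char *)dst + off, frags[i].data, frags[i].len);
		off += frags[i].len;
	}
	return off;
}
```

With something like this the OA report could stay where it is in the OA
buffer and be copied once, alongside a small header holding the extra
fields, instead of first being staged in an intermediate buffer.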

>
>>     Related to counter orthogonality; we can't time share the OA unit,
>>     while event scheduling is a central design idea within perf for
>>     allowing userspace to open + enable more events than can be
>>     configured in HW at any one time.
>
> So we have other PMUs that cannot do this; Gen OA would not be unique in
> this. Intel PT for example only allows a single active event.
>
> That said; earlier today I saw:
>
>   https://www.youtube.com/watch?v=9J3BQcAeHpI&list=PLe6I3NKr-I4J2oLGXhGOeBMEjh8h10jT3&index=7
>
> where exactly this feature was mentioned as not fitting well into the
> existing GPU performance interfaces (GL_AMD_performance_monitor /
> GL_INTEL_performance_query).

I think Samuel was generally commenting that these Intel/AMD GL
extensions aren't a good fit for Nvidia hw since it's awkward to
abstract the need to update the MUX configuration in-flight so they
can derive higher level counters from more inputs than the HW can
expose at the same time. Unlike Intel and AMD, it looks like instead
of defining a GL extension for accessing performance counters Nvidia
has a perfkit api which can be used in conjunction with OpenGL, CUDA or
Direct3D which Samuel has also been implementing.

I'm not sure what exactly about the AMD/Intel GL extensions makes it
tricky to abstract round-robin updates of the MUXs internally vs what
perfkit offers but since we wouldn't expect to round-robin the OA
counter configuration the INTEL_performance_query spec authors
wouldn't have given that any consideration.

>
> So there is hardware (Nvidia) out there that does support this. Also
> mentioned was that this hardware has global and local counters, where
> the local ones are specific to a rendering context. That is not unlike
> the per-cpu / per-task stuff perf does.

For reference, OA counters can be viewed as global or local counters.
When opening an event an application can request a system-wide view vs
single-context view.

>
>>     The OA unit is not designed to
>>     allow re-configuration while in use. We can't reconfigure the OA
>>     unit without losing internal OA unit state which we can't access
>>     explicitly to save and restore. Reconfiguring the OA unit is also
>>     relatively slow, involving ~100 register writes. From userspace Mesa
>>     also depends on a stable OA configuration when emitting
>>     MI_REPORT_PERF_COUNT commands and importantly the OA unit can't be
>>     disabled while there are outstanding MI_RPC commands lest we hang
>>     the command streamer.
>
> Right; see the PERF_PMU_CAP_EXCLUSIVE stuff.

This looks like it's incompatible with the group mechanism, re: the
other question of exposing OA reports via the grouping mechanism vs
raw sample data.

>
>> - We may be making some technical compromises a.t.m for the sake of
>>   using perf.
>>
>>     perf_event_open() requires events to either relate to a pid or a
>>     specific cpu core, while our device pmu relates to neither.  Events
>>     opened with a pid will be automatically enabled/disabled according
>>     to the scheduling of that process - so not appropriate for us.
>
> Right; the traditional cpu/pid mapping doesn't work well for devices;
> but maybe, with some work, we can create something like that
> global/local render context from it; although I've no clue what form
> that would need at this time.

Currently the way we identify a context is with the combination of a
file descriptor and a u32 context handle (only unique to that fd). The
use of a drm file descriptor here also relates to the security model
for accessing single context metrics, in that we don't require root
privileges to profile a context associated with a file descriptor that
the process has open.

Recently we've also been working with a globally unique context ID
(not upstream yet) and it could be interesting to also allow single
context profiling given a global ID, without any fd (comparable to
passing a pid to perf_event_open). This form of selection would
probably require root privileges by default.

>
>>     When
>>     an event is related to a cpu id, perf ensures pmu methods will be
>>     invoked via an inter-processor interrupt on that core. To avoid
>>     invasive changes our userspace opens OA perf events for a specific
>>     cpu.
>
> Some of that might still make sense in the sense that GPUs are subject
> to the NUMA topology of machines. I would think you would want most
> such things to be done on the node the device is attached to.
>
> Granted, this might not be a concern for Intel graphics, but it might be
> relevant for some of the discrete GPUs.

I'm not sure whether Nouveau takes this kind of idea into consideration or not.

>
>> - I'm not confident our use case benefits much from building on perf:
>>
>>     We aren't using existing perf based tooling with our PMU. Existing
>>     tools typically assume you're profiling work running on a cpu, e.g.
>>     expecting samples to be associated with instruction pointers and
>>     user/kernel registers and aiming to represent metrics in relation
>>     to application source code. We're forwarding fake register values
>>     and userspace needs to know how to decode the raw OA reports
>>     before anything can be reported to a user.
>>
>>     With the buffering done by the OA unit I don't think we currently
>>     benefit from perf's mmapped circular buffer interface. We already
>>     have a decoupled producer and consumer and since we have to copy out
>>     of the OA buffer, it would work well for us to hide that copy in
>>     a simpler read() based interface.
>>
>>
>> - Logistically it might be more practical to contain this to the
>>   graphics stack.
>>
>>     It seems fair to consider that if we can't see a very compelling
>>     benefit to building on perf, then containing this work to
>>     drivers/gpu/drm/i915 may simplify the review process as well as
>>     future maintenance and development.
>
>> Peter; I wonder if you would tend to agree too that it could make sense
>> for us to go with our own interface here?
>
> Sorry this took so long; this wanted a well considered response and
> those tend to get delayed in light of 'urgent' stuff.
>
> While I can certainly see the pain points and why you would rather not
> deal with them. I think it would make Linux a better place if we could
> manage to come up with a generic interface that would work for 'all'
> GPUs (and possibly more devices).

I'm not sure a generic multi-vendor interface for gpu metrics is
what's at stake or really needed here. The perf driver I worked on
didn't attempt to provide a generic interface: E.g. counter
normalizing is relatively complex and HW specific but makes sense to
leave to userspace. Normalizing may combine periodic OA reports (from
perf) with MI_REPORT_PERF_COUNT reports obtained via a command stream
(not involving perf). Normalizing might also be done offline or
remotely for a reduced runtime overhead.

Although it could be good to work more closely with Samuel on this, I
did raise the idea of using perf with him last year but I think his
priority is still with accessing metrics synchronized with the command
stream. We were looking at a perf-like interface to help support the
periodic sampling feature of the OA unit, but I'm not aware that
Nvidia HW has the same kind of feature.

I think it's more the norm that GPU drivers don't share a lot of
kernel interfaces given the significant architectural differences
between vendors. A large proportion of a GPU driver is typically in
userspace and interface standardisation is done at the OpenGL/CL level
more so than with kernel interfaces. While perf may be adaptable to
help access some gpu/device metrics it's not in a good position to be
involved with capturing metrics via command streams in sync with
specific commands submitted in userspace so I wouldn't expect to aim
for perf to be the sole interface for gpu metrics.


Okay, sorry to still be inclined to prioritize the non-perf interface
at this stage, at least for OA unit metrics. That said, I think it may
still be practical to look at exposing other mmio accessible counters
of Gen graphics via perf and maybe these could serve as a stepping
stone before attempting to support the OA unit via perf.

I'm still open to trying to help with adapting perf in this direction
if you feel it could be worthwhile, but would like to decouple the
effort for now.

Regards,
- Robert