All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/8] Enable Gen 7 Observation Architecture
@ 2016-02-03 18:39 Robert Bragg
  2016-02-03 18:39 ` [PATCH 1/8] drm/i915: Add i915 perf infrastructure Robert Bragg
                   ` (8 more replies)
  0 siblings, 9 replies; 14+ messages in thread
From: Robert Bragg @ 2016-02-03 18:39 UTC (permalink / raw)
  To: intel-gfx; +Cc: dri-devel, Sourab Gupta, Daniel Vetter, Robert Bragg

Compared to the last revision, this is just rebased on drm-intel-nightly

Robert Bragg (8):
  drm/i915: Add i915 perf infrastructure
  drm/i915: rename OACONTROL GEN7_OACONTROL
  drm/i915: Add 'render basic' Haswell OA unit config
  drm/i915: Add i915 perf event for Haswell OA unit
  drm/i915: advertise available metrics via sysfs
  drm/i915: Add dev.i915.perf_event_paranoid sysctl option
  drm/i915: add oa_event_min_timer_exponent sysctl
  drm/i915: Add more Haswell OA metric sets

 drivers/gpu/drm/i915/Makefile           |    4 +
 drivers/gpu/drm/i915/i915_cmd_parser.c  |    4 +-
 drivers/gpu/drm/i915/i915_dma.c         |    7 +
 drivers/gpu/drm/i915/i915_drv.h         |  154 ++++
 drivers/gpu/drm/i915/i915_gem_context.c |   23 +-
 drivers/gpu/drm/i915/i915_oa_hsw.c      |  658 ++++++++++++++++
 drivers/gpu/drm/i915/i915_oa_hsw.h      |   38 +
 drivers/gpu/drm/i915/i915_perf.c        | 1283 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h         |  340 +++++++-
 include/uapi/drm/i915_drm.h             |  121 +++
 10 files changed, 2625 insertions(+), 7 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h
 create mode 100644 drivers/gpu/drm/i915/i915_perf.c

-- 
2.7.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/8] drm/i915: Add i915 perf infrastructure
  2016-02-03 18:39 [PATCH 0/8] Enable Gen 7 Observation Architecture Robert Bragg
@ 2016-02-03 18:39 ` Robert Bragg
  2016-02-04  1:42   ` Emil Velikov
  2016-02-03 18:39 ` [PATCH 2/8] drm/i915: rename OACONTROL GEN7_OACONTROL Robert Bragg
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 14+ messages in thread
From: Robert Bragg @ 2016-02-03 18:39 UTC (permalink / raw)
  To: intel-gfx; +Cc: dri-devel, Sourab Gupta, Daniel Vetter, Robert Bragg

Adds base i915 perf infrastructure for Gen performance metrics.

This adds a DRM_IOCTL_I915_PERF_OPEN ioctl that takes an array of uint64
properties to configure a stream of metrics and returns a new fd usable
with standard VFS system calls including read() to read typed and sized
records; ioctl() to enable or disable capture and poll() to wait for
data.

A stream is opened something like:

  uint64_t properties[] = {
      /* Single context sampling */
      DRM_I915_PERF_CTX_HANDLE_PROP,        ctx_handle,

      /* Include OA reports in samples */
      DRM_I915_PERF_SAMPLE_OA_PROP,         true,

      /* OA unit configuration */
      DRM_I915_PERF_OA_METRICS_SET_PROP,    metrics_set_id,
      DRM_I915_PERF_OA_FORMAT_PROP,         report_format,
      DRM_I915_PERF_OA_EXPONENT_PROP,       period_exponent,
   };
   struct drm_i915_perf_open_param parm = {
      .flags = I915_PERF_FLAG_FD_CLOEXEC |
               I915_PERF_FLAG_FD_NONBLOCK |
               I915_PERF_FLAG_DISABLED,
      .properties = properties,
      .n_properties = sizeof(properties) / 16,
   };
   int fd = drmIoctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param);

Records read all start with a common { type, size } header with
DRM_I915_PERF_RECORD_SAMPLE being of most interest. Sample records
contain an extensible number of fields and it's the
DRM_I915_PERF_SAMPLE_xyz properties given when opening that determine
what's included in every sample.

No specific streams are supported yet so any attempt to open a stream
will return an error.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
---
 drivers/gpu/drm/i915/Makefile    |   3 +
 drivers/gpu/drm/i915/i915_dma.c  |   7 +
 drivers/gpu/drm/i915/i915_drv.h  |  88 ++++++++
 drivers/gpu/drm/i915/i915_perf.c | 442 +++++++++++++++++++++++++++++++++++++++
 include/uapi/drm/i915_drm.h      |  69 ++++++
 5 files changed, 609 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_perf.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 0851de07..af8ecbb 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -94,6 +94,9 @@ i915-y += dvo_ch7017.o \
 # virtual gpu code
 i915-y += i915_vgpu.o
 
+# perf code
+i915-y += i915_perf.o
+
 # legacy horrors
 i915-y += i915_dma.o
 
diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index a42eb58..4f4c9ba 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1012,6 +1012,11 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
 	if (ret < 0)
 		goto out_free_priv;
 
+	/* Must at least be initialized before trying to pin any context
+	 * which i915_perf hooks into.
+	 */
+	i915_perf_init(dev);
+
 	intel_pm_setup(dev);
 
 	intel_runtime_pm_get(dev_priv);
@@ -1203,6 +1208,7 @@ int i915_driver_unload(struct drm_device *dev)
 		return ret;
 	}
 
+	i915_perf_fini(dev);
 	intel_power_domains_fini(dev_priv);
 
 	intel_gpu_ips_teardown();
@@ -1379,6 +1385,7 @@ const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, i915_gem_context_getparam_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM, i915_gem_context_setparam_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_PERF_OPEN, i915_perf_open_ioctl, DRM_RENDER_ALLOW),
 };
 
 int i915_max_ioctl = ARRAY_SIZE(i915_ioctls);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 65a2cd0..7748383 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1732,6 +1732,81 @@ struct intel_wm_config {
 	bool sprites_scaled;
 };
 
+struct i915_perf_read_state {
+	int count;
+	ssize_t read;
+	char __user *buf;
+};
+
+struct i915_perf_stream {
+	struct drm_i915_private *dev_priv;
+
+	struct list_head link;
+
+	u32 sample_flags;
+
+	struct intel_context *ctx;
+	bool enabled;
+
+	/* Enables the collection of HW samples, either in response to
+	 * I915_PERF_IOCTL_ENABLE or implicitly called when stream is
+	 * opened without I915_PERF_FLAG_DISABLED.
+	 */
+	void (*enable)(struct i915_perf_stream *stream);
+
+	/* Disables the collection of HW samples, either in response to
+	 * I915_PERF_IOCTL_DISABLE or implicitly called before
+	 * destroying the stream.
+	 */
+	void (*disable)(struct i915_perf_stream *stream);
+
+	/* Return: true if any i915 perf records are ready to read()
+	 * for this stream.
+	 */
+	bool (*can_read)(struct i915_perf_stream *stream);
+
+	/* Call poll_wait, passing a wait queue that will be woken
+	 * once there is something ready to read() for the stream
+	 */
+	void (*poll_wait)(struct i915_perf_stream *stream,
+			  struct file *file,
+			  poll_table *wait);
+
+	/* For handling a blocking read, wait until there is something
+	 * to ready to read() for the stream. E.g. wait on the same
+	 * wait queue that would be passed to poll_wait() until
+	 * ->can_read() returns true (if its safe to call ->can_read()
+	 * without the i915 perf lock held).
+	 */
+	int (*wait_unlocked)(struct i915_perf_stream *stream);
+
+	/* Copy as many buffered i915 perf samples and records for
+	 * this stream to userspace as will fit in the given buffer.
+	 *
+	 * Only write complete records.
+	 *
+	 * read_state->count is the length of read_state->buf
+	 *
+	 * Update read_state->read with the number of bytes written.
+	 *
+	 * Return: the number of records appended (may be zero if not
+	 * enough space), or errno.
+	 *
+	 * Only return an EFAULT from copying to the userspace buffer
+	 * if zero records have been appended, otherwise squash the
+	 * EFAULT error and report what records were successfully
+	 * copied.
+	 */
+	int (*read)(struct i915_perf_stream *stream,
+		    struct i915_perf_read_state *read_state);
+
+	/* Cleanup any stream specific resources.
+	 *
+	 * The stream will always be disabled before this is called.
+	 */
+	void (*destroy)(struct i915_perf_stream *stream);
+};
+
 struct drm_i915_private {
 	struct drm_device *dev;
 	struct kmem_cache *objects;
@@ -1983,6 +2058,12 @@ struct drm_i915_private {
 
 	struct i915_runtime_pm pm;
 
+	struct {
+		bool initialized;
+		struct mutex lock;
+		struct list_head streams;
+	} perf;
+
 	/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
 	struct {
 		int (*execbuf_submit)(struct i915_execbuffer_params *params,
@@ -3251,6 +3332,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 				    struct drm_file *file_priv);
 
+int i915_perf_open_ioctl(struct drm_device *dev, void *data,
+			 struct drm_file *file);
+
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct drm_device *dev,
 					  struct i915_address_space *vm,
@@ -3366,6 +3450,10 @@ int i915_parse_cmds(struct intel_engine_cs *ring,
 		    u32 batch_len,
 		    bool is_master);
 
+/* i915_perf.c */
+extern void i915_perf_init(struct drm_device *dev);
+extern void i915_perf_fini(struct drm_device *dev);
+
 /* i915_suspend.c */
 extern int i915_save_state(struct drm_device *dev);
 extern int i915_restore_state(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
new file mode 100644
index 0000000..9af9e52
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -0,0 +1,442 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include <linux/anon_inodes.h>
+#include <linux/sizes.h>
+
+#include "i915_drv.h"
+
+struct perf_open_properties {
+	u32 sample_flags;
+
+	u64 single_context:1;
+	u64 ctx_handle;
+};
+
+static ssize_t i915_perf_read_locked(struct i915_perf_stream *stream,
+				     struct file *file,
+				     char __user *buf,
+				     size_t count,
+				     loff_t *ppos)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+	struct i915_perf_read_state state = { count, 0, buf };
+	int ret;
+
+	if (file->f_flags & O_NONBLOCK) {
+		if (!stream->can_read(stream))
+			return -EAGAIN;
+	} else {
+		mutex_unlock(&dev_priv->perf.lock);
+		ret = stream->wait_unlocked(stream);
+		mutex_lock(&dev_priv->perf.lock);
+
+		if (ret)
+			return ret;
+	}
+
+	ret = stream->read(stream, &state);
+	if (ret < 0) {
+		/* can relax this later if needs be, but this is the only
+		 * errno expected here atm...
+		 */
+		WARN_ON(ret != -EFAULT);
+		return ret;
+	}
+
+	if (ret == 0)
+		return -ENOSPC;
+
+	return state.read;
+}
+
+static ssize_t i915_perf_read(struct file *file,
+			      char __user *buf,
+			      size_t count,
+			      loff_t *ppos)
+{
+	struct i915_perf_stream *stream = file->private_data;
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+	ssize_t ret;
+
+	mutex_lock(&dev_priv->perf.lock);
+	ret = i915_perf_read_locked(stream, file, buf, count, ppos);
+	mutex_unlock(&dev_priv->perf.lock);
+
+	return ret;
+}
+
+static unsigned int i915_perf_poll_locked(struct i915_perf_stream *stream,
+					  struct file *file,
+					  poll_table *wait)
+{
+	unsigned int streams = 0;
+
+	stream->poll_wait(stream, file, wait);
+
+	if (stream->can_read(stream))
+		streams |= POLLIN;
+
+	return streams;
+}
+
+static unsigned int i915_perf_poll(struct file *file, poll_table *wait)
+{
+	struct i915_perf_stream *stream = file->private_data;
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+	int ret;
+
+	mutex_lock(&dev_priv->perf.lock);
+	ret = i915_perf_poll_locked(stream, file, wait);
+	mutex_unlock(&dev_priv->perf.lock);
+
+	return ret;
+}
+
+static void i915_perf_enable_locked(struct i915_perf_stream *stream)
+{
+	if (stream->enabled)
+		return;
+
+	/* Allow stream->enable() to refer to this */
+	stream->enabled = true;
+
+	if (stream->enable)
+		stream->enable(stream);
+}
+
+static void i915_perf_disable_locked(struct i915_perf_stream *stream)
+{
+	if (!stream->enabled)
+		return;
+
+	/* Allow stream->disable() to refer to this */
+	stream->enabled = false;
+
+	if (stream->disable)
+		stream->disable(stream);
+}
+
+static long i915_perf_ioctl_locked(struct i915_perf_stream *stream,
+				   unsigned int cmd,
+				   unsigned long arg)
+{
+	switch (cmd) {
+	case I915_PERF_IOCTL_ENABLE:
+		i915_perf_enable_locked(stream);
+		return 0;
+	case I915_PERF_IOCTL_DISABLE:
+		i915_perf_disable_locked(stream);
+		return 0;
+	}
+
+	return -EINVAL;
+}
+
+static long i915_perf_ioctl(struct file *file,
+			    unsigned int cmd,
+			    unsigned long arg)
+{
+	struct i915_perf_stream *stream = file->private_data;
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+	long ret;
+
+	mutex_lock(&dev_priv->perf.lock);
+	ret = i915_perf_ioctl_locked(stream, cmd, arg);
+	mutex_unlock(&dev_priv->perf.lock);
+
+	return ret;
+}
+
+static void i915_perf_destroy_locked(struct i915_perf_stream *stream)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+
+	if (stream->enabled)
+		i915_perf_disable_locked(stream);
+
+	if (stream->destroy)
+		stream->destroy(stream);
+
+	list_del(&stream->link);
+
+	if (stream->ctx) {
+		mutex_lock(&dev_priv->dev->struct_mutex);
+		i915_gem_context_unreference(stream->ctx);
+		mutex_unlock(&dev_priv->dev->struct_mutex);
+	}
+
+	kfree(stream);
+}
+
+static int i915_perf_release(struct inode *inode, struct file *file)
+{
+	struct i915_perf_stream *stream = file->private_data;
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+
+	mutex_lock(&dev_priv->perf.lock);
+	i915_perf_destroy_locked(stream);
+	mutex_unlock(&dev_priv->perf.lock);
+
+	return 0;
+}
+
+
+static const struct file_operations fops = {
+	.owner		= THIS_MODULE,
+	.llseek		= no_llseek,
+	.release	= i915_perf_release,
+	.poll		= i915_perf_poll,
+	.read		= i915_perf_read,
+	.unlocked_ioctl	= i915_perf_ioctl,
+};
+
+static struct intel_context *
+lookup_context(struct drm_i915_private *dev_priv,
+	       struct file *user_filp,
+	       u32 ctx_user_handle)
+{
+	struct intel_context *ctx;
+
+	mutex_lock(&dev_priv->dev->struct_mutex);
+	list_for_each_entry(ctx, &dev_priv->context_list, link) {
+		struct drm_file *drm_file;
+
+		if (!ctx->file_priv)
+			continue;
+
+		drm_file = ctx->file_priv->file;
+
+		if (user_filp->private_data == drm_file &&
+		    ctx->user_handle == ctx_user_handle) {
+			i915_gem_context_reference(ctx);
+			mutex_unlock(&dev_priv->dev->struct_mutex);
+
+			return ctx;
+		}
+	}
+	mutex_unlock(&dev_priv->dev->struct_mutex);
+
+	return NULL;
+}
+
+int i915_perf_open_ioctl_locked(struct drm_device *dev,
+				struct drm_i915_perf_open_param *param,
+				struct perf_open_properties *props,
+				struct drm_file *file)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_context *specific_ctx = NULL;
+	struct i915_perf_stream *stream = NULL;
+	unsigned long f_flags = 0;
+	int stream_fd;
+	int ret = 0;
+
+	if (props->single_context) {
+		u32 ctx_handle = props->ctx_handle;
+
+		specific_ctx = lookup_context(dev_priv, file->filp, ctx_handle);
+		if (!specific_ctx) {
+			DRM_ERROR("Failed to look up context with ID %u for opening perf stream\n",
+				  ctx_handle);
+			ret = -EINVAL;
+			goto err;
+		}
+	}
+
+	if (!specific_ctx && !capable(CAP_SYS_ADMIN)) {
+		DRM_ERROR("Insufficient privileges to open system-wide i915 perf stream\n");
+		ret = -EACCES;
+		goto err_ctx;
+	}
+
+	stream = kzalloc(sizeof(*stream), GFP_KERNEL);
+	if (!stream) {
+		ret = -ENOMEM;
+		goto err_ctx;
+	}
+
+	stream->sample_flags = props->sample_flags;
+	stream->dev_priv = dev_priv;
+	stream->ctx = specific_ctx;
+
+	/*
+	 * TODO: support sampling something
+	 *
+	 * For now this is as far as we can go.
+	 */
+	DRM_ERROR("Unsupported i915 perf stream configuration\n");
+	ret = -EINVAL;
+	goto err_alloc;
+
+	list_add(&stream->link, &dev_priv->perf.streams);
+
+	if (param->flags & I915_PERF_FLAG_FD_CLOEXEC)
+		f_flags |= O_CLOEXEC;
+	if (param->flags & I915_PERF_FLAG_FD_NONBLOCK)
+		f_flags |= O_NONBLOCK;
+
+	stream_fd = anon_inode_getfd("[i915_perf]", &fops, stream, f_flags);
+	if (stream_fd < 0) {
+		ret = stream_fd;
+		goto err_open;
+	}
+
+	if (!(param->flags & I915_PERF_FLAG_DISABLED))
+		i915_perf_enable_locked(stream);
+
+	return stream_fd;
+
+err_open:
+	list_del(&stream->link);
+	if (stream->destroy)
+		stream->destroy(stream);
+err_alloc:
+	kfree(stream);
+err_ctx:
+	if (specific_ctx) {
+		mutex_lock(&dev_priv->dev->struct_mutex);
+		i915_gem_context_unreference(specific_ctx);
+		mutex_unlock(&dev_priv->dev->struct_mutex);
+	}
+err:
+	return ret;
+}
+
+/* Note we copy the properties from userspace outside of the i915 perf
+ * mutex to avoid an awkward lockdep with mmap_sem.
+ *
+ * Note this function only validates properties in isolation it doesn't
+ * validate that the combination of properties makes sense or that all
+ * properties necessary for a particular kind of stream have been set.
+ */
+static int read_properties_unlocked(struct drm_i915_private *dev_priv,
+				    u64 __user *uprops,
+				    u32 n_props,
+				    struct perf_open_properties *props)
+{
+	u64 __user *uprop = uprops;
+	int i;
+
+	memset(props, 0, sizeof(struct perf_open_properties));
+
+	if (!n_props) {
+		DRM_ERROR("No i915 perf properties given");
+		return -EINVAL;
+	}
+
+	if (n_props > DRM_I915_PERF_PROP_MAX) {
+		DRM_ERROR("More i915 perf properties specified than exist");
+		return -EINVAL;
+	}
+
+	for (i = 0; i < n_props; i++) {
+		u64 id, value;
+		int ret;
+
+		ret = get_user(id, (u64 __user *)uprop);
+		if (ret)
+			return ret;
+
+		if (id == 0 || id >= DRM_I915_PERF_PROP_MAX) {
+			DRM_ERROR("Unknown i915 perf property ID");
+			return -EINVAL;
+		}
+
+		ret = get_user(value, (u64 __user *)uprop + 1);
+		if (ret)
+			return ret;
+
+		switch ((enum drm_i915_perf_property_id)id) {
+		case DRM_I915_PERF_CTX_HANDLE_PROP:
+			props->single_context = 1;
+			props->ctx_handle = value;
+			break;
+
+		case DRM_I915_PERF_PROP_MAX:
+			BUG();
+		}
+
+		uprop += 2;
+	}
+
+	return 0;
+}
+
+int i915_perf_open_ioctl(struct drm_device *dev, void *data,
+			 struct drm_file *file)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_perf_open_param *param = data;
+	struct perf_open_properties props;
+	u32 known_open_flags = 0;
+	int ret;
+
+	if (!dev_priv->perf.initialized) {
+		DRM_ERROR("i915 perf interface not available for this system");
+		return -ENOTSUPP;
+	}
+
+	known_open_flags = I915_PERF_FLAG_FD_CLOEXEC |
+			   I915_PERF_FLAG_FD_NONBLOCK |
+			   I915_PERF_FLAG_DISABLED;
+	if (param->flags & ~known_open_flags) {
+		DRM_ERROR("Unknown drm_i915_perf_open_param flag\n");
+		return -EINVAL;
+	}
+
+	ret = read_properties_unlocked(dev_priv,
+				       to_user_ptr(param->properties),
+				       param->n_properties,
+				       &props);
+	if (ret)
+		return ret;
+
+	mutex_lock(&dev_priv->perf.lock);
+	ret = i915_perf_open_ioctl_locked(dev, param, &props, file);
+	mutex_unlock(&dev_priv->perf.lock);
+
+	return ret;
+}
+
+void i915_perf_init(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = to_i915(dev);
+
+	INIT_LIST_HEAD(&dev_priv->perf.streams);
+	mutex_init(&dev_priv->perf.lock);
+
+	dev_priv->perf.initialized = true;
+}
+
+void i915_perf_fini(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = to_i915(dev);
+
+	if (!dev_priv->perf.initialized)
+		return;
+
+	/* Currently nothing to clean up */
+
+	dev_priv->perf.initialized = false;
+}
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index a5524cc..68ca26e 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -230,6 +230,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_GEM_USERPTR		0x33
 #define DRM_I915_GEM_CONTEXT_GETPARAM	0x34
 #define DRM_I915_GEM_CONTEXT_SETPARAM	0x35
+#define DRM_I915_PERF_OPEN		0x36
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
 #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
@@ -283,6 +284,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GEM_USERPTR			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_USERPTR, struct drm_i915_gem_userptr)
 #define DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_GETPARAM, struct drm_i915_gem_context_param)
 #define DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_SETPARAM, struct drm_i915_gem_context_param)
+#define DRM_IOCTL_I915_PERF_OPEN	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_PERF_OPEN, struct drm_i915_perf_open_param)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -1170,4 +1172,71 @@ struct drm_i915_gem_context_param {
 	__u64 value;
 };
 
+#define I915_PERF_FLAG_FD_CLOEXEC	(1<<0)
+#define I915_PERF_FLAG_FD_NONBLOCK	(1<<1)
+#define I915_PERF_FLAG_DISABLED		(1<<2)
+
+enum drm_i915_perf_property_id {
+	/**
+	 * Open the stream for a specific context handle (as used with
+	 * execbuffer2). A stream opened for a specific context this way
+	 * won't typically require root privileges.
+	 */
+	DRM_I915_PERF_CTX_HANDLE_PROP = 1,
+
+	DRM_I915_PERF_PROP_MAX /* non-ABI */
+};
+
+struct drm_i915_perf_open_param {
+	/** CLOEXEC, NONBLOCK... */
+	__u32 flags;
+
+	/**
+	 * Pointer to array of u64 (id, value) pairs configuring the stream
+	 * to open.
+	 */
+	__u64 __user properties;
+
+	/** The number of u64 (id, value) pairs */
+	__u32 n_properties;
+};
+
+#define I915_PERF_IOCTL_ENABLE	_IO('i', 0x0)
+#define I915_PERF_IOCTL_DISABLE	_IO('i', 0x1)
+
+/**
+ * Common to all i915 perf records
+ */
+struct drm_i915_perf_record_header {
+	__u32 type;
+	__u16 pad;
+	__u16 size;
+};
+
+enum drm_i915_perf_record_type {
+
+	/**
+	 * Samples are the work horse record type whose contents are extensible
+	 * and defined when opening an i915 perf stream based on the given
+	 * properties.
+	 *
+	 * Boolean properties following the naming convention
+	 * DRM_I915_PERF_SAMPLE_xyz_PROP request the inclusion of 'xyz' data in
+	 * every sample.
+	 *
+	 * The order of these sample properties given by userspace has no
+	 * affect on the ordering of data within a sample. The order will be
+	 * documented here.
+	 *
+	 * struct {
+	 *     struct drm_i915_perf_record_header header;
+	 *
+	 *     TODO: itemize extensible sample data here
+	 * };
+	 */
+	DRM_I915_PERF_RECORD_SAMPLE = 1,
+
+	DRM_I915_PERF_RECORD_MAX /* non-ABI */
+};
+
 #endif /* _UAPI_I915_DRM_H_ */
-- 
2.7.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/8] drm/i915: rename OACONTROL GEN7_OACONTROL
  2016-02-03 18:39 [PATCH 0/8] Enable Gen 7 Observation Architecture Robert Bragg
  2016-02-03 18:39 ` [PATCH 1/8] drm/i915: Add i915 perf infrastructure Robert Bragg
@ 2016-02-03 18:39 ` Robert Bragg
  2016-02-03 18:39 ` [PATCH 3/8] drm/i915: Add 'render basic' Haswell OA unit config Robert Bragg
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Robert Bragg @ 2016-02-03 18:39 UTC (permalink / raw)
  To: intel-gfx; +Cc: David Airlie, dri-devel, Sourab Gupta, Daniel Vetter

OACONTROL changes quite a bit for gen8, with some bits split out into a
per-context OACTXCONTROL register. Rename now before add more gen7 OA
registers

Signed-off-by: Robert Bragg <robert@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_cmd_parser.c | 4 ++--
 drivers/gpu/drm/i915/i915_reg.h        | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_cmd_parser.c b/drivers/gpu/drm/i915/i915_cmd_parser.c
index 814d894..920b75f 100644
--- a/drivers/gpu/drm/i915/i915_cmd_parser.c
+++ b/drivers/gpu/drm/i915/i915_cmd_parser.c
@@ -444,7 +444,7 @@ static const struct drm_i915_reg_descriptor gen7_render_regs[] = {
 	REG64(CL_PRIMITIVES_COUNT),
 	REG64(PS_INVOCATION_COUNT),
 	REG64(PS_DEPTH_COUNT),
-	REG32(OACONTROL), /* Only allowed for LRI and SRM. See below. */
+	REG32(GEN7_OACONTROL), /* Only allowed for LRI and SRM. See below. */
 	REG64(MI_PREDICATE_SRC0),
 	REG64(MI_PREDICATE_SRC1),
 	REG32(GEN7_3DPRIM_END_OFFSET),
@@ -1028,7 +1028,7 @@ static bool check_cmd(const struct intel_engine_cs *ring,
 			 * to the register. Hence, limit OACONTROL writes to
 			 * only MI_LOAD_REGISTER_IMM commands.
 			 */
-			if (reg_addr == i915_mmio_reg_offset(OACONTROL)) {
+			if (reg_addr == i915_mmio_reg_offset(GEN7_OACONTROL)) {
 				if (desc->cmd.value == MI_LOAD_REGISTER_MEM) {
 					DRM_DEBUG_DRIVER("CMD: Rejected LRM to OACONTROL\n");
 					return false;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index c0bd691..119401c 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -586,7 +586,7 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define GEN7_GPGPU_DISPATCHDIMY         _MMIO(0x2504)
 #define GEN7_GPGPU_DISPATCHDIMZ         _MMIO(0x2508)
 
-#define OACONTROL _MMIO(0x2360)
+#define GEN7_OACONTROL _MMIO(0x2360)
 
 #define _GEN7_PIPEA_DE_LOAD_SL	0x70068
 #define _GEN7_PIPEB_DE_LOAD_SL	0x71068
-- 
2.7.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/8] drm/i915: Add 'render basic' Haswell OA unit config
  2016-02-03 18:39 [PATCH 0/8] Enable Gen 7 Observation Architecture Robert Bragg
  2016-02-03 18:39 ` [PATCH 1/8] drm/i915: Add i915 perf infrastructure Robert Bragg
  2016-02-03 18:39 ` [PATCH 2/8] drm/i915: rename OACONTROL GEN7_OACONTROL Robert Bragg
@ 2016-02-03 18:39 ` Robert Bragg
  2016-02-03 18:39 ` [PATCH 4/8] drm/i915: Add i915 perf event for Haswell OA unit Robert Bragg
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Robert Bragg @ 2016-02-03 18:39 UTC (permalink / raw)
  To: intel-gfx; +Cc: dri-devel, Sourab Gupta, Daniel Vetter, Robert Bragg

Adds a static OA unit, MUX + B Counter configuration for basic render
metrics on Haswell. This is autogenerated from an internal XML
description of metric sets.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
---
 drivers/gpu/drm/i915/Makefile      |   3 +-
 drivers/gpu/drm/i915/i915_drv.h    |  14 ++++
 drivers/gpu/drm/i915/i915_oa_hsw.c | 132 +++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_oa_hsw.h |  34 ++++++++++
 4 files changed, 182 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index af8ecbb..a0131da 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -95,7 +95,8 @@ i915-y += dvo_ch7017.o \
 i915-y += i915_vgpu.o
 
 # perf code
-i915-y += i915_perf.o
+i915-y += i915_perf.o \
+	  i915_oa_hsw.o
 
 # legacy horrors
 i915-y += i915_dma.o
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7748383..86da0ae 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1732,6 +1732,11 @@ struct intel_wm_config {
 	bool sprites_scaled;
 };
 
+struct i915_oa_reg {
+	u32 addr;
+	u32 value;
+};
+
 struct i915_perf_read_state {
 	int count;
 	ssize_t read;
@@ -2062,6 +2067,15 @@ struct drm_i915_private {
 		bool initialized;
 		struct mutex lock;
 		struct list_head streams;
+
+		struct {
+			u32 metrics_set;
+
+			const struct i915_oa_reg *mux_regs;
+			int mux_regs_len;
+			const struct i915_oa_reg *b_counter_regs;
+			int b_counter_regs_len;
+		} oa;
 	} perf;
 
 	/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c
new file mode 100644
index 0000000..c05745e
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -0,0 +1,132 @@
+/*
+ * Autogenerated file, DO NOT EDIT manually!
+ *
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#include "i915_drv.h"
+
+enum metric_set_id {
+        METRIC_SET_ID_RENDER_BASIC = 1,
+};
+
+int i915_oa_n_builtin_metric_sets_hsw = 1;
+
+static const struct i915_oa_reg b_counter_config_render_basic[] = {
+	{ 0x2724, 0x00800000 },
+	{ 0x2720, 0x00000000 },
+	{ 0x2714, 0x00800000 },
+	{ 0x2710, 0x00000000 },
+};
+
+static const struct i915_oa_reg mux_config_render_basic[] = {
+	{ 0x253A4, 0x01600000 },
+	{ 0x25440, 0x00100000 },
+	{ 0x25128, 0x00000000 },
+	{ 0x2691C, 0x00000800 },
+	{ 0x26AA0, 0x01500000 },
+	{ 0x26B9C, 0x00006000 },
+	{ 0x2791C, 0x00000800 },
+	{ 0x27AA0, 0x01500000 },
+	{ 0x27B9C, 0x00006000 },
+	{ 0x2641C, 0x00000400 },
+	{ 0x25380, 0x00000010 },
+	{ 0x2538C, 0x00000000 },
+	{ 0x25384, 0x0800AAAA },
+	{ 0x25400, 0x00000004 },
+	{ 0x2540C, 0x06029000 },
+	{ 0x25410, 0x00000002 },
+	{ 0x25404, 0x5C30FFFF },
+	{ 0x25100, 0x00000016 },
+	{ 0x25110, 0x00000400 },
+	{ 0x25104, 0x00000000 },
+	{ 0x26804, 0x00001211 },
+	{ 0x26884, 0x00000100 },
+	{ 0x26900, 0x00000002 },
+	{ 0x26908, 0x00700000 },
+	{ 0x26904, 0x00000000 },
+	{ 0x26984, 0x00001022 },
+	{ 0x26A04, 0x00000011 },
+	{ 0x26A80, 0x00000006 },
+	{ 0x26A88, 0x00000C02 },
+	{ 0x26A84, 0x00000000 },
+	{ 0x26B04, 0x00001000 },
+	{ 0x26B80, 0x00000002 },
+	{ 0x26B8C, 0x00000007 },
+	{ 0x26B84, 0x00000000 },
+	{ 0x27804, 0x00004844 },
+	{ 0x27884, 0x00000400 },
+	{ 0x27900, 0x00000002 },
+	{ 0x27908, 0x0E000000 },
+	{ 0x27904, 0x00000000 },
+	{ 0x27984, 0x00004088 },
+	{ 0x27A04, 0x00000044 },
+	{ 0x27A80, 0x00000006 },
+	{ 0x27A88, 0x00018040 },
+	{ 0x27A84, 0x00000000 },
+	{ 0x27B04, 0x00004000 },
+	{ 0x27B80, 0x00000002 },
+	{ 0x27B8C, 0x000000E0 },
+	{ 0x27B84, 0x00000000 },
+	{ 0x26104, 0x00002222 },
+	{ 0x26184, 0x0C006666 },
+	{ 0x26284, 0x04000000 },
+	{ 0x26304, 0x04000000 },
+	{ 0x26400, 0x00000002 },
+	{ 0x26410, 0x000000A0 },
+	{ 0x26404, 0x00000000 },
+	{ 0x25420, 0x04108020 },
+	{ 0x25424, 0x1284A420 },
+	{ 0x2541C, 0x00000000 },
+	{ 0x25428, 0x00042049 },
+};
+
+static int select_render_basic_config(struct drm_i915_private *dev_priv)
+{
+        dev_priv->perf.oa.mux_regs =
+                mux_config_render_basic;
+        dev_priv->perf.oa.mux_regs_len =
+                ARRAY_SIZE(mux_config_render_basic);
+
+        dev_priv->perf.oa.b_counter_regs =
+                b_counter_config_render_basic;
+        dev_priv->perf.oa.b_counter_regs_len =
+                ARRAY_SIZE(b_counter_config_render_basic);
+
+        return 0;
+}
+
+int i915_oa_select_metric_set_hsw(struct drm_i915_private *dev_priv)
+{
+        dev_priv->perf.oa.mux_regs = NULL;
+        dev_priv->perf.oa.mux_regs_len = 0;
+        dev_priv->perf.oa.b_counter_regs = NULL;
+        dev_priv->perf.oa.b_counter_regs_len = 0;
+
+        switch (dev_priv->perf.oa.metrics_set) {
+        case METRIC_SET_ID_RENDER_BASIC:
+                return select_render_basic_config(dev_priv);
+        default:
+                return -ENODEV;
+        }
+}
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.h b/drivers/gpu/drm/i915/i915_oa_hsw.h
new file mode 100644
index 0000000..b618a1f
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.h
@@ -0,0 +1,34 @@
+/*
+ * Autogenerated file, DO NOT EDIT manually!
+ *
+ * Copyright (c) 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ */
+
+#ifndef __I915_OA_HSW_H__
+#define __I915_OA_HSW_H__
+
+extern int i915_oa_n_builtin_metric_sets_hsw;
+
+extern int i915_oa_select_metric_set_hsw(struct drm_i915_private *dev_priv);
+
+#endif
-- 
2.7.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/8] drm/i915: Add i915 perf event for Haswell OA unit
  2016-02-03 18:39 [PATCH 0/8] Enable Gen 7 Observation Architecture Robert Bragg
                   ` (2 preceding siblings ...)
  2016-02-03 18:39 ` [PATCH 3/8] drm/i915: Add 'render basic' Haswell OA unit config Robert Bragg
@ 2016-02-03 18:39 ` Robert Bragg
  2016-02-03 18:39 ` [PATCH 5/8] drm/i915: advertise available metrics via sysfs Robert Bragg
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Robert Bragg @ 2016-02-03 18:39 UTC (permalink / raw)
  To: intel-gfx; +Cc: dri-devel, Sourab Gupta, Daniel Vetter, Robert Bragg

Gen graphics hardware can be set up to periodically write snapshots of
performance counters into a circular buffer via its Observation
Architecture and this patch exposes that capability to userspace via the
i915 perf interface.

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Robert Bragg <robert@sixbynine.org>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  51 ++-
 drivers/gpu/drm/i915/i915_gem_context.c |  23 +-
 drivers/gpu/drm/i915/i915_oa_hsw.c      | 166 +++----
 drivers/gpu/drm/i915/i915_perf.c        | 783 +++++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_reg.h         | 338 ++++++++++++++
 include/uapi/drm/i915_drm.h             |  56 ++-
 6 files changed, 1317 insertions(+), 100 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 86da0ae..e4b9399 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1732,8 +1732,13 @@ struct intel_wm_config {
 	bool sprites_scaled;
 };
 
+struct i915_oa_format {
+	u32 format;
+	int size;
+};
+
 struct i915_oa_reg {
-	u32 addr;
+	i915_reg_t addr;
 	u32 value;
 };
 
@@ -1749,6 +1754,7 @@ struct i915_perf_stream {
 	struct list_head link;
 
 	u32 sample_flags;
+	int sample_size;
 
 	struct intel_context *ctx;
 	bool enabled;
@@ -1812,6 +1818,20 @@ struct i915_perf_stream {
 	void (*destroy)(struct i915_perf_stream *stream);
 };
 
+struct i915_oa_ops {
+	void (*init_oa_buffer)(struct drm_i915_private *dev_priv);
+	int (*enable_metric_set)(struct drm_i915_private *dev_priv);
+	void (*disable_metric_set)(struct drm_i915_private *dev_priv);
+	void (*oa_enable)(struct drm_i915_private *dev_priv);
+	void (*oa_disable)(struct drm_i915_private *dev_priv);
+	void (*update_oacontrol)(struct drm_i915_private *dev_priv);
+	void (*update_hw_ctx_id_locked)(struct drm_i915_private *dev_priv,
+					u32 ctx_id);
+	int (*read)(struct i915_perf_stream *stream,
+		    struct i915_perf_read_state *read_state);
+	bool (*oa_buffer_is_empty)(struct drm_i915_private *dev_priv);
+};
+
 struct drm_i915_private {
 	struct drm_device *dev;
 	struct kmem_cache *objects;
@@ -2065,16 +2085,43 @@ struct drm_i915_private {
 
 	struct {
 		bool initialized;
+
 		struct mutex lock;
 		struct list_head streams;
 
+		spinlock_t hook_lock;
+
 		struct {
+			struct i915_perf_stream *exclusive_stream;
+
+			u32 specific_ctx_id;
+
+			struct hrtimer poll_check_timer;
+			wait_queue_head_t poll_wq;
+
+			bool periodic;
+			u32 period_exponent;
+
 			u32 metrics_set;
 
 			const struct i915_oa_reg *mux_regs;
 			int mux_regs_len;
 			const struct i915_oa_reg *b_counter_regs;
 			int b_counter_regs_len;
+
+			struct {
+				struct drm_i915_gem_object *obj;
+				u32 gtt_offset;
+				u8 *addr;
+				u32 head;
+				u32 tail;
+				int format;
+				int format_size;
+			} oa_buffer;
+
+			struct i915_oa_ops ops;
+			const struct i915_oa_format *oa_formats;
+			int n_builtin_sets;
 		} oa;
 	} perf;
 
@@ -3348,6 +3395,8 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 
 int i915_perf_open_ioctl(struct drm_device *dev, void *data,
 			 struct drm_file *file);
+void i915_oa_context_pin_notify(struct drm_i915_private *dev_priv,
+				struct intel_context *context);
 
 /* i915_gem_evict.c */
 int __must_check i915_gem_evict_something(struct drm_device *dev,
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 83a097c..071cfa8 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -133,6 +133,23 @@ static int get_context_size(struct drm_device *dev)
 	return ret;
 }
 
+static int i915_gem_context_pin_state(struct drm_device *dev,
+				      struct intel_context *ctx)
+{
+	int ret;
+
+	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
+
+	ret = i915_gem_obj_ggtt_pin(ctx->legacy_hw_ctx.rcs_state,
+				    get_context_alignment(dev), 0);
+	if (ret)
+		return ret;
+
+	i915_oa_context_pin_notify(dev->dev_private, ctx);
+
+	return 0;
+}
+
 static void i915_gem_context_clean(struct intel_context *ctx)
 {
 	struct i915_hw_ppgtt *ppgtt = ctx->ppgtt;
@@ -287,8 +304,7 @@ i915_gem_create_context(struct drm_device *dev,
 		 * be available. To avoid this we always pin the default
 		 * context.
 		 */
-		ret = i915_gem_obj_ggtt_pin(ctx->legacy_hw_ctx.rcs_state,
-					    get_context_alignment(dev), 0);
+		ret = i915_gem_context_pin_state(dev, ctx);
 		if (ret) {
 			DRM_DEBUG_DRIVER("Couldn't pin %d\n", ret);
 			goto err_destroy;
@@ -665,8 +681,7 @@ static int do_switch(struct drm_i915_gem_request *req)
 
 	/* Trying to pin first makes error handling easier. */
 	if (ring == &dev_priv->ring[RCS]) {
-		ret = i915_gem_obj_ggtt_pin(to->legacy_hw_ctx.rcs_state,
-					    get_context_alignment(ring->dev), 0);
+		ret = i915_gem_context_pin_state(ring->dev, to);
 		if (ret)
 			return ret;
 	}
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c
index c05745e..5472aa0 100644
--- a/drivers/gpu/drm/i915/i915_oa_hsw.c
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -27,106 +27,106 @@
 #include "i915_drv.h"
 
 enum metric_set_id {
-        METRIC_SET_ID_RENDER_BASIC = 1,
+	METRIC_SET_ID_RENDER_BASIC = 1,
 };
 
 int i915_oa_n_builtin_metric_sets_hsw = 1;
 
 static const struct i915_oa_reg b_counter_config_render_basic[] = {
-	{ 0x2724, 0x00800000 },
-	{ 0x2720, 0x00000000 },
-	{ 0x2714, 0x00800000 },
-	{ 0x2710, 0x00000000 },
+	{ _MMIO(0x2724), 0x00800000 },
+	{ _MMIO(0x2720), 0x00000000 },
+	{ _MMIO(0x2714), 0x00800000 },
+	{ _MMIO(0x2710), 0x00000000 },
 };
 
 static const struct i915_oa_reg mux_config_render_basic[] = {
-	{ 0x253A4, 0x01600000 },
-	{ 0x25440, 0x00100000 },
-	{ 0x25128, 0x00000000 },
-	{ 0x2691C, 0x00000800 },
-	{ 0x26AA0, 0x01500000 },
-	{ 0x26B9C, 0x00006000 },
-	{ 0x2791C, 0x00000800 },
-	{ 0x27AA0, 0x01500000 },
-	{ 0x27B9C, 0x00006000 },
-	{ 0x2641C, 0x00000400 },
-	{ 0x25380, 0x00000010 },
-	{ 0x2538C, 0x00000000 },
-	{ 0x25384, 0x0800AAAA },
-	{ 0x25400, 0x00000004 },
-	{ 0x2540C, 0x06029000 },
-	{ 0x25410, 0x00000002 },
-	{ 0x25404, 0x5C30FFFF },
-	{ 0x25100, 0x00000016 },
-	{ 0x25110, 0x00000400 },
-	{ 0x25104, 0x00000000 },
-	{ 0x26804, 0x00001211 },
-	{ 0x26884, 0x00000100 },
-	{ 0x26900, 0x00000002 },
-	{ 0x26908, 0x00700000 },
-	{ 0x26904, 0x00000000 },
-	{ 0x26984, 0x00001022 },
-	{ 0x26A04, 0x00000011 },
-	{ 0x26A80, 0x00000006 },
-	{ 0x26A88, 0x00000C02 },
-	{ 0x26A84, 0x00000000 },
-	{ 0x26B04, 0x00001000 },
-	{ 0x26B80, 0x00000002 },
-	{ 0x26B8C, 0x00000007 },
-	{ 0x26B84, 0x00000000 },
-	{ 0x27804, 0x00004844 },
-	{ 0x27884, 0x00000400 },
-	{ 0x27900, 0x00000002 },
-	{ 0x27908, 0x0E000000 },
-	{ 0x27904, 0x00000000 },
-	{ 0x27984, 0x00004088 },
-	{ 0x27A04, 0x00000044 },
-	{ 0x27A80, 0x00000006 },
-	{ 0x27A88, 0x00018040 },
-	{ 0x27A84, 0x00000000 },
-	{ 0x27B04, 0x00004000 },
-	{ 0x27B80, 0x00000002 },
-	{ 0x27B8C, 0x000000E0 },
-	{ 0x27B84, 0x00000000 },
-	{ 0x26104, 0x00002222 },
-	{ 0x26184, 0x0C006666 },
-	{ 0x26284, 0x04000000 },
-	{ 0x26304, 0x04000000 },
-	{ 0x26400, 0x00000002 },
-	{ 0x26410, 0x000000A0 },
-	{ 0x26404, 0x00000000 },
-	{ 0x25420, 0x04108020 },
-	{ 0x25424, 0x1284A420 },
-	{ 0x2541C, 0x00000000 },
-	{ 0x25428, 0x00042049 },
+	{ _MMIO(0x253A4), 0x01600000 },
+	{ _MMIO(0x25440), 0x00100000 },
+	{ _MMIO(0x25128), 0x00000000 },
+	{ _MMIO(0x2691C), 0x00000800 },
+	{ _MMIO(0x26AA0), 0x01500000 },
+	{ _MMIO(0x26B9C), 0x00006000 },
+	{ _MMIO(0x2791C), 0x00000800 },
+	{ _MMIO(0x27AA0), 0x01500000 },
+	{ _MMIO(0x27B9C), 0x00006000 },
+	{ _MMIO(0x2641C), 0x00000400 },
+	{ _MMIO(0x25380), 0x00000010 },
+	{ _MMIO(0x2538C), 0x00000000 },
+	{ _MMIO(0x25384), 0x0800AAAA },
+	{ _MMIO(0x25400), 0x00000004 },
+	{ _MMIO(0x2540C), 0x06029000 },
+	{ _MMIO(0x25410), 0x00000002 },
+	{ _MMIO(0x25404), 0x5C30FFFF },
+	{ _MMIO(0x25100), 0x00000016 },
+	{ _MMIO(0x25110), 0x00000400 },
+	{ _MMIO(0x25104), 0x00000000 },
+	{ _MMIO(0x26804), 0x00001211 },
+	{ _MMIO(0x26884), 0x00000100 },
+	{ _MMIO(0x26900), 0x00000002 },
+	{ _MMIO(0x26908), 0x00700000 },
+	{ _MMIO(0x26904), 0x00000000 },
+	{ _MMIO(0x26984), 0x00001022 },
+	{ _MMIO(0x26A04), 0x00000011 },
+	{ _MMIO(0x26A80), 0x00000006 },
+	{ _MMIO(0x26A88), 0x00000C02 },
+	{ _MMIO(0x26A84), 0x00000000 },
+	{ _MMIO(0x26B04), 0x00001000 },
+	{ _MMIO(0x26B80), 0x00000002 },
+	{ _MMIO(0x26B8C), 0x00000007 },
+	{ _MMIO(0x26B84), 0x00000000 },
+	{ _MMIO(0x27804), 0x00004844 },
+	{ _MMIO(0x27884), 0x00000400 },
+	{ _MMIO(0x27900), 0x00000002 },
+	{ _MMIO(0x27908), 0x0E000000 },
+	{ _MMIO(0x27904), 0x00000000 },
+	{ _MMIO(0x27984), 0x00004088 },
+	{ _MMIO(0x27A04), 0x00000044 },
+	{ _MMIO(0x27A80), 0x00000006 },
+	{ _MMIO(0x27A88), 0x00018040 },
+	{ _MMIO(0x27A84), 0x00000000 },
+	{ _MMIO(0x27B04), 0x00004000 },
+	{ _MMIO(0x27B80), 0x00000002 },
+	{ _MMIO(0x27B8C), 0x000000E0 },
+	{ _MMIO(0x27B84), 0x00000000 },
+	{ _MMIO(0x26104), 0x00002222 },
+	{ _MMIO(0x26184), 0x0C006666 },
+	{ _MMIO(0x26284), 0x04000000 },
+	{ _MMIO(0x26304), 0x04000000 },
+	{ _MMIO(0x26400), 0x00000002 },
+	{ _MMIO(0x26410), 0x000000A0 },
+	{ _MMIO(0x26404), 0x00000000 },
+	{ _MMIO(0x25420), 0x04108020 },
+	{ _MMIO(0x25424), 0x1284A420 },
+	{ _MMIO(0x2541C), 0x00000000 },
+	{ _MMIO(0x25428), 0x00042049 },
 };
 
 static int select_render_basic_config(struct drm_i915_private *dev_priv)
 {
-        dev_priv->perf.oa.mux_regs =
-                mux_config_render_basic;
-        dev_priv->perf.oa.mux_regs_len =
-                ARRAY_SIZE(mux_config_render_basic);
+	dev_priv->perf.oa.mux_regs =
+		mux_config_render_basic;
+	dev_priv->perf.oa.mux_regs_len =
+		ARRAY_SIZE(mux_config_render_basic);
 
-        dev_priv->perf.oa.b_counter_regs =
-                b_counter_config_render_basic;
-        dev_priv->perf.oa.b_counter_regs_len =
-                ARRAY_SIZE(b_counter_config_render_basic);
+	dev_priv->perf.oa.b_counter_regs =
+		b_counter_config_render_basic;
+	dev_priv->perf.oa.b_counter_regs_len =
+		ARRAY_SIZE(b_counter_config_render_basic);
 
-        return 0;
+	return 0;
 }
 
 int i915_oa_select_metric_set_hsw(struct drm_i915_private *dev_priv)
 {
-        dev_priv->perf.oa.mux_regs = NULL;
-        dev_priv->perf.oa.mux_regs_len = 0;
-        dev_priv->perf.oa.b_counter_regs = NULL;
-        dev_priv->perf.oa.b_counter_regs_len = 0;
+	dev_priv->perf.oa.mux_regs = NULL;
+	dev_priv->perf.oa.mux_regs_len = 0;
+	dev_priv->perf.oa.b_counter_regs = NULL;
+	dev_priv->perf.oa.b_counter_regs_len = 0;
 
-        switch (dev_priv->perf.oa.metrics_set) {
-        case METRIC_SET_ID_RENDER_BASIC:
-                return select_render_basic_config(dev_priv);
-        default:
-                return -ENODEV;
-        }
+	switch (dev_priv->perf.oa.metrics_set) {
+	case METRIC_SET_ID_RENDER_BASIC:
+		return select_render_basic_config(dev_priv);
+	default:
+		return -ENODEV;
+	}
 }
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 9af9e52..b54f719 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -25,14 +25,692 @@
 #include <linux/sizes.h>
 
 #include "i915_drv.h"
+#include "intel_ringbuffer.h"
+#include "intel_lrc.h"
+#include "i915_oa_hsw.h"
+
+/* Must be a power of two */
+#define OA_BUFFER_SIZE	     SZ_16M
+#define OA_TAKEN(tail, head) ((tail - head) & (OA_BUFFER_SIZE - 1))
+
+/* frequency for forwarding samples from OA to perf buffer */
+#define POLL_FREQUENCY 200
+#define POLL_PERIOD max_t(u64, 10000, NSEC_PER_SEC / POLL_FREQUENCY)
+
+#define OA_EXPONENT_MAX 0x3f
+
+static struct i915_oa_format hsw_oa_formats[I915_OA_FORMAT_MAX] = {
+	[I915_OA_FORMAT_A13]	    = { 0, 64 },
+	[I915_OA_FORMAT_A29]	    = { 1, 128 },
+	[I915_OA_FORMAT_A13_B8_C8]  = { 2, 128 },
+	/* A29_B8_C8 Disallowed as 192 bytes doesn't factor into buffer size */
+	[I915_OA_FORMAT_B4_C8]	    = { 4, 64 },
+	[I915_OA_FORMAT_A45_B8_C8]  = { 5, 256 },
+	[I915_OA_FORMAT_B4_C8_A16]  = { 6, 128 },
+	[I915_OA_FORMAT_C4_B8]	    = { 7, 64 },
+};
+
+#define SAMPLE_OA_REPORT      (1<<0)
 
 struct perf_open_properties {
 	u32 sample_flags;
 
 	u64 single_context:1;
 	u64 ctx_handle;
+
+	/* OA sampling state */
+	int metrics_set;
+	int oa_format;
+	bool oa_periodic;
+	u32 oa_period_exponent;
 };
 
+static bool gen7_oa_buffer_is_empty(struct drm_i915_private *dev_priv)
+{
+	u32 oastatus2 = I915_READ(GEN7_OASTATUS2);
+	u32 oastatus1 = I915_READ(GEN7_OASTATUS1);
+	u32 head = oastatus2 & GEN7_OASTATUS2_HEAD_MASK;
+	u32 tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK;
+
+	return OA_TAKEN(tail, head) == 0;
+}
+
+/**
+ * Appends a status record to a userspace read() buffer.
+ *
+ * Returns 0 if no room left in userspace read() buffer, 1 if status
+ * record successfully appended or -EFAULT for copy_to_user failure.
+ */
+static int append_oa_status(struct i915_perf_stream *stream,
+			    struct i915_perf_read_state *read_state,
+			    enum drm_i915_perf_record_type type)
+{
+	struct drm_i915_perf_record_header header = { type, 0, sizeof(header) };
+
+	if ((read_state->count - read_state->read) < header.size)
+		return 0;
+
+	if (copy_to_user(read_state->buf, &header, sizeof(header)))
+		return -EFAULT;
+
+	read_state->buf += sizeof(header);
+	read_state->read += header.size;
+
+	return 1;
+}
+
+/**
+ * Copies single OA report into userspace read() buffer.
+ *
+ * Returns 0 if no room left in userspace read() buffer, 1 if sample
+ * record successfully appended or -EFAULT for copy_to_user failure.
+ */
+static int append_oa_sample(struct i915_perf_stream *stream,
+			    struct i915_perf_read_state *read_state,
+			    const u8 *report)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
+	struct drm_i915_perf_record_header header;
+	u32 sample_flags = stream->sample_flags;
+	char __user *buf = read_state->buf;
+
+	header.type = DRM_I915_PERF_RECORD_SAMPLE;
+	header.pad = 0;
+	header.size = stream->sample_size;
+
+	if ((read_state->count - read_state->read) < header.size)
+		return 0;
+
+	if (copy_to_user(buf, &header, sizeof(header)))
+		return -EFAULT;
+	buf += sizeof(header);
+
+	if (sample_flags & SAMPLE_OA_REPORT) {
+		if (copy_to_user(buf, report, report_size))
+			return -EFAULT;
+		buf += report_size;
+	}
+
+	read_state->buf = buf;
+	read_state->read += header.size;
+
+	return 1;
+}
+
+/**
+ * Copies all buffered OA reports into userspace read() buffer.
+ * @head_ptr: (inout): the head pointer before and after appending
+ *
+ * Returns the number of records appended (may be zero if the read()
+ * buffer is already full), or -EFAULT for failures while copying.
+ *
+ * Consistent with i915_perf_stream::read(), this function should only
+ * return an -EFAULT from copying if no records have been copied and
+ * otherwise reports a short read.
+ */
+static int gen7_append_oa_reports(struct i915_perf_stream *stream,
+				  struct i915_perf_read_state *read_state,
+				  u32 *head_ptr,
+				  u32 tail)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+	int report_size = dev_priv->perf.oa.oa_buffer.format_size;
+	u8 *oa_buf_base = dev_priv->perf.oa.oa_buffer.addr;
+	u32 mask = (OA_BUFFER_SIZE - 1);
+	u32 head;
+	u8 *report;
+	u32 taken;
+	int ret = 0;
+	int n_records = 0;
+
+	head = *head_ptr - dev_priv->perf.oa.oa_buffer.gtt_offset;
+	tail -= dev_priv->perf.oa.oa_buffer.gtt_offset;
+
+	/* Note: the gpu doesn't wrap the tail according to the OA buffer size
+	 * so when we need to make sure our head/tail values are in-bounds we
+	 * use the above mask.
+	 */
+
+	while ((taken = OA_TAKEN(tail, head))) {
+		/* The tail increases in 64 byte increments, not in
+		 * format_size steps.
+		 */
+		if (taken < report_size)
+			break;
+
+		report = oa_buf_base + (head & mask);
+
+		if (dev_priv->perf.oa.exclusive_stream->enabled) {
+			ret = append_oa_sample(stream, read_state, report);
+			if (n_records && ret == -EFAULT)
+				ret = 0;
+			if (ret > 0)
+				n_records += ret;
+			else
+				break;
+		}
+
+		/* We don't want to update head if there was a problem
+		 * appending the sample.
+		 */
+		head += report_size;
+	}
+
+	*head_ptr = dev_priv->perf.oa.oa_buffer.gtt_offset + head;
+
+	return n_records;
+}
+
+/**
+ * Check OA status registers, appending status records if necessary
+ * and copy as many buffered OA reports to userspace as possible.
+ *
+ * NB: this function should only return an -EFAULT from copying if no
+ * records have been copied and otherwise report a short read.
+ *
+ * Returns the number of records appended (may be zero if no room in
+ * read() buffer), or -EFAULT (the only errno we expect to see here)
+ */
+static int gen7_oa_read(struct i915_perf_stream *stream,
+			struct i915_perf_read_state *read_state)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+	u32 oastatus2;
+	u32 oastatus1;
+	u32 head;
+	u32 tail;
+	int ret = 1; /* > 0 implies a prior successful append */
+	int n_records = 0;
+
+	WARN_ON(!dev_priv->perf.oa.oa_buffer.addr);
+
+	oastatus2 = I915_READ(GEN7_OASTATUS2);
+	oastatus1 = I915_READ(GEN7_OASTATUS1);
+
+	head = oastatus2 & GEN7_OASTATUS2_HEAD_MASK;
+	tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK;
+
+	if (unlikely(oastatus1 & (GEN7_OASTATUS1_OABUFFER_OVERFLOW |
+				  GEN7_OASTATUS1_REPORT_LOST))) {
+
+		if (oastatus1 & GEN7_OASTATUS1_OABUFFER_OVERFLOW) {
+			ret = append_oa_status(stream, read_state,
+					       DRM_I915_PERF_RECORD_OA_BUFFER_OVERFLOW);
+			if (ret <= 0)
+				return ret;
+			n_records += ret;
+			oastatus1 &= ~GEN7_OASTATUS1_OABUFFER_OVERFLOW;
+		}
+
+		if (oastatus1 & GEN7_OASTATUS1_REPORT_LOST) {
+			ret = append_oa_status(stream, read_state,
+					       DRM_I915_PERF_RECORD_OA_REPORT_LOST);
+			if (n_records && ret == -EFAULT)
+				ret = 0;
+			else if (ret < 0)
+				return ret;
+			else if (ret > 0) {
+				n_records += ret;
+				oastatus1 &= ~GEN7_OASTATUS1_REPORT_LOST;
+			}
+		}
+
+		/* XXX: This register contains the OA tail pointer
+		 * which may be updated asynchronously while periodic
+		 * sampling is enabled so it's racy to
+		 * read-modify-write like we are here (esp after the
+		 * append/copy_to_user work above.) */
+#warning "WIP: check the correct way to clear OASTATUS1 bits"
+		I915_WRITE(GEN7_OASTATUS1, oastatus1);
+	}
+
+	/* If there is still buffer space */
+	if (ret) {
+		ret = gen7_append_oa_reports(stream, read_state, &head, tail);
+		if (n_records && ret == -EFAULT)
+			ret = 0;
+		else if (ret < 0)
+			return ret;
+		else if (ret > 0) {
+			n_records += ret;
+
+			I915_WRITE(GEN7_OASTATUS2,
+				   ((head & GEN7_OASTATUS2_HEAD_MASK) |
+				    OA_MEM_SELECT_GGTT));
+		}
+	}
+
+	return n_records;
+}
+
+static bool i915_oa_can_read(struct i915_perf_stream *stream)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+
+	return !dev_priv->perf.oa.ops.oa_buffer_is_empty(dev_priv);
+}
+
+static int i915_oa_wait_unlocked(struct i915_perf_stream *stream)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+
+	/* Note: the oa_buffer_is_empty() condition is ok to run unlocked as it
+	 * just performs mmio reads of the OA buffer head + tail pointers and
+	 * it's assumed we're handling some operation that implies the stream
+	 * can't be destroyed until completion (such as a read()) that ensures
+	 * the device + OA buffer can't disappear
+	 */
+	return wait_event_interruptible(dev_priv->perf.oa.poll_wq,
+					!dev_priv->perf.oa.ops.oa_buffer_is_empty(dev_priv));
+}
+
+static void i915_oa_poll_wait(struct i915_perf_stream *stream,
+			      struct file *file,
+			      poll_table *wait)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+
+	poll_wait(file, &dev_priv->perf.oa.poll_wq, wait);
+}
+
+static int i915_oa_read(struct i915_perf_stream *stream,
+			struct i915_perf_read_state *read_state)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+
+	return dev_priv->perf.oa.ops.read(stream, read_state);
+}
+
+static void
+free_oa_buffer(struct drm_i915_private *i915)
+{
+	mutex_lock(&i915->dev->struct_mutex);
+
+	vunmap(i915->perf.oa.oa_buffer.addr);
+	i915_gem_object_ggtt_unpin(i915->perf.oa.oa_buffer.obj);
+	drm_gem_object_unreference(&i915->perf.oa.oa_buffer.obj->base);
+
+	i915->perf.oa.oa_buffer.obj = NULL;
+	i915->perf.oa.oa_buffer.gtt_offset = 0;
+	i915->perf.oa.oa_buffer.addr = NULL;
+
+	mutex_unlock(&i915->dev->struct_mutex);
+}
+
+static void i915_oa_stream_destroy(struct i915_perf_stream *stream)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+
+	BUG_ON(stream != dev_priv->perf.oa.exclusive_stream);
+
+	dev_priv->perf.oa.ops.disable_metric_set(dev_priv);
+
+	free_oa_buffer(dev_priv);
+
+	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+	intel_runtime_pm_put(dev_priv);
+
+	dev_priv->perf.oa.exclusive_stream = NULL;
+}
+
+static void *vmap_oa_buffer(struct drm_i915_gem_object *obj)
+{
+	int i;
+	void *addr = NULL;
+	struct sg_page_iter sg_iter;
+	struct page **pages;
+
+	pages = drm_malloc_ab(obj->base.size >> PAGE_SHIFT, sizeof(*pages));
+	if (pages == NULL) {
+		DRM_DEBUG_DRIVER("Failed to get space for pages\n");
+		goto finish;
+	}
+
+	i = 0;
+	for_each_sg_page(obj->pages->sgl, &sg_iter, obj->pages->nents, 0) {
+		pages[i] = sg_page_iter_page(&sg_iter);
+		i++;
+	}
+
+	addr = vmap(pages, i, 0, PAGE_KERNEL);
+	if (addr == NULL) {
+		DRM_DEBUG_DRIVER("Failed to vmap pages\n");
+		goto finish;
+	}
+
+finish:
+	if (pages)
+		drm_free_large(pages);
+	return addr;
+}
+
+static void gen7_init_oa_buffer(struct drm_i915_private *dev_priv)
+{
+	/* Pre-DevBDW: OABUFFER must be set with counters off,
+	 * before OASTATUS1, but after OASTATUS2
+	 */
+	I915_WRITE(GEN7_OASTATUS2, dev_priv->perf.oa.oa_buffer.gtt_offset |
+		   OA_MEM_SELECT_GGTT); /* head */
+	I915_WRITE(GEN7_OABUFFER, dev_priv->perf.oa.oa_buffer.gtt_offset);
+	I915_WRITE(GEN7_OASTATUS1, dev_priv->perf.oa.oa_buffer.gtt_offset |
+		   OABUFFER_SIZE_16M); /* tail */
+}
+
+static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
+{
+	struct drm_i915_gem_object *bo;
+	int ret;
+
+	BUG_ON(dev_priv->perf.oa.oa_buffer.obj);
+
+	ret = i915_mutex_lock_interruptible(dev_priv->dev);
+	if (ret)
+		return ret;
+
+	bo = i915_gem_alloc_object(dev_priv->dev, OA_BUFFER_SIZE);
+	if (bo == NULL) {
+		DRM_ERROR("Failed to allocate OA buffer\n");
+		ret = -ENOMEM;
+		goto unlock;
+	}
+	dev_priv->perf.oa.oa_buffer.obj = bo;
+
+	ret = i915_gem_object_set_cache_level(bo, I915_CACHE_LLC);
+	if (ret)
+		goto err_unref;
+
+	/* PreHSW required 512K alignment, HSW requires 16M */
+	ret = i915_gem_obj_ggtt_pin(bo, SZ_16M, 0);
+	if (ret)
+		goto err_unref;
+
+	dev_priv->perf.oa.oa_buffer.gtt_offset = i915_gem_obj_ggtt_offset(bo);
+	dev_priv->perf.oa.oa_buffer.addr = vmap_oa_buffer(bo);
+
+	dev_priv->perf.oa.ops.init_oa_buffer(dev_priv);
+
+	DRM_DEBUG_DRIVER("OA Buffer initialized, gtt offset = 0x%x, vaddr = %p",
+			 dev_priv->perf.oa.oa_buffer.gtt_offset,
+			 dev_priv->perf.oa.oa_buffer.addr);
+
+	goto unlock;
+
+err_unref:
+	drm_gem_object_unreference(&bo->base);
+
+unlock:
+	mutex_unlock(&dev_priv->dev->struct_mutex);
+	return ret;
+}
+
+static void config_oa_regs(struct drm_i915_private *dev_priv,
+			   const struct i915_oa_reg *regs,
+			   int n_regs)
+{
+	int i;
+
+	for (i = 0; i < n_regs; i++) {
+		const struct i915_oa_reg *reg = regs + i;
+
+		I915_WRITE(reg->addr, reg->value);
+	}
+}
+
+static int hsw_enable_metric_set(struct drm_i915_private *dev_priv)
+{
+	int ret = i915_oa_select_metric_set_hsw(dev_priv);
+
+	if (ret)
+		return ret;
+
+	I915_WRITE(GDT_CHICKEN_BITS, GT_NOA_ENABLE);
+
+	/* PRM:
+	 *
+	 * OA unit is using “crclk” for its functionality. When trunk
+	 * level clock gating takes place, OA clock would be gated,
+	 * unable to count the events from non-render clock domain.
+	 * Render clock gating must be disabled when OA is enabled to
+	 * count the events from non-render domain. Unit level clock
+	 * gating for RCS should also be disabled.
+	 */
+	I915_WRITE(GEN7_MISCCPCTL, (I915_READ(GEN7_MISCCPCTL) &
+				    ~GEN7_DOP_CLOCK_GATE_ENABLE));
+	I915_WRITE(GEN6_UCGCTL1, (I915_READ(GEN6_UCGCTL1) |
+				  GEN6_CSUNIT_CLOCK_GATE_DISABLE));
+
+	config_oa_regs(dev_priv, dev_priv->perf.oa.mux_regs,
+		       dev_priv->perf.oa.mux_regs_len);
+	config_oa_regs(dev_priv, dev_priv->perf.oa.b_counter_regs,
+		       dev_priv->perf.oa.b_counter_regs_len);
+
+	return 0;
+}
+
+static void hsw_disable_metric_set(struct drm_i915_private *dev_priv)
+{
+	I915_WRITE(GEN6_UCGCTL1, (I915_READ(GEN6_UCGCTL1) &
+				  ~GEN6_CSUNIT_CLOCK_GATE_DISABLE));
+	I915_WRITE(GEN7_MISCCPCTL, (I915_READ(GEN7_MISCCPCTL) |
+				    GEN7_DOP_CLOCK_GATE_ENABLE));
+
+	I915_WRITE(GDT_CHICKEN_BITS, (I915_READ(GDT_CHICKEN_BITS) &
+				      ~GT_NOA_ENABLE));
+}
+
+static void gen7_update_oacontrol_locked(struct drm_i915_private *dev_priv)
+{
+	assert_spin_locked(&dev_priv->perf.hook_lock);
+
+	if (dev_priv->perf.oa.exclusive_stream->enabled) {
+		unsigned long ctx_id = 0;
+		bool pinning_ok = false;
+
+		if (dev_priv->perf.oa.exclusive_stream->ctx &&
+		    dev_priv->perf.oa.specific_ctx_id) {
+			ctx_id = dev_priv->perf.oa.specific_ctx_id;
+			pinning_ok = true;
+		}
+
+		if (dev_priv->perf.oa.exclusive_stream->ctx == NULL ||
+		    pinning_ok) {
+			bool periodic = dev_priv->perf.oa.periodic;
+			u32 period_exponent = dev_priv->perf.oa.period_exponent;
+			u32 report_format = dev_priv->perf.oa.oa_buffer.format;
+
+			I915_WRITE(GEN7_OACONTROL,
+				   (ctx_id & GEN7_OACONTROL_CTX_MASK) |
+				   (period_exponent <<
+				    GEN7_OACONTROL_TIMER_PERIOD_SHIFT) |
+				   (periodic ?
+				    GEN7_OACONTROL_TIMER_ENABLE : 0) |
+				   (report_format <<
+				    GEN7_OACONTROL_FORMAT_SHIFT) |
+				   (ctx_id ?
+				    GEN7_OACONTROL_PER_CTX_ENABLE : 0) |
+				   GEN7_OACONTROL_ENABLE);
+			return;
+		}
+	}
+
+	I915_WRITE(GEN7_OACONTROL, 0);
+}
+
+static void gen7_oa_enable(struct drm_i915_private *dev_priv)
+{
+	unsigned long flags;
+	u32 oastatus1, tail;
+
+	spin_lock_irqsave(&dev_priv->perf.hook_lock, flags);
+	gen7_update_oacontrol_locked(dev_priv);
+	spin_unlock_irqrestore(&dev_priv->perf.hook_lock, flags);
+
+	/* Reset the head ptr so we don't forward reports from before now. */
+	oastatus1 = I915_READ(GEN7_OASTATUS1);
+	tail = oastatus1 & GEN7_OASTATUS1_TAIL_MASK;
+	I915_WRITE(GEN7_OASTATUS2, (tail & GEN7_OASTATUS2_HEAD_MASK) |
+				    OA_MEM_SELECT_GGTT);
+}
+
+static void i915_oa_stream_enable(struct i915_perf_stream *stream)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+
+	dev_priv->perf.oa.ops.oa_enable(dev_priv);
+
+	if (dev_priv->perf.oa.periodic)
+		hrtimer_start(&dev_priv->perf.oa.poll_check_timer,
+			      ns_to_ktime(POLL_PERIOD),
+			      HRTIMER_MODE_REL_PINNED);
+}
+
+static void gen7_oa_disable(struct drm_i915_private *dev_priv)
+{
+	I915_WRITE(GEN7_OACONTROL, 0);
+}
+
+static void i915_oa_stream_disable(struct i915_perf_stream *stream)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+
+	dev_priv->perf.oa.ops.oa_disable(dev_priv);
+
+	if (dev_priv->perf.oa.periodic)
+		hrtimer_cancel(&dev_priv->perf.oa.poll_check_timer);
+}
+
+static int i915_oa_stream_init(struct i915_perf_stream *stream,
+			       struct drm_i915_perf_open_param *param,
+			       struct perf_open_properties *props)
+{
+	struct drm_i915_private *dev_priv = stream->dev_priv;
+	int format_size;
+	int ret;
+
+	if (!(props->sample_flags & SAMPLE_OA_REPORT)) {
+		DRM_ERROR("Only OA report sampling supported\n");
+		return -EINVAL;
+	}
+
+	if (!dev_priv->perf.oa.ops.init_oa_buffer) {
+		DRM_ERROR("OA unit not supported\n");
+		return -ENODEV;
+	}
+
+	/* To avoid the complexity of having to accurately filter
+	 * counter reports and marshal to the appropriate client
+	 * we currently only allow exclusive access
+	 */
+	if (dev_priv->perf.oa.exclusive_stream) {
+		DRM_ERROR("OA unit already in use\n");
+		return -EBUSY;
+	}
+
+	if (!props->metrics_set) {
+		DRM_ERROR("OA metric set not specified\n");
+		return -EINVAL;
+	}
+
+	if (!props->oa_format) {
+		DRM_ERROR("OA report format not specified\n");
+		return -EINVAL;
+	}
+
+	stream->sample_size = sizeof(struct drm_i915_perf_record_header);
+
+	format_size = dev_priv->perf.oa.oa_formats[props->oa_format].size;
+
+	stream->sample_flags |= SAMPLE_OA_REPORT;
+	stream->sample_size += format_size;
+
+	dev_priv->perf.oa.oa_buffer.format_size = format_size;
+	BUG_ON(dev_priv->perf.oa.oa_buffer.format_size == 0);
+
+	dev_priv->perf.oa.oa_buffer.format =
+		dev_priv->perf.oa.oa_formats[props->oa_format].format;
+
+	dev_priv->perf.oa.metrics_set = props->metrics_set;
+
+	dev_priv->perf.oa.periodic = props->oa_periodic;
+	if (dev_priv->perf.oa.periodic)
+		dev_priv->perf.oa.period_exponent = props->oa_period_exponent;
+
+	ret = alloc_oa_buffer(dev_priv);
+	if (ret)
+		return ret;
+
+	dev_priv->perf.oa.exclusive_stream = stream;
+
+	/* PRM - observability performance counters:
+	 *
+	 *   OACONTROL, performance counter enable, note:
+	 *
+	 *   "When this bit is set, in order to have coherent counts,
+	 *   RC6 power state and trunk clock gating must be disabled.
+	 *   This can be achieved by programming MMIO registers as
+	 *   0xA094=0 and 0xA090[31]=1"
+	 *
+	 *   In our case we are expecting that taking pm + FORCEWAKE
+	 *   references will effectively disable RC6.
+	 */
+	intel_runtime_pm_get(dev_priv);
+	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
+
+	dev_priv->perf.oa.ops.enable_metric_set(dev_priv);
+
+	stream->destroy = i915_oa_stream_destroy;
+	stream->enable = i915_oa_stream_enable;
+	stream->disable = i915_oa_stream_disable;
+	stream->can_read = i915_oa_can_read;
+	stream->wait_unlocked = i915_oa_wait_unlocked;
+	stream->poll_wait = i915_oa_poll_wait;
+	stream->read = i915_oa_read;
+
+	return 0;
+}
+
+static void gen7_update_hw_ctx_id_locked(struct drm_i915_private *dev_priv,
+					 u32 ctx_id)
+{
+	assert_spin_locked(&dev_priv->perf.hook_lock);
+
+	dev_priv->perf.oa.specific_ctx_id = ctx_id;
+	gen7_update_oacontrol_locked(dev_priv);
+}
+
+static void i915_oa_context_pin_notify_locked(struct drm_i915_private *dev_priv,
+					      struct intel_context *context)
+{
+	assert_spin_locked(&dev_priv->perf.hook_lock);
+
+	if (i915.enable_execlists ||
+	    dev_priv->perf.oa.ops.update_hw_ctx_id_locked == NULL)
+		return;
+
+	if (dev_priv->perf.oa.exclusive_stream &&
+	    dev_priv->perf.oa.exclusive_stream->ctx == context) {
+		struct drm_i915_gem_object *obj =
+			context->legacy_hw_ctx.rcs_state;
+		u32 ctx_id = i915_gem_obj_ggtt_offset(obj);
+
+		dev_priv->perf.oa.ops.update_hw_ctx_id_locked(dev_priv, ctx_id);
+	}
+}
+
+void i915_oa_context_pin_notify(struct drm_i915_private *dev_priv,
+				struct intel_context *context)
+{
+	unsigned long flags;
+
+	if (!dev_priv->perf.initialized)
+		return;
+
+	spin_lock_irqsave(&dev_priv->perf.hook_lock, flags);
+	i915_oa_context_pin_notify_locked(dev_priv, context);
+	spin_unlock_irqrestore(&dev_priv->perf.hook_lock, flags);
+}
+
 static ssize_t i915_perf_read_locked(struct i915_perf_stream *stream,
 				     struct file *file,
 				     char __user *buf,
@@ -86,6 +764,20 @@ static ssize_t i915_perf_read(struct file *file,
 	return ret;
 }
 
+static enum hrtimer_restart poll_check_timer_cb(struct hrtimer *hrtimer)
+{
+	struct drm_i915_private *dev_priv =
+		container_of(hrtimer, typeof(*dev_priv),
+			     perf.oa.poll_check_timer);
+
+	if (!dev_priv->perf.oa.ops.oa_buffer_is_empty(dev_priv))
+		wake_up(&dev_priv->perf.oa.poll_wq);
+
+	hrtimer_forward_now(hrtimer, ns_to_ktime(POLL_PERIOD));
+
+	return HRTIMER_RESTART;
+}
+
 static unsigned int i915_perf_poll_locked(struct i915_perf_stream *stream,
 					  struct file *file,
 					  poll_table *wait)
@@ -276,18 +968,18 @@ int i915_perf_open_ioctl_locked(struct drm_device *dev,
 		goto err_ctx;
 	}
 
-	stream->sample_flags = props->sample_flags;
 	stream->dev_priv = dev_priv;
 	stream->ctx = specific_ctx;
 
-	/*
-	 * TODO: support sampling something
-	 *
-	 * For now this is as far as we can go.
+	ret = i915_oa_stream_init(stream, param, props);
+	if (ret)
+		goto err_alloc;
+
+	/* we avoid simply assigning stream->sample_flags = props->sample_flags
+	 * to have _stream_init check the combination of sample flags more
+	 * thoroughly, but still this is the expected result at this point.
 	 */
-	DRM_ERROR("Unsupported i915 perf stream configuration\n");
-	ret = -EINVAL;
-	goto err_alloc;
+	BUG_ON(stream->sample_flags != props->sample_flags);
 
 	list_add(&stream->link, &dev_priv->perf.streams);
 
@@ -372,7 +1064,53 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 			props->single_context = 1;
 			props->ctx_handle = value;
 			break;
-
+		case DRM_I915_PERF_SAMPLE_OA_PROP:
+			props->sample_flags |= SAMPLE_OA_REPORT;
+			break;
+		case DRM_I915_PERF_OA_METRICS_SET_PROP:
+			if (value == 0 ||
+			    value > dev_priv->perf.oa.n_builtin_sets) {
+				DRM_ERROR("Unknown OA metric set ID");
+				return -EINVAL;
+			}
+			props->metrics_set = value;
+			break;
+		case DRM_I915_PERF_OA_FORMAT_PROP:
+			if (value == 0 || value >= I915_OA_FORMAT_MAX) {
+				DRM_ERROR("Invalid OA report format\n");
+				return -EINVAL;
+			}
+			if (!dev_priv->perf.oa.oa_formats[value].size) {
+				DRM_ERROR("Invalid OA report format\n");
+				return -EINVAL;
+			}
+			props->oa_format = value;
+			break;
+		case DRM_I915_PERF_OA_EXPONENT_PROP:
+			if (value > OA_EXPONENT_MAX)
+				return -EINVAL;
+
+			/* NB: The exponent represents a period as follows:
+			 *
+			 *   80ns * 2^(period_exponent + 1)
+			 *
+			 * Theoretically we can program the OA unit to sample
+			 * every 160ns but don't allow that by default unless
+			 * root.
+			 *
+			 * Referring to perf's
+			 * kernel.perf_event_max_sample_rate for a precedent
+			 * (100000 by default); with an OA exponent of 6 we get
+			 * a period of 10.240 microseconds -just under 100000Hz
+			 */
+			if (value < 6 && !capable(CAP_SYS_ADMIN)) {
+				DRM_ERROR("Sampling period too high without root privileges\n");
+				return -EACCES;
+			}
+
+			props->oa_periodic = true;
+			props->oa_period_exponent = value;
+			break;
 		case DRM_I915_PERF_PROP_MAX:
 			BUG();
 		}
@@ -423,8 +1161,33 @@ void i915_perf_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
 
+	if (!IS_HASWELL(dev))
+		return;
+
+	hrtimer_init(&dev_priv->perf.oa.poll_check_timer,
+		     CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+	dev_priv->perf.oa.poll_check_timer.function = poll_check_timer_cb;
+	init_waitqueue_head(&dev_priv->perf.oa.poll_wq);
+
 	INIT_LIST_HEAD(&dev_priv->perf.streams);
 	mutex_init(&dev_priv->perf.lock);
+	spin_lock_init(&dev_priv->perf.hook_lock);
+
+	dev_priv->perf.oa.ops.init_oa_buffer = gen7_init_oa_buffer;
+	dev_priv->perf.oa.ops.enable_metric_set = hsw_enable_metric_set;
+	dev_priv->perf.oa.ops.disable_metric_set = hsw_disable_metric_set;
+	dev_priv->perf.oa.ops.oa_enable = gen7_oa_enable;
+	dev_priv->perf.oa.ops.oa_disable = gen7_oa_disable;
+	dev_priv->perf.oa.ops.update_hw_ctx_id_locked = gen7_update_hw_ctx_id_locked;
+	dev_priv->perf.oa.ops.read = gen7_oa_read;
+	dev_priv->perf.oa.ops.oa_buffer_is_empty = gen7_oa_buffer_is_empty;
+
+	dev_priv->perf.oa.oa_formats = hsw_oa_formats;
+
+	dev_priv->perf.oa.n_builtin_sets =
+		i915_oa_n_builtin_metric_sets_hsw;
+
+	dev_priv->perf.oa.oa_formats = hsw_oa_formats;
 
 	dev_priv->perf.initialized = true;
 }
@@ -436,7 +1199,7 @@ void i915_perf_fini(struct drm_device *dev)
 	if (!dev_priv->perf.initialized)
 		return;
 
-	/* Currently nothing to clean up */
+	dev_priv->perf.oa.ops.init_oa_buffer = NULL;
 
 	dev_priv->perf.initialized = false;
 }
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 119401c..5ab95e0 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -587,6 +587,343 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
 #define GEN7_GPGPU_DISPATCHDIMZ         _MMIO(0x2508)
 
 #define GEN7_OACONTROL _MMIO(0x2360)
+#define  GEN7_OACONTROL_CTX_MASK	    0xFFFFF000
+#define  GEN7_OACONTROL_TIMER_PERIOD_MASK   0x3F
+#define  GEN7_OACONTROL_TIMER_PERIOD_SHIFT  6
+#define  GEN7_OACONTROL_TIMER_ENABLE	    (1<<5)
+#define  GEN7_OACONTROL_FORMAT_A13	    (0<<2)
+#define  GEN7_OACONTROL_FORMAT_A29	    (1<<2)
+#define  GEN7_OACONTROL_FORMAT_A13_B8_C8    (2<<2)
+#define  GEN7_OACONTROL_FORMAT_A29_B8_C8    (3<<2)
+#define  GEN7_OACONTROL_FORMAT_B4_C8	    (4<<2)
+#define  GEN7_OACONTROL_FORMAT_A45_B8_C8    (5<<2)
+#define  GEN7_OACONTROL_FORMAT_B4_C8_A16    (6<<2)
+#define  GEN7_OACONTROL_FORMAT_C4_B8	    (7<<2)
+#define  GEN7_OACONTROL_FORMAT_SHIFT	    2
+#define  GEN7_OACONTROL_PER_CTX_ENABLE	    (1<<1)
+#define  GEN7_OACONTROL_ENABLE		    (1<<0)
+
+#define GEN8_OACTXID _MMIO(0x2364)
+
+#define GEN8_OACONTROL _MMIO(0x2B00)
+#define  GEN8_OA_REPORT_FORMAT_A12	    (0<<2)
+#define  GEN8_OA_REPORT_FORMAT_A12_B8_C8    (2<<2)
+#define  GEN8_OA_REPORT_FORMAT_A36_B8_C8    (5<<2)
+#define  GEN8_OA_REPORT_FORMAT_C4_B8	    (7<<2)
+#define  GEN8_OA_REPORT_FORMAT_SHIFT	    2
+#define  GEN8_OA_SPECIFIC_CONTEXT_ENABLE    (1<<1)
+#define  GEN8_OA_COUNTER_ENABLE             (1<<0)
+
+#define GEN8_OACTXCONTROL _MMIO(0x2360)
+#define  GEN8_OA_TIMER_PERIOD_MASK	    0x3F
+#define  GEN8_OA_TIMER_PERIOD_SHIFT	    2
+#define  GEN8_OA_TIMER_ENABLE		    (1<<1)
+#define  GEN8_OA_COUNTER_RESUME		    (1<<0)
+
+#define GEN7_OABUFFER _MMIO(0x23B0) /* R/W */
+#define  GEN7_OABUFFER_OVERRUN_DISABLE	    (1<<3)
+#define  GEN7_OABUFFER_EDGE_TRIGGER	    (1<<2)
+#define  GEN7_OABUFFER_STOP_RESUME_ENABLE   (1<<1)
+#define  GEN7_OABUFFER_RESUME		    (1<<0)
+
+#define GEN8_OABUFFER _MMIO(0x2b14)
+
+#define GEN7_OASTATUS1 _MMIO(0x2364)
+#define  GEN7_OASTATUS1_TAIL_MASK	    0xffffffc0
+#define  GEN7_OASTATUS1_COUNTER_OVERFLOW    (1<<2)
+#define  GEN7_OASTATUS1_OABUFFER_OVERFLOW   (1<<1)
+#define  GEN7_OASTATUS1_REPORT_LOST	    (1<<0)
+
+#define GEN7_OASTATUS2 _MMIO(0x2368)
+#define GEN7_OASTATUS2_HEAD_MASK    0xffffffc0
+
+#define GEN8_OASTATUS _MMIO(0x2b08)
+#define  GEN8_OASTATUS_OVERRUN_STATUS	    (1<<3)
+#define  GEN8_OASTATUS_COUNTER_OVERFLOW     (1<<2)
+#define  GEN8_OASTATUS_OABUFFER_OVERFLOW    (1<<1)
+#define  GEN8_OASTATUS_REPORT_LOST	    (1<<0)
+
+#define GEN8_OAHEADPTR _MMIO(0x2B0C)
+#define GEN8_OATAILPTR _MMIO(0x2B10)
+
+#define OABUFFER_SIZE_128K  (0<<3)
+#define OABUFFER_SIZE_256K  (1<<3)
+#define OABUFFER_SIZE_512K  (2<<3)
+#define OABUFFER_SIZE_1M    (3<<3)
+#define OABUFFER_SIZE_2M    (4<<3)
+#define OABUFFER_SIZE_4M    (5<<3)
+#define OABUFFER_SIZE_8M    (6<<3)
+#define OABUFFER_SIZE_16M   (7<<3)
+
+#define OA_MEM_SELECT_GGTT  (1<<0)
+
+#define EU_PERF_CNTL0	    _MMIO(0xe458)
+
+#define GDT_CHICKEN_BITS    _MMIO(0x9840)
+#define GT_NOA_ENABLE	    0x00000080
+
+/*
+ * OA Boolean state
+ */
+
+#define OAREPORTTRIG1 _MMIO(0x2740)
+#define OAREPORTTRIG1_THRESHOLD_MASK 0xffff
+#define OAREPORTTRIG1_EDGE_LEVEL_TRIGER_SELECT_MASK 0xffff0000 /* 0=level */
+
+#define OAREPORTTRIG2 _MMIO(0x2744)
+#define OAREPORTTRIG2_INVERT_A_0  (1<<0)
+#define OAREPORTTRIG2_INVERT_A_1  (1<<1)
+#define OAREPORTTRIG2_INVERT_A_2  (1<<2)
+#define OAREPORTTRIG2_INVERT_A_3  (1<<3)
+#define OAREPORTTRIG2_INVERT_A_4  (1<<4)
+#define OAREPORTTRIG2_INVERT_A_5  (1<<5)
+#define OAREPORTTRIG2_INVERT_A_6  (1<<6)
+#define OAREPORTTRIG2_INVERT_A_7  (1<<7)
+#define OAREPORTTRIG2_INVERT_A_8  (1<<8)
+#define OAREPORTTRIG2_INVERT_A_9  (1<<9)
+#define OAREPORTTRIG2_INVERT_A_10 (1<<10)
+#define OAREPORTTRIG2_INVERT_A_11 (1<<11)
+#define OAREPORTTRIG2_INVERT_A_12 (1<<12)
+#define OAREPORTTRIG2_INVERT_A_13 (1<<13)
+#define OAREPORTTRIG2_INVERT_A_14 (1<<14)
+#define OAREPORTTRIG2_INVERT_A_15 (1<<15)
+#define OAREPORTTRIG2_INVERT_B_0  (1<<16)
+#define OAREPORTTRIG2_INVERT_B_1  (1<<17)
+#define OAREPORTTRIG2_INVERT_B_2  (1<<18)
+#define OAREPORTTRIG2_INVERT_B_3  (1<<19)
+#define OAREPORTTRIG2_INVERT_C_0  (1<<20)
+#define OAREPORTTRIG2_INVERT_C_1  (1<<21)
+#define OAREPORTTRIG2_INVERT_D_0  (1<<22)
+#define OAREPORTTRIG2_THRESHOLD_ENABLE	    (1<<23)
+#define OAREPORTTRIG2_REPORT_TRIGGER_ENABLE (1<<31)
+
+#define OAREPORTTRIG3 _MMIO(0x2748)
+#define OAREPORTTRIG3_NOA_SELECT_MASK	    0xf
+#define OAREPORTTRIG3_NOA_SELECT_8_SHIFT    0
+#define OAREPORTTRIG3_NOA_SELECT_9_SHIFT    4
+#define OAREPORTTRIG3_NOA_SELECT_10_SHIFT   8
+#define OAREPORTTRIG3_NOA_SELECT_11_SHIFT   12
+#define OAREPORTTRIG3_NOA_SELECT_12_SHIFT   16
+#define OAREPORTTRIG3_NOA_SELECT_13_SHIFT   20
+#define OAREPORTTRIG3_NOA_SELECT_14_SHIFT   24
+#define OAREPORTTRIG3_NOA_SELECT_15_SHIFT   28
+
+#define OAREPORTTRIG4 _MMIO(0x274c)
+#define OAREPORTTRIG4_NOA_SELECT_MASK	    0xf
+#define OAREPORTTRIG4_NOA_SELECT_0_SHIFT    0
+#define OAREPORTTRIG4_NOA_SELECT_1_SHIFT    4
+#define OAREPORTTRIG4_NOA_SELECT_2_SHIFT    8
+#define OAREPORTTRIG4_NOA_SELECT_3_SHIFT    12
+#define OAREPORTTRIG4_NOA_SELECT_4_SHIFT    16
+#define OAREPORTTRIG4_NOA_SELECT_5_SHIFT    20
+#define OAREPORTTRIG4_NOA_SELECT_6_SHIFT    24
+#define OAREPORTTRIG4_NOA_SELECT_7_SHIFT    28
+
+#define OAREPORTTRIG5 _MMIO(0x2750)
+#define OAREPORTTRIG5_THRESHOLD_MASK 0xffff
+#define OAREPORTTRIG5_EDGE_LEVEL_TRIGER_SELECT_MASK 0xffff0000 /* 0=level */
+
+#define OAREPORTTRIG6 _MMIO(0x2754)
+#define OAREPORTTRIG6_INVERT_A_0  (1<<0)
+#define OAREPORTTRIG6_INVERT_A_1  (1<<1)
+#define OAREPORTTRIG6_INVERT_A_2  (1<<2)
+#define OAREPORTTRIG6_INVERT_A_3  (1<<3)
+#define OAREPORTTRIG6_INVERT_A_4  (1<<4)
+#define OAREPORTTRIG6_INVERT_A_5  (1<<5)
+#define OAREPORTTRIG6_INVERT_A_6  (1<<6)
+#define OAREPORTTRIG6_INVERT_A_7  (1<<7)
+#define OAREPORTTRIG6_INVERT_A_8  (1<<8)
+#define OAREPORTTRIG6_INVERT_A_9  (1<<9)
+#define OAREPORTTRIG6_INVERT_A_10 (1<<10)
+#define OAREPORTTRIG6_INVERT_A_11 (1<<11)
+#define OAREPORTTRIG6_INVERT_A_12 (1<<12)
+#define OAREPORTTRIG6_INVERT_A_13 (1<<13)
+#define OAREPORTTRIG6_INVERT_A_14 (1<<14)
+#define OAREPORTTRIG6_INVERT_A_15 (1<<15)
+#define OAREPORTTRIG6_INVERT_B_0  (1<<16)
+#define OAREPORTTRIG6_INVERT_B_1  (1<<17)
+#define OAREPORTTRIG6_INVERT_B_2  (1<<18)
+#define OAREPORTTRIG6_INVERT_B_3  (1<<19)
+#define OAREPORTTRIG6_INVERT_C_0  (1<<20)
+#define OAREPORTTRIG6_INVERT_C_1  (1<<21)
+#define OAREPORTTRIG6_INVERT_D_0  (1<<22)
+#define OAREPORTTRIG6_THRESHOLD_ENABLE	    (1<<23)
+#define OAREPORTTRIG6_REPORT_TRIGGER_ENABLE (1<<31)
+
+#define OAREPORTTRIG7 _MMIO(0x2758)
+#define OAREPORTTRIG7_NOA_SELECT_MASK	    0xf
+#define OAREPORTTRIG7_NOA_SELECT_8_SHIFT    0
+#define OAREPORTTRIG7_NOA_SELECT_9_SHIFT    4
+#define OAREPORTTRIG7_NOA_SELECT_10_SHIFT   8
+#define OAREPORTTRIG7_NOA_SELECT_11_SHIFT   12
+#define OAREPORTTRIG7_NOA_SELECT_12_SHIFT   16
+#define OAREPORTTRIG7_NOA_SELECT_13_SHIFT   20
+#define OAREPORTTRIG7_NOA_SELECT_14_SHIFT   24
+#define OAREPORTTRIG7_NOA_SELECT_15_SHIFT   28
+
+#define OAREPORTTRIG8 _MMIO(0x275c)
+#define OAREPORTTRIG8_NOA_SELECT_MASK	    0xf
+#define OAREPORTTRIG8_NOA_SELECT_0_SHIFT    0
+#define OAREPORTTRIG8_NOA_SELECT_1_SHIFT    4
+#define OAREPORTTRIG8_NOA_SELECT_2_SHIFT    8
+#define OAREPORTTRIG8_NOA_SELECT_3_SHIFT    12
+#define OAREPORTTRIG8_NOA_SELECT_4_SHIFT    16
+#define OAREPORTTRIG8_NOA_SELECT_5_SHIFT    20
+#define OAREPORTTRIG8_NOA_SELECT_6_SHIFT    24
+#define OAREPORTTRIG8_NOA_SELECT_7_SHIFT    28
+
+#define OASTARTTRIG1 _MMIO(0x2710)
+#define OASTARTTRIG1_THRESHOLD_COUNT_MASK_MBZ 0xffff0000
+#define OASTARTTRIG1_THRESHOLD_MASK	      0xffff
+
+#define OASTARTTRIG2 _MMIO(0x2714)
+#define OASTARTTRIG2_INVERT_A_0 (1<<0)
+#define OASTARTTRIG2_INVERT_A_1 (1<<1)
+#define OASTARTTRIG2_INVERT_A_2 (1<<2)
+#define OASTARTTRIG2_INVERT_A_3 (1<<3)
+#define OASTARTTRIG2_INVERT_A_4 (1<<4)
+#define OASTARTTRIG2_INVERT_A_5 (1<<5)
+#define OASTARTTRIG2_INVERT_A_6 (1<<6)
+#define OASTARTTRIG2_INVERT_A_7 (1<<7)
+#define OASTARTTRIG2_INVERT_A_8 (1<<8)
+#define OASTARTTRIG2_INVERT_A_9 (1<<9)
+#define OASTARTTRIG2_INVERT_A_10 (1<<10)
+#define OASTARTTRIG2_INVERT_A_11 (1<<11)
+#define OASTARTTRIG2_INVERT_A_12 (1<<12)
+#define OASTARTTRIG2_INVERT_A_13 (1<<13)
+#define OASTARTTRIG2_INVERT_A_14 (1<<14)
+#define OASTARTTRIG2_INVERT_A_15 (1<<15)
+#define OASTARTTRIG2_INVERT_B_0 (1<<16)
+#define OASTARTTRIG2_INVERT_B_1 (1<<17)
+#define OASTARTTRIG2_INVERT_B_2 (1<<18)
+#define OASTARTTRIG2_INVERT_B_3 (1<<19)
+#define OASTARTTRIG2_INVERT_C_0 (1<<20)
+#define OASTARTTRIG2_INVERT_C_1 (1<<21)
+#define OASTARTTRIG2_INVERT_D_0 (1<<22)
+#define OASTARTTRIG2_THRESHOLD_ENABLE	    (1<<23)
+#define OASTARTTRIG2_START_TRIG_FLAG_MBZ    (1<<24)
+#define OASTARTTRIG2_EVENT_SELECT_0  (1<<28)
+#define OASTARTTRIG2_EVENT_SELECT_1  (1<<29)
+#define OASTARTTRIG2_EVENT_SELECT_2  (1<<30)
+#define OASTARTTRIG2_EVENT_SELECT_3  (1<<31)
+
+#define OASTARTTRIG3 _MMIO(0x2718)
+#define OASTARTTRIG3_NOA_SELECT_MASK	   0xf
+#define OASTARTTRIG3_NOA_SELECT_8_SHIFT    0
+#define OASTARTTRIG3_NOA_SELECT_9_SHIFT    4
+#define OASTARTTRIG3_NOA_SELECT_10_SHIFT   8
+#define OASTARTTRIG3_NOA_SELECT_11_SHIFT   12
+#define OASTARTTRIG3_NOA_SELECT_12_SHIFT   16
+#define OASTARTTRIG3_NOA_SELECT_13_SHIFT   20
+#define OASTARTTRIG3_NOA_SELECT_14_SHIFT   24
+#define OASTARTTRIG3_NOA_SELECT_15_SHIFT   28
+
+#define OASTARTTRIG4 _MMIO(0x271c)
+#define OASTARTTRIG4_NOA_SELECT_MASK	    0xf
+#define OASTARTTRIG4_NOA_SELECT_0_SHIFT    0
+#define OASTARTTRIG4_NOA_SELECT_1_SHIFT    4
+#define OASTARTTRIG4_NOA_SELECT_2_SHIFT    8
+#define OASTARTTRIG4_NOA_SELECT_3_SHIFT    12
+#define OASTARTTRIG4_NOA_SELECT_4_SHIFT    16
+#define OASTARTTRIG4_NOA_SELECT_5_SHIFT    20
+#define OASTARTTRIG4_NOA_SELECT_6_SHIFT    24
+#define OASTARTTRIG4_NOA_SELECT_7_SHIFT    28
+
+#define OASTARTTRIG5 _MMIO(0x2720)
+#define OASTARTTRIG5_THRESHOLD_COUNT_MASK_MBZ 0xffff0000
+#define OASTARTTRIG5_THRESHOLD_MASK	      0xffff
+
+#define OASTARTTRIG6 _MMIO(0x2724)
+#define OASTARTTRIG6_INVERT_A_0 (1<<0)
+#define OASTARTTRIG6_INVERT_A_1 (1<<1)
+#define OASTARTTRIG6_INVERT_A_2 (1<<2)
+#define OASTARTTRIG6_INVERT_A_3 (1<<3)
+#define OASTARTTRIG6_INVERT_A_4 (1<<4)
+#define OASTARTTRIG6_INVERT_A_5 (1<<5)
+#define OASTARTTRIG6_INVERT_A_6 (1<<6)
+#define OASTARTTRIG6_INVERT_A_7 (1<<7)
+#define OASTARTTRIG6_INVERT_A_8 (1<<8)
+#define OASTARTTRIG6_INVERT_A_9 (1<<9)
+#define OASTARTTRIG6_INVERT_A_10 (1<<10)
+#define OASTARTTRIG6_INVERT_A_11 (1<<11)
+#define OASTARTTRIG6_INVERT_A_12 (1<<12)
+#define OASTARTTRIG6_INVERT_A_13 (1<<13)
+#define OASTARTTRIG6_INVERT_A_14 (1<<14)
+#define OASTARTTRIG6_INVERT_A_15 (1<<15)
+#define OASTARTTRIG6_INVERT_B_0 (1<<16)
+#define OASTARTTRIG6_INVERT_B_1 (1<<17)
+#define OASTARTTRIG6_INVERT_B_2 (1<<18)
+#define OASTARTTRIG6_INVERT_B_3 (1<<19)
+#define OASTARTTRIG6_INVERT_C_0 (1<<20)
+#define OASTARTTRIG6_INVERT_C_1 (1<<21)
+#define OASTARTTRIG6_INVERT_D_0 (1<<22)
+#define OASTARTTRIG6_THRESHOLD_ENABLE	    (1<<23)
+#define OASTARTTRIG6_START_TRIG_FLAG_MBZ    (1<<24)
+#define OASTARTTRIG6_EVENT_SELECT_4  (1<<28)
+#define OASTARTTRIG6_EVENT_SELECT_5  (1<<29)
+#define OASTARTTRIG6_EVENT_SELECT_6  (1<<30)
+#define OASTARTTRIG6_EVENT_SELECT_7  (1<<31)
+
+#define OASTARTTRIG7 _MMIO(0x2728)
+#define OASTARTTRIG7_NOA_SELECT_MASK	   0xf
+#define OASTARTTRIG7_NOA_SELECT_8_SHIFT    0
+#define OASTARTTRIG7_NOA_SELECT_9_SHIFT    4
+#define OASTARTTRIG7_NOA_SELECT_10_SHIFT   8
+#define OASTARTTRIG7_NOA_SELECT_11_SHIFT   12
+#define OASTARTTRIG7_NOA_SELECT_12_SHIFT   16
+#define OASTARTTRIG7_NOA_SELECT_13_SHIFT   20
+#define OASTARTTRIG7_NOA_SELECT_14_SHIFT   24
+#define OASTARTTRIG7_NOA_SELECT_15_SHIFT   28
+
+#define OASTARTTRIG8 _MMIO(0x272c)
+#define OASTARTTRIG8_NOA_SELECT_MASK	   0xf
+#define OASTARTTRIG8_NOA_SELECT_0_SHIFT    0
+#define OASTARTTRIG8_NOA_SELECT_1_SHIFT    4
+#define OASTARTTRIG8_NOA_SELECT_2_SHIFT    8
+#define OASTARTTRIG8_NOA_SELECT_3_SHIFT    12
+#define OASTARTTRIG8_NOA_SELECT_4_SHIFT    16
+#define OASTARTTRIG8_NOA_SELECT_5_SHIFT    20
+#define OASTARTTRIG8_NOA_SELECT_6_SHIFT    24
+#define OASTARTTRIG8_NOA_SELECT_7_SHIFT    28
+
+/* CECX_0 */
+#define OACEC_COMPARE_LESS_OR_EQUAL	6
+#define OACEC_COMPARE_NOT_EQUAL		5
+#define OACEC_COMPARE_LESS_THAN		4
+#define OACEC_COMPARE_GREATER_OR_EQUAL	3
+#define OACEC_COMPARE_EQUAL		2
+#define OACEC_COMPARE_GREATER_THAN	1
+#define OACEC_COMPARE_ANY_EQUAL		0
+
+#define OACEC_COMPARE_VALUE_MASK    0xffff
+#define OACEC_COMPARE_VALUE_SHIFT   3
+
+#define OACEC_SELECT_NOA	(0<<19)
+#define OACEC_SELECT_PREV	(1<<19)
+#define OACEC_SELECT_BOOLEAN	(2<<19)
+
+/* CECX_1 */
+#define OACEC_MASK_MASK		    0xffff
+#define OACEC_CONSIDERATIONS_MASK   0xffff
+#define OACEC_CONSIDERATIONS_SHIFT  16
+
+#define OACEC0_0 _MMIO(0x2770)
+#define OACEC0_1 _MMIO(0x2774)
+#define OACEC1_0 _MMIO(0x2778)
+#define OACEC1_1 _MMIO(0x277c)
+#define OACEC2_0 _MMIO(0x2780)
+#define OACEC2_1 _MMIO(0x2784)
+#define OACEC3_0 _MMIO(0x2788)
+#define OACEC3_1 _MMIO(0x278c)
+#define OACEC4_0 _MMIO(0x2790)
+#define OACEC4_1 _MMIO(0x2794)
+#define OACEC5_0 _MMIO(0x2798)
+#define OACEC5_1 _MMIO(0x279c)
+#define OACEC6_0 _MMIO(0x27a0)
+#define OACEC6_1 _MMIO(0x27a4)
+#define OACEC7_0 _MMIO(0x27a8)
+#define OACEC7_1 _MMIO(0x27ac)
+
 
 #define _GEN7_PIPEA_DE_LOAD_SL	0x70068
 #define _GEN7_PIPEB_DE_LOAD_SL	0x71068
@@ -6849,6 +7186,7 @@ enum skl_disp_power_wells {
 # define GEN6_RCCUNIT_CLOCK_GATE_DISABLE		(1 << 11)
 
 #define GEN6_UCGCTL3				_MMIO(0x9408)
+# define GEN6_OACSUNIT_CLOCK_GATE_DISABLE		(1 << 20)
 
 #define GEN7_UCGCTL4				_MMIO(0x940c)
 #define  GEN7_L3BANK2X_CLOCK_GATE_DISABLE	(1<<25)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 68ca26e..c8fc556 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1172,6 +1172,18 @@ struct drm_i915_gem_context_param {
 	__u64 value;
 };
 
+enum drm_i915_oa_format {
+	I915_OA_FORMAT_A13 = 1,
+	I915_OA_FORMAT_A29,
+	I915_OA_FORMAT_A13_B8_C8,
+	I915_OA_FORMAT_B4_C8,
+	I915_OA_FORMAT_A45_B8_C8,
+	I915_OA_FORMAT_B4_C8_A16,
+	I915_OA_FORMAT_C4_B8,
+
+	I915_OA_FORMAT_MAX	    /* non-ABI */
+};
+
 #define I915_PERF_FLAG_FD_CLOEXEC	(1<<0)
 #define I915_PERF_FLAG_FD_NONBLOCK	(1<<1)
 #define I915_PERF_FLAG_DISABLED		(1<<2)
@@ -1184,6 +1196,32 @@ enum drm_i915_perf_property_id {
 	 */
 	DRM_I915_PERF_CTX_HANDLE_PROP = 1,
 
+	/**
+	 * A value of 1 requests the inclusion of raw OA unit reports as
+	 * part of stream samples.
+	 */
+	DRM_I915_PERF_SAMPLE_OA_PROP,
+
+	/**
+	 * The value specifies which set of OA unit metrics should be
+	 * be configured, defining the contents of any OA unit reports.
+	 */
+	DRM_I915_PERF_OA_METRICS_SET_PROP,
+
+	/**
+	 * The value specifies the size and layout of OA unit reports.
+	 */
+	DRM_I915_PERF_OA_FORMAT_PROP,
+
+	/**
+	 * Specifying this property implicitly requests periodic OA unit
+	 * sampling and (at least on Haswell) the sampling frequency is derived
+	 * from this exponent as follows:
+	 *
+	 *   80ns * 2^(period_exponent + 1)
+	 */
+	DRM_I915_PERF_OA_EXPONENT_PROP,
+
 	DRM_I915_PERF_PROP_MAX /* non-ABI */
 };
 
@@ -1225,17 +1263,31 @@ enum drm_i915_perf_record_type {
 	 * every sample.
 	 *
 	 * The order of these sample properties given by userspace has no
-	 * affect on the ordering of data within a sample. The order will be
+	 * affect on the ordering of data within a sample. The order is
 	 * documented here.
 	 *
 	 * struct {
 	 *     struct drm_i915_perf_record_header header;
 	 *
-	 *     TODO: itemize extensible sample data here
+	 *     { u32 oa_report[]; } && DRM_I915_PERF_SAMPLE_OA_PROP
 	 * };
 	 */
 	DRM_I915_PERF_RECORD_SAMPLE = 1,
 
+	/*
+	 * Indicates that one or more OA reports was not written
+	 * by the hardware.
+	 */
+	DRM_I915_PERF_RECORD_OA_REPORT_LOST = 2,
+
+	/*
+	 * Indicates that the internal circular buffer that Gen
+	 * graphics writes OA reports into has filled, which may
+	 * either mean that old reports could be overwritten or
+	 * subsequent reports lost until the buffer is cleared.
+	 */
+	DRM_I915_PERF_RECORD_OA_BUFFER_OVERFLOW = 3,
+
 	DRM_I915_PERF_RECORD_MAX /* non-ABI */
 };
 
-- 
2.7.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 5/8] drm/i915: advertise available metrics via sysfs
  2016-02-03 18:39 [PATCH 0/8] Enable Gen 7 Observation Architecture Robert Bragg
                   ` (3 preceding siblings ...)
  2016-02-03 18:39 ` [PATCH 4/8] drm/i915: Add i915 perf event for Haswell OA unit Robert Bragg
@ 2016-02-03 18:39 ` Robert Bragg
  2016-02-03 18:39 ` [PATCH 6/8] drm/i915: Add dev.i915.perf_event_paranoid sysctl option Robert Bragg
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Robert Bragg @ 2016-02-03 18:39 UTC (permalink / raw)
  To: intel-gfx; +Cc: David Airlie, dri-devel, Sourab Gupta, Daniel Vetter

Each metric set is given a sysfs entry like:

/sys/class/drm/card0/metrics/<guid>/id

This allows userspace to enumerate the specific sets that are available
for the current system. The 'id' file contains an unsigned integer that
can be used to open the associated metric set via
DRM_IOCTL_I915_PERF_OPEN. The <guid> is a globally unique ID for a
specific OA unit configuration that can be reliably used as a key to
lookup corresponding counter meta data and normalization equations.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_drv.h    |  2 ++
 drivers/gpu/drm/i915/i915_oa_hsw.c | 45 ++++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_oa_hsw.h |  4 ++++
 drivers/gpu/drm/i915/i915_perf.c   | 18 ++++++++++++++-
 4 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e4b9399..b1c60bb 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2086,6 +2086,8 @@ struct drm_i915_private {
 	struct {
 		bool initialized;
 
+		struct kobject *metrics_kobj;
+
 		struct mutex lock;
 		struct list_head streams;
 
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c
index 5472aa0..ecd7d2b 100644
--- a/drivers/gpu/drm/i915/i915_oa_hsw.c
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -24,6 +24,8 @@
  *
  */
 
+#include <linux/sysfs.h>
+
 #include "i915_drv.h"
 
 enum metric_set_id {
@@ -130,3 +132,46 @@ int i915_oa_select_metric_set_hsw(struct drm_i915_private *dev_priv)
 		return -ENODEV;
 	}
 }
+
+static ssize_t
+show_render_basic_id(struct device *kdev, struct device_attribute *attr, char *buf)
+{
+        return sprintf(buf, "%d\n", METRIC_SET_ID_RENDER_BASIC);
+}
+
+static struct device_attribute dev_attr_render_basic_id = {
+        .attr = { .name = "id", .mode = S_IRUGO },
+        .show = show_render_basic_id,
+        .store = NULL,
+};
+
+static struct attribute *attrs_render_basic[] = {
+        &dev_attr_render_basic_id.attr,
+        NULL,
+};
+
+static struct attribute_group group_render_basic = {
+        .name = "403d8832-1a27-4aa6-a64e-f5389ce7b212",
+        .attrs =  attrs_render_basic,
+};
+
+int
+i915_perf_init_sysfs_hsw(struct drm_i915_private *dev_priv)
+{
+        int ret;
+
+        ret = sysfs_create_group(dev_priv->perf.metrics_kobj, &group_render_basic);
+        if (ret)
+                goto error_render_basic;
+
+        return 0;
+
+error_render_basic:
+        return ret;
+}
+
+void
+i915_perf_deinit_sysfs_hsw(struct drm_i915_private *dev_priv)
+{
+        sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_render_basic);
+}
diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.h b/drivers/gpu/drm/i915/i915_oa_hsw.h
index b618a1f..e4ba89d 100644
--- a/drivers/gpu/drm/i915/i915_oa_hsw.h
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.h
@@ -31,4 +31,8 @@ extern int i915_oa_n_builtin_metric_sets_hsw;
 
 extern int i915_oa_select_metric_set_hsw(struct drm_i915_private *dev_priv);
 
+extern int i915_perf_init_sysfs_hsw(struct drm_i915_private *dev_priv);
+
+extern void i915_perf_deinit_sysfs_hsw(struct drm_i915_private *dev_priv);
+
 #endif
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index b54f719..5c08a16 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1164,6 +1164,11 @@ void i915_perf_init(struct drm_device *dev)
 	if (!IS_HASWELL(dev))
 		return;
 
+	dev_priv->perf.metrics_kobj =
+		kobject_create_and_add("metrics", &dev->primary->kdev->kobj);
+	if (!dev_priv->perf.metrics_kobj)
+		return;
+
 	hrtimer_init(&dev_priv->perf.oa.poll_check_timer,
 		     CLOCK_MONOTONIC, HRTIMER_MODE_REL);
 	dev_priv->perf.oa.poll_check_timer.function = poll_check_timer_cb;
@@ -1187,9 +1192,15 @@ void i915_perf_init(struct drm_device *dev)
 	dev_priv->perf.oa.n_builtin_sets =
 		i915_oa_n_builtin_metric_sets_hsw;
 
-	dev_priv->perf.oa.oa_formats = hsw_oa_formats;
+	if (i915_perf_init_sysfs_hsw(dev_priv)) {
+		kobject_put(dev_priv->perf.metrics_kobj);
+		dev_priv->perf.metrics_kobj = NULL;
+		return;
+	}
 
 	dev_priv->perf.initialized = true;
+
+	return;
 }
 
 void i915_perf_fini(struct drm_device *dev)
@@ -1199,6 +1210,11 @@ void i915_perf_fini(struct drm_device *dev)
 	if (!dev_priv->perf.initialized)
 		return;
 
+	i915_perf_deinit_sysfs_hsw(dev_priv);
+
+	kobject_put(dev_priv->perf.metrics_kobj);
+	dev_priv->perf.metrics_kobj = NULL;
+
 	dev_priv->perf.oa.ops.init_oa_buffer = NULL;
 
 	dev_priv->perf.initialized = false;
-- 
2.7.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 6/8] drm/i915: Add dev.i915.perf_event_paranoid sysctl option
  2016-02-03 18:39 [PATCH 0/8] Enable Gen 7 Observation Architecture Robert Bragg
                   ` (4 preceding siblings ...)
  2016-02-03 18:39 ` [PATCH 5/8] drm/i915: advertise available metrics via sysfs Robert Bragg
@ 2016-02-03 18:39 ` Robert Bragg
  2016-02-03 18:39 ` [PATCH 7/8] drm/i915: add oa_event_min_timer_exponent sysctl Robert Bragg
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 14+ messages in thread
From: Robert Bragg @ 2016-02-03 18:39 UTC (permalink / raw)
  To: intel-gfx; +Cc: David Airlie, dri-devel, Sourab Gupta, Daniel Vetter

Consistent with the kernel.perf_event_paranoid sysctl option that can
allow non-root users to access system wide cpu metrics, this can
optionally allow non-root users to access system wide OA counter metrics
from Gen graphics hardware.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_drv.h  |  1 +
 drivers/gpu/drm/i915/i915_perf.c | 46 +++++++++++++++++++++++++++++++++++++++-
 2 files changed, 46 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b1c60bb..6776e18 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2087,6 +2087,7 @@ struct drm_i915_private {
 		bool initialized;
 
 		struct kobject *metrics_kobj;
+		struct ctl_table_header *sysctl_header;
 
 		struct mutex lock;
 		struct list_head streams;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 5c08a16..7c27d93 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -37,6 +37,8 @@
 #define POLL_FREQUENCY 200
 #define POLL_PERIOD max_t(u64, 10000, NSEC_PER_SEC / POLL_FREQUENCY)
 
+static u32 i915_perf_stream_paranoid = true;
+
 #define OA_EXPONENT_MAX 0x3f
 
 static struct i915_oa_format hsw_oa_formats[I915_OA_FORMAT_MAX] = {
@@ -956,7 +958,13 @@ int i915_perf_open_ioctl_locked(struct drm_device *dev,
 		}
 	}
 
-	if (!specific_ctx && !capable(CAP_SYS_ADMIN)) {
+	/* Similar to perf's kernel.perf_paranoid_cpu sysctl option
+	 * we check a dev.i915.perf_stream_paranoid sysctl option
+	 * to determine if it's ok to access system wide OA counters
+	 * without CAP_SYS_ADMIN privileges.
+	 */
+	if (!specific_ctx &&
+	    i915_perf_stream_paranoid && !capable(CAP_SYS_ADMIN)) {
 		DRM_ERROR("Insufficient privileges to open system-wide i915 perf stream\n");
 		ret = -EACCES;
 		goto err_ctx;
@@ -1157,6 +1165,38 @@ int i915_perf_open_ioctl(struct drm_device *dev, void *data,
 	return ret;
 }
 
+
+static struct ctl_table oa_table[] = {
+	{
+	 .procname = "perf_stream_paranoid",
+	 .data = &i915_perf_stream_paranoid,
+	 .maxlen = sizeof(i915_perf_stream_paranoid),
+	 .mode = 0644,
+	 .proc_handler = proc_dointvec,
+	 },
+	{}
+};
+
+static struct ctl_table i915_root[] = {
+	{
+	 .procname = "i915",
+	 .maxlen = 0,
+	 .mode = 0555,
+	 .child = oa_table,
+	 },
+	{}
+};
+
+static struct ctl_table dev_root[] = {
+	{
+	 .procname = "dev",
+	 .maxlen = 0,
+	 .mode = 0555,
+	 .child = i915_root,
+	 },
+	{}
+};
+
 void i915_perf_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = to_i915(dev);
@@ -1198,6 +1238,8 @@ void i915_perf_init(struct drm_device *dev)
 		return;
 	}
 
+	dev_priv->perf.sysctl_header = register_sysctl_table(dev_root);
+
 	dev_priv->perf.initialized = true;
 
 	return;
@@ -1210,6 +1252,8 @@ void i915_perf_fini(struct drm_device *dev)
 	if (!dev_priv->perf.initialized)
 		return;
 
+	unregister_sysctl_table(dev_priv->perf.sysctl_header);
+
 	i915_perf_deinit_sysfs_hsw(dev_priv);
 
 	kobject_put(dev_priv->perf.metrics_kobj);
-- 
2.7.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 7/8] drm/i915: add oa_event_min_timer_exponent sysctl
  2016-02-03 18:39 [PATCH 0/8] Enable Gen 7 Observation Architecture Robert Bragg
                   ` (5 preceding siblings ...)
  2016-02-03 18:39 ` [PATCH 6/8] drm/i915: Add dev.i915.perf_event_paranoid sysctl option Robert Bragg
@ 2016-02-03 18:39 ` Robert Bragg
  2016-02-03 18:39 ` [PATCH 8/8] drm/i915: Add more Haswell OA metric sets Robert Bragg
  2016-02-04  7:44 ` ✓ Fi.CI.BAT: success for Enable Gen 7 Observation Architecture (rev2) Patchwork
  8 siblings, 0 replies; 14+ messages in thread
From: Robert Bragg @ 2016-02-03 18:39 UTC (permalink / raw)
  To: intel-gfx; +Cc: dri-devel, Sourab Gupta, Daniel Vetter, Robert Bragg

The minimal sampling period is now configurable via a
dev.i915.oa_min_timer_exponent sysctl parameter.

Following the precedent set by perf, the default is the minimum that
won't (on its own) exceed the default kernel.perf_event_max_sample_rate
default of 100000 samples/s.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_perf.c | 40 +++++++++++++++++++++++++++++-----------
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 7c27d93..ca4860b 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -41,6 +41,23 @@ static u32 i915_perf_stream_paranoid = true;
 
 #define OA_EXPONENT_MAX 0x3f
 
+/* for sysctl proc_dointvec_minmax of i915_oa_min_timer_exponent */
+static int zero;
+static int oa_exponent_max = OA_EXPONENT_MAX;
+
+/* Theoretically we can program the OA unit to sample every 160ns but don't
+ * allow that by default unless root...
+ *
+ * The period is derived from the exponent as:
+ *
+ *   period = 80ns * 2^(exponent + 1)
+ *
+ * Referring to perf's kernel.perf_event_max_sample_rate for a precedent
+ * (100000 by default); with an OA exponent of 6 we get a period of 10.240
+ * microseconds - just under 100000Hz
+ */
+static u32 i915_oa_min_timer_exponent = 6;
+
 static struct i915_oa_format hsw_oa_formats[I915_OA_FORMAT_MAX] = {
 	[I915_OA_FORMAT_A13]	    = { 0, 64 },
 	[I915_OA_FORMAT_A29]	    = { 1, 128 },
@@ -1098,20 +1115,12 @@ static int read_properties_unlocked(struct drm_i915_private *dev_priv,
 			if (value > OA_EXPONENT_MAX)
 				return -EINVAL;
 
-			/* NB: The exponent represents a period as follows:
-			 *
-			 *   80ns * 2^(period_exponent + 1)
-			 *
-			 * Theoretically we can program the OA unit to sample
+			/* Theoretically we can program the OA unit to sample
 			 * every 160ns but don't allow that by default unless
 			 * root.
-			 *
-			 * Referring to perf's
-			 * kernel.perf_event_max_sample_rate for a precedent
-			 * (100000 by default); with an OA exponent of 6 we get
-			 * a period of 10.240 microseconds -just under 100000Hz
 			 */
-			if (value < 6 && !capable(CAP_SYS_ADMIN)) {
+			if (value < i915_oa_min_timer_exponent &&
+			    !capable(CAP_SYS_ADMIN)) {
 				DRM_ERROR("Sampling period too high without root privileges\n");
 				return -EACCES;
 			}
@@ -1174,6 +1183,15 @@ static struct ctl_table oa_table[] = {
 	 .mode = 0644,
 	 .proc_handler = proc_dointvec,
 	 },
+	{
+	 .procname = "oa_min_timer_exponent",
+	 .data = &i915_oa_min_timer_exponent,
+	 .maxlen = sizeof(i915_oa_min_timer_exponent),
+	 .mode = 0644,
+	 .proc_handler = proc_dointvec_minmax,
+	 .extra1 = &zero,
+	 .extra2 = &oa_exponent_max,
+	 },
 	{}
 };
 
-- 
2.7.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 8/8] drm/i915: Add more Haswell OA metric sets
  2016-02-03 18:39 [PATCH 0/8] Enable Gen 7 Observation Architecture Robert Bragg
                   ` (6 preceding siblings ...)
  2016-02-03 18:39 ` [PATCH 7/8] drm/i915: add oa_event_min_timer_exponent sysctl Robert Bragg
@ 2016-02-03 18:39 ` Robert Bragg
  2016-02-04  7:44 ` ✓ Fi.CI.BAT: success for Enable Gen 7 Observation Architecture (rev2) Patchwork
  8 siblings, 0 replies; 14+ messages in thread
From: Robert Bragg @ 2016-02-03 18:39 UTC (permalink / raw)
  To: intel-gfx; +Cc: dri-devel, Sourab Gupta, Daniel Vetter, Robert Bragg

This adds 'compute', 'compute extended', 'memory reads', 'memory writes'
and 'sampler balance' metric sets for Haswell.

Signed-off-by: Robert Bragg <robert@sixbynine.org>
---
 drivers/gpu/drm/i915/i915_oa_hsw.c | 513 +++++++++++++++++++++++++++++++++++--
 1 file changed, 497 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_oa_hsw.c b/drivers/gpu/drm/i915/i915_oa_hsw.c
index ecd7d2b..4a2de8a 100644
--- a/drivers/gpu/drm/i915/i915_oa_hsw.c
+++ b/drivers/gpu/drm/i915/i915_oa_hsw.c
@@ -30,9 +30,14 @@
 
 enum metric_set_id {
 	METRIC_SET_ID_RENDER_BASIC = 1,
+	METRIC_SET_ID_COMPUTE_BASIC,
+	METRIC_SET_ID_COMPUTE_EXTENDED,
+	METRIC_SET_ID_MEMORY_READS,
+	METRIC_SET_ID_MEMORY_WRITES,
+	METRIC_SET_ID_SAMPLER_BALANCE,
 };
 
-int i915_oa_n_builtin_metric_sets_hsw = 1;
+int i915_oa_n_builtin_metric_sets_hsw = 6;
 
 static const struct i915_oa_reg b_counter_config_render_basic[] = {
 	{ _MMIO(0x2724), 0x00800000 },
@@ -118,6 +123,332 @@ static int select_render_basic_config(struct drm_i915_private *dev_priv)
 	return 0;
 }
 
+static const struct i915_oa_reg b_counter_config_compute_basic[] = {
+	{ _MMIO(0x2710), 0x00000000 },
+	{ _MMIO(0x2714), 0x00800000 },
+	{ _MMIO(0x2718), 0xAAAAAAAA },
+	{ _MMIO(0x271C), 0xAAAAAAAA },
+	{ _MMIO(0x2720), 0x00000000 },
+	{ _MMIO(0x2724), 0x00800000 },
+	{ _MMIO(0x2728), 0xAAAAAAAA },
+	{ _MMIO(0x272C), 0xAAAAAAAA },
+	{ _MMIO(0x2740), 0x00000000 },
+	{ _MMIO(0x2744), 0x00000000 },
+	{ _MMIO(0x2748), 0x00000000 },
+	{ _MMIO(0x274C), 0x00000000 },
+	{ _MMIO(0x2750), 0x00000000 },
+	{ _MMIO(0x2754), 0x00000000 },
+	{ _MMIO(0x2758), 0x00000000 },
+	{ _MMIO(0x275C), 0x00000000 },
+};
+
+static const struct i915_oa_reg mux_config_compute_basic[] = {
+	{ _MMIO(0x253A4), 0x00000000 },
+	{ _MMIO(0x2681C), 0x01F00800 },
+	{ _MMIO(0x26820), 0x00001000 },
+	{ _MMIO(0x2781C), 0x01F00800 },
+	{ _MMIO(0x26520), 0x00000007 },
+	{ _MMIO(0x265A0), 0x00000007 },
+	{ _MMIO(0x25380), 0x00000010 },
+	{ _MMIO(0x2538C), 0x00300000 },
+	{ _MMIO(0x25384), 0xAA8AAAAA },
+	{ _MMIO(0x25404), 0xFFFFFFFF },
+	{ _MMIO(0x26800), 0x00004202 },
+	{ _MMIO(0x26808), 0x00605817 },
+	{ _MMIO(0x2680C), 0x10001005 },
+	{ _MMIO(0x26804), 0x00000000 },
+	{ _MMIO(0x27800), 0x00000102 },
+	{ _MMIO(0x27808), 0x0C0701E0 },
+	{ _MMIO(0x2780C), 0x000200A0 },
+	{ _MMIO(0x27804), 0x00000000 },
+	{ _MMIO(0x26484), 0x44000000 },
+	{ _MMIO(0x26704), 0x44000000 },
+	{ _MMIO(0x26500), 0x00000006 },
+	{ _MMIO(0x26510), 0x00000001 },
+	{ _MMIO(0x26504), 0x88000000 },
+	{ _MMIO(0x26580), 0x00000006 },
+	{ _MMIO(0x26590), 0x00000020 },
+	{ _MMIO(0x26584), 0x00000000 },
+	{ _MMIO(0x26104), 0x55822222 },
+	{ _MMIO(0x26184), 0xAA866666 },
+	{ _MMIO(0x25420), 0x08320C83 },
+	{ _MMIO(0x25424), 0x06820C83 },
+	{ _MMIO(0x2541C), 0x00000000 },
+	{ _MMIO(0x25428), 0x00000C03 },
+};
+
+static int select_compute_basic_config(struct drm_i915_private *dev_priv)
+{
+	dev_priv->perf.oa.mux_regs =
+		mux_config_compute_basic;
+	dev_priv->perf.oa.mux_regs_len =
+		ARRAY_SIZE(mux_config_compute_basic);
+
+	dev_priv->perf.oa.b_counter_regs =
+		b_counter_config_compute_basic;
+	dev_priv->perf.oa.b_counter_regs_len =
+		ARRAY_SIZE(b_counter_config_compute_basic);
+
+	return 0;
+}
+
+static const struct i915_oa_reg b_counter_config_compute_extended[] = {
+	{ _MMIO(0x2724), 0xf0800000 },
+	{ _MMIO(0x2720), 0x00000000 },
+	{ _MMIO(0x2714), 0xf0800000 },
+	{ _MMIO(0x2710), 0x00000000 },
+	{ _MMIO(0x2770), 0x0007fe2a },
+	{ _MMIO(0x2774), 0x0000ff00 },
+	{ _MMIO(0x2778), 0x0007fe6a },
+	{ _MMIO(0x277c), 0x0000ff00 },
+	{ _MMIO(0x2780), 0x0007fe92 },
+	{ _MMIO(0x2784), 0x0000ff00 },
+	{ _MMIO(0x2788), 0x0007fea2 },
+	{ _MMIO(0x278c), 0x0000ff00 },
+	{ _MMIO(0x2790), 0x0007fe32 },
+	{ _MMIO(0x2794), 0x0000ff00 },
+	{ _MMIO(0x2798), 0x0007fe9a },
+	{ _MMIO(0x279c), 0x0000ff00 },
+	{ _MMIO(0x27a0), 0x0007ff23 },
+	{ _MMIO(0x27a4), 0x0000ff00 },
+	{ _MMIO(0x27a8), 0x0007fff3 },
+	{ _MMIO(0x27ac), 0x0000fffe },
+};
+
+static const struct i915_oa_reg mux_config_compute_extended[] = {
+	{ _MMIO(0x2681C), 0x3EB00800 },
+	{ _MMIO(0x26820), 0x00900000 },
+	{ _MMIO(0x25384), 0x02AAAAAA },
+	{ _MMIO(0x25404), 0x03FFFFFF },
+	{ _MMIO(0x26800), 0x00142284 },
+	{ _MMIO(0x26808), 0x0E629062 },
+	{ _MMIO(0x2680C), 0x3F6F55CB },
+	{ _MMIO(0x26810), 0x00000014 },
+	{ _MMIO(0x26804), 0x00000000 },
+	{ _MMIO(0x26104), 0x02AAAAAA },
+	{ _MMIO(0x26184), 0x02AAAAAA },
+	{ _MMIO(0x25420), 0x00000000 },
+	{ _MMIO(0x25424), 0x00000000 },
+	{ _MMIO(0x2541C), 0x00000000 },
+	{ _MMIO(0x25428), 0x00000000 },
+};
+
+static int select_compute_extended_config(struct drm_i915_private *dev_priv)
+{
+	dev_priv->perf.oa.mux_regs =
+		mux_config_compute_extended;
+	dev_priv->perf.oa.mux_regs_len =
+		ARRAY_SIZE(mux_config_compute_extended);
+
+	dev_priv->perf.oa.b_counter_regs =
+		b_counter_config_compute_extended;
+	dev_priv->perf.oa.b_counter_regs_len =
+		ARRAY_SIZE(b_counter_config_compute_extended);
+
+	return 0;
+}
+
+static const struct i915_oa_reg b_counter_config_memory_reads[] = {
+	{ _MMIO(0x2724), 0xf0800000 },
+	{ _MMIO(0x2720), 0x00000000 },
+	{ _MMIO(0x2714), 0xf0800000 },
+	{ _MMIO(0x2710), 0x00000000 },
+	{ _MMIO(0x274c), 0x76543298 },
+	{ _MMIO(0x2748), 0x98989898 },
+	{ _MMIO(0x2744), 0x000000e4 },
+	{ _MMIO(0x2740), 0x00000000 },
+	{ _MMIO(0x275c), 0x98a98a98 },
+	{ _MMIO(0x2758), 0x88888888 },
+	{ _MMIO(0x2754), 0x000c5500 },
+	{ _MMIO(0x2750), 0x00000000 },
+	{ _MMIO(0x2770), 0x0007f81a },
+	{ _MMIO(0x2774), 0x0000fc00 },
+	{ _MMIO(0x2778), 0x0007f82a },
+	{ _MMIO(0x277c), 0x0000fc00 },
+	{ _MMIO(0x2780), 0x0007f872 },
+	{ _MMIO(0x2784), 0x0000fc00 },
+	{ _MMIO(0x2788), 0x0007f8ba },
+	{ _MMIO(0x278c), 0x0000fc00 },
+	{ _MMIO(0x2790), 0x0007f87a },
+	{ _MMIO(0x2794), 0x0000fc00 },
+	{ _MMIO(0x2798), 0x0007f8ea },
+	{ _MMIO(0x279c), 0x0000fc00 },
+	{ _MMIO(0x27a0), 0x0007f8e2 },
+	{ _MMIO(0x27a4), 0x0000fc00 },
+	{ _MMIO(0x27a8), 0x0007f8f2 },
+	{ _MMIO(0x27ac), 0x0000fc00 },
+};
+
+static const struct i915_oa_reg mux_config_memory_reads[] = {
+	{ _MMIO(0x253A4), 0x34300000 },
+	{ _MMIO(0x25440), 0x2D800000 },
+	{ _MMIO(0x25444), 0x00000008 },
+	{ _MMIO(0x25128), 0x0E600000 },
+	{ _MMIO(0x25380), 0x00000450 },
+	{ _MMIO(0x25390), 0x00052C43 },
+	{ _MMIO(0x25384), 0x00000000 },
+	{ _MMIO(0x25400), 0x00006144 },
+	{ _MMIO(0x25408), 0x0A418820 },
+	{ _MMIO(0x2540C), 0x000820E6 },
+	{ _MMIO(0x25404), 0xFF500000 },
+	{ _MMIO(0x25100), 0x000005D6 },
+	{ _MMIO(0x2510C), 0x0EF00000 },
+	{ _MMIO(0x25104), 0x00000000 },
+	{ _MMIO(0x25420), 0x02108421 },
+	{ _MMIO(0x25424), 0x00008421 },
+	{ _MMIO(0x2541C), 0x00000000 },
+	{ _MMIO(0x25428), 0x00000000 },
+};
+
+static int select_memory_reads_config(struct drm_i915_private *dev_priv)
+{
+	dev_priv->perf.oa.mux_regs =
+		mux_config_memory_reads;
+	dev_priv->perf.oa.mux_regs_len =
+		ARRAY_SIZE(mux_config_memory_reads);
+
+	dev_priv->perf.oa.b_counter_regs =
+		b_counter_config_memory_reads;
+	dev_priv->perf.oa.b_counter_regs_len =
+		ARRAY_SIZE(b_counter_config_memory_reads);
+
+	return 0;
+}
+
+static const struct i915_oa_reg b_counter_config_memory_writes[] = {
+	{ _MMIO(0x2724), 0xf0800000 },
+	{ _MMIO(0x2720), 0x00000000 },
+	{ _MMIO(0x2714), 0xf0800000 },
+	{ _MMIO(0x2710), 0x00000000 },
+	{ _MMIO(0x274c), 0x76543298 },
+	{ _MMIO(0x2748), 0x98989898 },
+	{ _MMIO(0x2744), 0x000000e4 },
+	{ _MMIO(0x2740), 0x00000000 },
+	{ _MMIO(0x275c), 0xbabababa },
+	{ _MMIO(0x2758), 0x88888888 },
+	{ _MMIO(0x2754), 0x000c5500 },
+	{ _MMIO(0x2750), 0x00000000 },
+	{ _MMIO(0x2770), 0x0007f81a },
+	{ _MMIO(0x2774), 0x0000fc00 },
+	{ _MMIO(0x2778), 0x0007f82a },
+	{ _MMIO(0x277c), 0x0000fc00 },
+	{ _MMIO(0x2780), 0x0007f822 },
+	{ _MMIO(0x2784), 0x0000fc00 },
+	{ _MMIO(0x2788), 0x0007f8ba },
+	{ _MMIO(0x278c), 0x0000fc00 },
+	{ _MMIO(0x2790), 0x0007f87a },
+	{ _MMIO(0x2794), 0x0000fc00 },
+	{ _MMIO(0x2798), 0x0007f8ea },
+	{ _MMIO(0x279c), 0x0000fc00 },
+	{ _MMIO(0x27a0), 0x0007f8e2 },
+	{ _MMIO(0x27a4), 0x0000fc00 },
+	{ _MMIO(0x27a8), 0x0007f8f2 },
+	{ _MMIO(0x27ac), 0x0000fc00 },
+};
+
+static const struct i915_oa_reg mux_config_memory_writes[] = {
+	{ _MMIO(0x253A4), 0x34300000 },
+	{ _MMIO(0x25440), 0x01500000 },
+	{ _MMIO(0x25444), 0x00000120 },
+	{ _MMIO(0x25128), 0x0C200000 },
+	{ _MMIO(0x25380), 0x00000450 },
+	{ _MMIO(0x25390), 0x00052C43 },
+	{ _MMIO(0x25384), 0x00000000 },
+	{ _MMIO(0x25400), 0x00007184 },
+	{ _MMIO(0x25408), 0x0A418820 },
+	{ _MMIO(0x2540C), 0x000820E6 },
+	{ _MMIO(0x25404), 0xFF500000 },
+	{ _MMIO(0x25100), 0x000005D6 },
+	{ _MMIO(0x2510C), 0x1E700000 },
+	{ _MMIO(0x25104), 0x00000000 },
+	{ _MMIO(0x25420), 0x02108421 },
+	{ _MMIO(0x25424), 0x00008421 },
+	{ _MMIO(0x2541C), 0x00000000 },
+	{ _MMIO(0x25428), 0x00000000 },
+};
+
+static int select_memory_writes_config(struct drm_i915_private *dev_priv)
+{
+	dev_priv->perf.oa.mux_regs =
+		mux_config_memory_writes;
+	dev_priv->perf.oa.mux_regs_len =
+		ARRAY_SIZE(mux_config_memory_writes);
+
+	dev_priv->perf.oa.b_counter_regs =
+		b_counter_config_memory_writes;
+	dev_priv->perf.oa.b_counter_regs_len =
+		ARRAY_SIZE(b_counter_config_memory_writes);
+
+	return 0;
+}
+
+static const struct i915_oa_reg b_counter_config_sampler_balance[] = {
+	{ _MMIO(0x2740), 0x00000000 },
+	{ _MMIO(0x2744), 0x00800000 },
+	{ _MMIO(0x2710), 0x00000000 },
+	{ _MMIO(0x2714), 0x00800000 },
+	{ _MMIO(0x2720), 0x00000000 },
+	{ _MMIO(0x2724), 0x00800000 },
+};
+
+static const struct i915_oa_reg mux_config_sampler_balance[] = {
+	{ _MMIO(0x2eb9c), 0x01906400 },
+	{ _MMIO(0x2fb9c), 0x01906400 },
+	{ _MMIO(0x253a4), 0x00000000 },
+	{ _MMIO(0x26b9c), 0x01906400 },
+	{ _MMIO(0x27b9c), 0x01906400 },
+	{ _MMIO(0x27104), 0x00a00000 },
+	{ _MMIO(0x27184), 0x00a50000 },
+	{ _MMIO(0x2e804), 0x00500000 },
+	{ _MMIO(0x2e984), 0x00500000 },
+	{ _MMIO(0x2eb04), 0x00500000 },
+	{ _MMIO(0x2eb80), 0x00000084 },
+	{ _MMIO(0x2eb8c), 0x14200000 },
+	{ _MMIO(0x2eb84), 0x00000000 },
+	{ _MMIO(0x2f804), 0x00050000 },
+	{ _MMIO(0x2f984), 0x00050000 },
+	{ _MMIO(0x2fb04), 0x00050000 },
+	{ _MMIO(0x2fb80), 0x00000084 },
+	{ _MMIO(0x2fb8c), 0x00050800 },
+	{ _MMIO(0x2fb84), 0x00000000 },
+	{ _MMIO(0x25380), 0x00000010 },
+	{ _MMIO(0x2538c), 0x000000c0 },
+	{ _MMIO(0x25384), 0xaa550000 },
+	{ _MMIO(0x25404), 0xffffc000 },
+	{ _MMIO(0x26804), 0x50000000 },
+	{ _MMIO(0x26984), 0x50000000 },
+	{ _MMIO(0x26b04), 0x50000000 },
+	{ _MMIO(0x26b80), 0x00000084 },
+	{ _MMIO(0x26b90), 0x00050800 },
+	{ _MMIO(0x26b84), 0x00000000 },
+	{ _MMIO(0x27804), 0x05000000 },
+	{ _MMIO(0x27984), 0x05000000 },
+	{ _MMIO(0x27b04), 0x05000000 },
+	{ _MMIO(0x27b80), 0x00000084 },
+	{ _MMIO(0x27b90), 0x00000142 },
+	{ _MMIO(0x27b84), 0x00000000 },
+	{ _MMIO(0x26104), 0xa0000000 },
+	{ _MMIO(0x26184), 0xa5000000 },
+	{ _MMIO(0x25424), 0x00008620 },
+	{ _MMIO(0x2541c), 0x00000000 },
+	{ _MMIO(0x25428), 0x0004a54a },
+};
+
+static int select_sampler_balance_config(struct drm_i915_private *dev_priv)
+{
+	dev_priv->perf.oa.mux_regs =
+		mux_config_sampler_balance;
+	dev_priv->perf.oa.mux_regs_len =
+		ARRAY_SIZE(mux_config_sampler_balance);
+
+	dev_priv->perf.oa.b_counter_regs =
+		b_counter_config_sampler_balance;
+	dev_priv->perf.oa.b_counter_regs_len =
+		ARRAY_SIZE(b_counter_config_sampler_balance);
+
+	return 0;
+}
+
 int i915_oa_select_metric_set_hsw(struct drm_i915_private *dev_priv)
 {
 	dev_priv->perf.oa.mux_regs = NULL;
@@ -128,6 +459,16 @@ int i915_oa_select_metric_set_hsw(struct drm_i915_private *dev_priv)
 	switch (dev_priv->perf.oa.metrics_set) {
 	case METRIC_SET_ID_RENDER_BASIC:
 		return select_render_basic_config(dev_priv);
+	case METRIC_SET_ID_COMPUTE_BASIC:
+		return select_compute_basic_config(dev_priv);
+	case METRIC_SET_ID_COMPUTE_EXTENDED:
+		return select_compute_extended_config(dev_priv);
+	case METRIC_SET_ID_MEMORY_READS:
+		return select_memory_reads_config(dev_priv);
+	case METRIC_SET_ID_MEMORY_WRITES:
+		return select_memory_writes_config(dev_priv);
+	case METRIC_SET_ID_SAMPLER_BALANCE:
+		return select_sampler_balance_config(dev_priv);
 	default:
 		return -ENODEV;
 	}
@@ -136,42 +477,182 @@ int i915_oa_select_metric_set_hsw(struct drm_i915_private *dev_priv)
 static ssize_t
 show_render_basic_id(struct device *kdev, struct device_attribute *attr, char *buf)
 {
-        return sprintf(buf, "%d\n", METRIC_SET_ID_RENDER_BASIC);
+	return sprintf(buf, "%d\n", METRIC_SET_ID_RENDER_BASIC);
 }
 
 static struct device_attribute dev_attr_render_basic_id = {
-        .attr = { .name = "id", .mode = S_IRUGO },
-        .show = show_render_basic_id,
-        .store = NULL,
+	.attr = { .name = "id", .mode = S_IRUGO },
+	.show = show_render_basic_id,
+	.store = NULL,
 };
 
 static struct attribute *attrs_render_basic[] = {
-        &dev_attr_render_basic_id.attr,
-        NULL,
+	&dev_attr_render_basic_id.attr,
+	NULL,
 };
 
 static struct attribute_group group_render_basic = {
-        .name = "403d8832-1a27-4aa6-a64e-f5389ce7b212",
-        .attrs =  attrs_render_basic,
+	.name = "403d8832-1a27-4aa6-a64e-f5389ce7b212",
+	.attrs =  attrs_render_basic,
+};
+
+static ssize_t
+show_compute_basic_id(struct device *kdev, struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", METRIC_SET_ID_COMPUTE_BASIC);
+}
+
+static struct device_attribute dev_attr_compute_basic_id = {
+	.attr = { .name = "id", .mode = S_IRUGO },
+	.show = show_compute_basic_id,
+	.store = NULL,
+};
+
+static struct attribute *attrs_compute_basic[] = {
+	&dev_attr_compute_basic_id.attr,
+	NULL,
+};
+
+static struct attribute_group group_compute_basic = {
+	.name = "39ad14bc-2380-45c4-91eb-fbcb3aa7ae7b",
+	.attrs =  attrs_compute_basic,
+};
+
+static ssize_t
+show_compute_extended_id(struct device *kdev, struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", METRIC_SET_ID_COMPUTE_EXTENDED);
+}
+
+static struct device_attribute dev_attr_compute_extended_id = {
+	.attr = { .name = "id", .mode = S_IRUGO },
+	.show = show_compute_extended_id,
+	.store = NULL,
+};
+
+static struct attribute *attrs_compute_extended[] = {
+	&dev_attr_compute_extended_id.attr,
+	NULL,
+};
+
+static struct attribute_group group_compute_extended = {
+	.name = "3865be28-6982-49fe-9494-e4d1b4795413",
+	.attrs =  attrs_compute_extended,
+};
+
+static ssize_t
+show_memory_reads_id(struct device *kdev, struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", METRIC_SET_ID_MEMORY_READS);
+}
+
+static struct device_attribute dev_attr_memory_reads_id = {
+	.attr = { .name = "id", .mode = S_IRUGO },
+	.show = show_memory_reads_id,
+	.store = NULL,
+};
+
+static struct attribute *attrs_memory_reads[] = {
+	&dev_attr_memory_reads_id.attr,
+	NULL,
+};
+
+static struct attribute_group group_memory_reads = {
+	.name = "bb5ed49b-2497-4095-94f6-26ba294db88a",
+	.attrs =  attrs_memory_reads,
+};
+
+static ssize_t
+show_memory_writes_id(struct device *kdev, struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", METRIC_SET_ID_MEMORY_WRITES);
+}
+
+static struct device_attribute dev_attr_memory_writes_id = {
+	.attr = { .name = "id", .mode = S_IRUGO },
+	.show = show_memory_writes_id,
+	.store = NULL,
+};
+
+static struct attribute *attrs_memory_writes[] = {
+	&dev_attr_memory_writes_id.attr,
+	NULL,
+};
+
+static struct attribute_group group_memory_writes = {
+	.name = "3358d639-9b5f-45ab-976d-9b08cbfc6240",
+	.attrs =  attrs_memory_writes,
+};
+
+static ssize_t
+show_sampler_balance_id(struct device *kdev, struct device_attribute *attr, char *buf)
+{
+	return sprintf(buf, "%d\n", METRIC_SET_ID_SAMPLER_BALANCE);
+}
+
+static struct device_attribute dev_attr_sampler_balance_id = {
+	.attr = { .name = "id", .mode = S_IRUGO },
+	.show = show_sampler_balance_id,
+	.store = NULL,
+};
+
+static struct attribute *attrs_sampler_balance[] = {
+	&dev_attr_sampler_balance_id.attr,
+	NULL,
+};
+
+static struct attribute_group group_sampler_balance = {
+	.name = "bc274488-b4b6-40c7-90da-b77d7ad16189",
+	.attrs =  attrs_sampler_balance,
 };
 
 int
 i915_perf_init_sysfs_hsw(struct drm_i915_private *dev_priv)
 {
-        int ret;
+	int ret;
 
-        ret = sysfs_create_group(dev_priv->perf.metrics_kobj, &group_render_basic);
-        if (ret)
-                goto error_render_basic;
+	ret = sysfs_create_group(dev_priv->perf.metrics_kobj, &group_render_basic);
+	if (ret)
+		goto error_render_basic;
+	ret = sysfs_create_group(dev_priv->perf.metrics_kobj, &group_compute_basic);
+	if (ret)
+		goto error_compute_basic;
+	ret = sysfs_create_group(dev_priv->perf.metrics_kobj, &group_compute_extended);
+	if (ret)
+		goto error_compute_extended;
+	ret = sysfs_create_group(dev_priv->perf.metrics_kobj, &group_memory_reads);
+	if (ret)
+		goto error_memory_reads;
+	ret = sysfs_create_group(dev_priv->perf.metrics_kobj, &group_memory_writes);
+	if (ret)
+		goto error_memory_writes;
+	ret = sysfs_create_group(dev_priv->perf.metrics_kobj, &group_sampler_balance);
+	if (ret)
+		goto error_sampler_balance;
 
-        return 0;
+	return 0;
 
+error_sampler_balance:
+	sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_memory_writes);
+error_memory_writes:
+	sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_memory_reads);
+error_memory_reads:
+	sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_compute_extended);
+error_compute_extended:
+	sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_compute_basic);
+error_compute_basic:
+	sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_render_basic);
 error_render_basic:
-        return ret;
+	return ret;
 }
 
 void
 i915_perf_deinit_sysfs_hsw(struct drm_i915_private *dev_priv)
 {
-        sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_render_basic);
+	sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_render_basic);
+	sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_compute_basic);
+	sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_compute_extended);
+	sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_memory_reads);
+	sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_memory_writes);
+	sysfs_remove_group(dev_priv->perf.metrics_kobj, &group_sampler_balance);
 }
-- 
2.7.0

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/8] drm/i915: Add i915 perf infrastructure
  2016-02-03 18:39 ` [PATCH 1/8] drm/i915: Add i915 perf infrastructure Robert Bragg
@ 2016-02-04  1:42   ` Emil Velikov
  2016-02-04 13:17     ` Robert Bragg
  0 siblings, 1 reply; 14+ messages in thread
From: Emil Velikov @ 2016-02-04  1:42 UTC (permalink / raw)
  To: Robert Bragg; +Cc: Daniel Vetter, intel-gfx, Sourab Gupta, ML dri-devel

On 3 February 2016 at 18:39, Robert Bragg <robert@sixbynine.org> wrote:

> index a5524cc..68ca26e 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h

> @@ -1170,4 +1172,71 @@ struct drm_i915_gem_context_param {
>         __u64 value;
>  };
>
> +#define I915_PERF_FLAG_FD_CLOEXEC      (1<<0)
> +#define I915_PERF_FLAG_FD_NONBLOCK     (1<<1)
> +#define I915_PERF_FLAG_DISABLED                (1<<2)
> +
> +enum drm_i915_perf_property_id {
> +       /**
> +        * Open the stream for a specific context handle (as used with
> +        * execbuffer2). A stream opened for a specific context this way
> +        * won't typically require root privileges.
> +        */
> +       DRM_I915_PERF_CTX_HANDLE_PROP = 1,
> +
Wouldn't DRM_I915_PERF_PROP_CTX_HANDLE be a better name ? It's more
obvious at least :-P

> +       DRM_I915_PERF_PROP_MAX /* non-ABI */
Isn't the use of enums (in drm UAPI) discouraged ? Afaics only the old
DRI1 i915 code has them.
Same question applies throughout the patch/series.

> +};
> +
> +struct drm_i915_perf_open_param {
> +       /** CLOEXEC, NONBLOCK... */
Some other places in i915 define the macros just after the __u32
flags. This will allow you to drop the comment ;-)

> +       __u32 flags;
> +
And ... we broke 32 bit userspare on 64 bit kernels. Please check for
holes and other issues as described in Daniel Vetter's
article/documentation [1]

[1] https://www.kernel.org/doc/Documentation/ioctl/botching-up-ioctls.txt

> +       /**
> +        * Pointer to array of u64 (id, value) pairs configuring the stream
> +        * to open.
> +        */
> +       __u64 __user properties;
> +
> +       /** The number of u64 (id, value) pairs */
> +       __u32 n_properties;
The rest of the file uses num_foo or foo_count. Can we opt for one of those ?

> +};
> +
> +#define I915_PERF_IOCTL_ENABLE _IO('i', 0x0)
> +#define I915_PERF_IOCTL_DISABLE        _IO('i', 0x1)
> +
> +/**
> + * Common to all i915 perf records
> + */
> +struct drm_i915_perf_record_header {
> +       __u32 type;
> +       __u16 pad;
Move pad at the end ? This way we can (maybe?) reuse it in the future.

> +       __u16 size;
> +};
> +

Daniel, is there a consensus about zeroing the pad from userspace and
checking it for garbage in the ioctl ? The documentation says "do it",
yet I have a hunch that we're missing a few. Also what error should
the ioctl return in such a case - EINVAL or EFAULT ? Worth
documenting, just in case ?

-Emil
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* ✓ Fi.CI.BAT: success for Enable Gen 7 Observation Architecture (rev2)
  2016-02-03 18:39 [PATCH 0/8] Enable Gen 7 Observation Architecture Robert Bragg
                   ` (7 preceding siblings ...)
  2016-02-03 18:39 ` [PATCH 8/8] drm/i915: Add more Haswell OA metric sets Robert Bragg
@ 2016-02-04  7:44 ` Patchwork
  8 siblings, 0 replies; 14+ messages in thread
From: Patchwork @ 2016-02-04  7:44 UTC (permalink / raw)
  To: Robert Bragg; +Cc: intel-gfx

== Summary ==

Series 3024v2 Enable Gen 7 Observation Architecture
http://patchwork.freedesktop.org/api/1.0/series/3024/revisions/2/mbox/


bdw-nuci7        total:156  pass:147  dwarn:0   dfail:0   fail:0   skip:9  
bdw-ultra        total:159  pass:147  dwarn:0   dfail:0   fail:0   skip:12 
bsw-nuc-2        total:159  pass:131  dwarn:0   dfail:0   fail:0   skip:28 
byt-nuc          total:159  pass:136  dwarn:0   dfail:0   fail:0   skip:23 
hsw-brixbox      total:159  pass:146  dwarn:0   dfail:0   fail:0   skip:13 
hsw-gt2          total:159  pass:149  dwarn:0   dfail:0   fail:0   skip:10 
ilk-hp8440p      total:159  pass:111  dwarn:0   dfail:0   fail:0   skip:48 
ivb-t430s        total:159  pass:145  dwarn:0   dfail:0   fail:0   skip:14 
skl-i5k-2        total:159  pass:144  dwarn:1   dfail:0   fail:0   skip:14 
skl-i7k-2        total:159  pass:144  dwarn:1   dfail:0   fail:0   skip:14 
snb-dellxps      total:159  pass:137  dwarn:0   dfail:0   fail:0   skip:22 

Results at /archive/results/CI_IGT_test/Patchwork_1359/

1ff67c58aa3d7cacd37451397c740b8df27994e6 drm-intel-nightly: 2016y-02m-03d-18h-22m-38s UTC integration manifest
302f43ab1f73bfa02331cf48d09c0a4633a09a2a drm/i915: Add more Haswell OA metric sets
2d51257b0c7989ed1dae829ce4fe2b737342fbb3 drm/i915: add oa_event_min_timer_exponent sysctl
e25850343facd37fa0db9e1f564fc4a1aa2d04a9 drm/i915: Add dev.i915.perf_event_paranoid sysctl option
4068269d6f53b90ef589684279714086887d1070 drm/i915: advertise available metrics via sysfs
9541e82d530e05442ce06e52a608545003edd8c2 drm/i915: Add i915 perf event for Haswell OA unit
730c581d81a0588abb81d846069d614f489548a9 drm/i915: Add 'render basic' Haswell OA unit config
de7203e5abe978328f41dcad28f96b306aacf684 drm/i915: rename OACONTROL GEN7_OACONTROL
d73e9b28435b554910bf1449f5261945b482bac7 drm/i915: Add i915 perf infrastructure

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/8] drm/i915: Add i915 perf infrastructure
  2016-02-04  1:42   ` Emil Velikov
@ 2016-02-04 13:17     ` Robert Bragg
  2016-02-04 14:36       ` Robert Bragg
  0 siblings, 1 reply; 14+ messages in thread
From: Robert Bragg @ 2016-02-04 13:17 UTC (permalink / raw)
  To: Emil Velikov; +Cc: Daniel Vetter, intel-gfx, Sourab Gupta, ML dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 4761 bytes --]

On Thu, Feb 4, 2016 at 1:42 AM, Emil Velikov <emil.l.velikov@gmail.com>
wrote:

> On 3 February 2016 at 18:39, Robert Bragg <robert@sixbynine.org> wrote:
>
> > index a5524cc..68ca26e 100644
> > --- a/include/uapi/drm/i915_drm.h
> > +++ b/include/uapi/drm/i915_drm.h
>
> > @@ -1170,4 +1172,71 @@ struct drm_i915_gem_context_param {
> >         __u64 value;
> >  };
> >
> > +#define I915_PERF_FLAG_FD_CLOEXEC      (1<<0)
> > +#define I915_PERF_FLAG_FD_NONBLOCK     (1<<1)
> > +#define I915_PERF_FLAG_DISABLED                (1<<2)
> > +
> > +enum drm_i915_perf_property_id {
> > +       /**
> > +        * Open the stream for a specific context handle (as used with
> > +        * execbuffer2). A stream opened for a specific context this way
> > +        * won't typically require root privileges.
> > +        */
> > +       DRM_I915_PERF_CTX_HANDLE_PROP = 1,
> > +
> Wouldn't DRM_I915_PERF_PROP_CTX_HANDLE be a better name ? It's more
> obvious at least :-P
>

Yeah would be more consistent to keep the common namespacing at the front.


>
> > +       DRM_I915_PERF_PROP_MAX /* non-ABI */
> Isn't the use of enums (in drm UAPI) discouraged ? Afaics only the old
> DRI1 i915 code has them.
> Same question applies throughout the patch/series.
>

An enum here seems like a pretty good technical fit. I think the potential
for different enums to have different sizes is a likely reason for being
cautious about using them as part of a kernel ABI (so probably unwise to
have enum based members in an ioctl structure) but in this case these IDs
are always passed to the kernel as a u64. We benefit from the compiler
reminding us to handle all properties in the driver if we switch on this
enum which I like, as well as having the _MAX value to refer to in the
driver which is perhaps a little more reliable at the end of an enum vs a
#define which needs to be explicitly updated for each addition.

For reference, the first iteration of this driver was based on the core
pref infrastructure and in this case the enum + _MAX /* non-ABI */ approach
is borrowed from there (ref: include/uapi/linux/perf_event.h)


> > +};
> > +
> > +struct drm_i915_perf_open_param {
> > +       /** CLOEXEC, NONBLOCK... */
> Some other places in i915 define the macros just after the __u32
> flags. This will allow you to drop the comment ;-)
>
> > +       __u32 flags;
> > +
> And ... we broke 32 bit userspare on 64 bit kernels. Please check for
> holes and other issues as described in Daniel Vetter's
> article/documentation [1]
>
> [1] https://www.kernel.org/doc/Documentation/ioctl/botching-up-ioctls.txt


Ah yeah, don't think this would break 32bit userspace, but still would be
good to remove that hole, this has been through a few iterations and there
used to be a __u32 type here, well spotted thanks.

I think I'll bump the flags to be 64bit.


>
>
> > +       /**
> > +        * Pointer to array of u64 (id, value) pairs configuring the
> stream
> > +        * to open.
> > +        */
> > +       __u64 __user properties;
> > +
> > +       /** The number of u64 (id, value) pairs */
> > +       __u32 n_properties;
> The rest of the file uses num_foo or foo_count. Can we opt for one of
> those ?
>

Ah yep, num_properties looks good to me.


>
> > +};
> > +
> > +#define I915_PERF_IOCTL_ENABLE _IO('i', 0x0)
> > +#define I915_PERF_IOCTL_DISABLE        _IO('i', 0x1)
> > +
> > +/**
> > + * Common to all i915 perf records
> > + */
> > +struct drm_i915_perf_record_header {
> > +       __u32 type;
> > +       __u16 pad;
> Move pad at the end ? This way we can (maybe?) reuse it in the future.
>

This header was originally based on struct perf_event_header which is layed
out with the padding in the middle too. I can't currently think of a strong
reason to maintain a record header that's ABI compatible with perf, but
/maybe/ it could be convenient in some case to have a common layout for
profiling tools that deal with both. I think we can consider it reserved
for future use either way.


>
> > +       __u16 size;
> > +};
> > +
>
> Daniel, is there a consensus about zeroing the pad from userspace and
> checking it for garbage in the ioctl ? The documentation says "do it",
> yet I have a hunch that we're missing a few. Also what error should
> the ioctl return in such a case - EINVAL or EFAULT ? Worth
> documenting, just in case ?
>

In case you're wondering about the padding in the header above, note that
this header never gets passed in from userspace, it's a header for data
read by userspace and the driver zeros the padding.

For the hole in the ioctl I'll probably fill that by extending the flags
member and the driver currently returns -EINVAL if any unknown flag bits
are set by userspace.


Thanks for looking through this,
- Robert


>
> -Emil
>

[-- Attachment #1.2: Type: text/html, Size: 7024 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 1/8] drm/i915: Add i915 perf infrastructure
  2016-02-04 13:17     ` Robert Bragg
@ 2016-02-04 14:36       ` Robert Bragg
  0 siblings, 0 replies; 14+ messages in thread
From: Robert Bragg @ 2016-02-04 14:36 UTC (permalink / raw)
  To: Emil Velikov; +Cc: Daniel Vetter, intel-gfx, Sourab Gupta, ML dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 1136 bytes --]

On Thu, Feb 4, 2016 at 1:17 PM, Robert Bragg <robert@sixbynine.org> wrote:

>
> On Thu, Feb 4, 2016 at 1:42 AM, Emil Velikov <emil.l.velikov@gmail.com>
> wrote:
>
>> On 3 February 2016 at 18:39, Robert Bragg <robert@sixbynine.org> wrote:
>>
>>>
>>> > +};
>>> > +
>>> > +struct drm_i915_perf_open_param {
>>> > +       /** CLOEXEC, NONBLOCK... */
>>> > +       __u32 flags;
>>> > +
>>> And ... we broke 32 bit userspare on 64 bit kernels. Please check for
>>> holes and other issues as described in Daniel Vetter's
>>> article/documentation [1]
>>>
>>> [1]
>>> https://www.kernel.org/doc/Documentation/ioctl/botching-up-ioctls.txt
>>
>>
>> Ah yeah, don't think this would break 32bit userspace, but still would be
>> good to remove that hole, this has been through a few iterations and there
>> used to be a __u32 type here, well spotted thanks.
>>
>> I think I'll bump the flags to be 64bit.
>>
>>
>
Actually, just noticed that since the structure has a u32 hole at the end
too I can move the trailing u32 num_properties up into here instead.

Am also renaming properties to properties_ptr which seems the norm in
i915_drm.h.

- Robert

[-- Attachment #1.2: Type: text/html, Size: 2242 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 0/8] Enable Gen 7 Observation Architecture
@ 2016-02-02 21:30 Robert Bragg
  0 siblings, 0 replies; 14+ messages in thread
From: Robert Bragg @ 2016-02-02 21:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: David Airlie, dri-devel, Sourab Gupta, Daniel Vetter

This series enables support for the Observation Architecture on Haswell

Compared to the last series I sent out, the main changes are:

* The terminology changed so we open a 'stream' not an 'event'.

* A stream is configured with an array of u64 properties after finding
  the previous config struct quite awkward to validate and not well
  suited to supporting multiple kinds of streams sharing common config
  options (which arises with work Sourab Gupta is looking at based on
  this driver)

* No pre-defined enum of metric set IDs and metric sets are now
  advertised via sysfs. This is to allow for configs being loaded at
  runtime in the future and the use of uuids that can be unambiguously
  mapped to corresponding normalization equations in userspace. Another
  consideration here was that we might need to add new versions of a
  particular config like 'compute basic' (more likely for newer configs
  with less testing) at which point the enum doesn't really make life
  easier for userspace if you have to be aware of the versioning
  relationships and attempt different IDs to find the latest version the
  kernel supports.

I haven't written igt tests for this yet, but can hopefully start
looking at that next if there's no major issue with the general api. So
far the interface has been tested and iterated based on what's worked
out well for GPU Top[1], an implementation of GL_INTEL_performance_query
in Mesa and an implementation of an API called MDAPI used by GPA/VTune
to capture metrics (private a.t.m as I'm not sure there would be much
general interest in it.)

The PRM for Haswell's OA unit can be found here:

https://01.org/sites/default/files/documentation/
observability_performance_counters_haswell.pdf

(though unfortunately it's not very complete documentation)

An example of opening an i915 perf stream can be seen in Mesa here:

https://github.com/rib/mesa/blob/wip/rib/oa-next/src/mesa/drivers/
dri/i965/brw_performance_query.c#L904

A summary of the stream format can seen here:

https://github.com/rib/linux/wiki/i915-perf-stream-format

There's an open issue to do with how to properly clear the status bits
in the OASTATUS1 register which I noticed the other day can't be right
a.t.m and I'm waiting for feedback on from the Windows team or OA
architect. I don't expect it will have much impact so it still seemed
worth sending this series now.

Another thing to mention is that I just noticed gputop is hitting the
cpu more than expected in its overview mode, polling for OA data and I
need to double check it's not a driver issue. Hopefully a minor issue;
probably not worth holding things up for.

Although this series only adds basic infrastructure and enables Haswell
it's probably good to keep in mind other work that builds on top. I've
also enabled Gen8+; there's some further work by Sourab Gupta to access
metrics in sync with command stream processing and Matthew Auld has
experimented with an interface for registering new OA configurations at
runtime.

In case it's helpful to reference, the Gen 8 enabling can be seen here:
https://github.com/rib/linux/tree/wip/rib/oa-next

Sourab's work can be seen here:
https://github.com/sourabgu/linux/tree/rebase-4.4-testing

Matt's patch for runtime configs can be seen here:
https://github.com/matt-auld/linux/tree/wip/matt-auld/oa-4.4-dynamic-testing

Regards,
- Robert

[1]
https://github.com/rib/gputop
https://github.com/rib/gputop/wiki/Build-Instructions


Robert Bragg (8):
  drm/i915: Add i915 perf infrastructure
  drm/i915: rename OACONTROL GEN7_OACONTROL
  drm/i915: Add 'render basic' Haswell OA unit config
  drm/i915: Add i915 perf event for Haswell OA unit
  drm/i915: advertise available metrics via sysfs
  drm/i915: Add dev.i915.perf_event_paranoid sysctl option
  drm/i915: add oa_event_min_timer_exponent sysctl
  drm/i915: Add more Haswell OA metric sets

 drivers/gpu/drm/i915/Makefile           |    4 +
 drivers/gpu/drm/i915/i915_cmd_parser.c  |    4 +-
 drivers/gpu/drm/i915/i915_dma.c         |    7 +
 drivers/gpu/drm/i915/i915_drv.h         |  154 ++++
 drivers/gpu/drm/i915/i915_gem_context.c |   23 +-
 drivers/gpu/drm/i915/i915_oa_hsw.c      |  658 ++++++++++++++++
 drivers/gpu/drm/i915/i915_oa_hsw.h      |   38 +
 drivers/gpu/drm/i915/i915_perf.c        | 1283 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h         |  340 +++++++-
 include/uapi/drm/i915_drm.h             |  121 +++
 10 files changed, 2625 insertions(+), 7 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.c
 create mode 100644 drivers/gpu/drm/i915/i915_oa_hsw.h
 create mode 100644 drivers/gpu/drm/i915/i915_perf.c

-- 
2.7.0

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2016-02-04 14:36 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-02-03 18:39 [PATCH 0/8] Enable Gen 7 Observation Architecture Robert Bragg
2016-02-03 18:39 ` [PATCH 1/8] drm/i915: Add i915 perf infrastructure Robert Bragg
2016-02-04  1:42   ` Emil Velikov
2016-02-04 13:17     ` Robert Bragg
2016-02-04 14:36       ` Robert Bragg
2016-02-03 18:39 ` [PATCH 2/8] drm/i915: rename OACONTROL GEN7_OACONTROL Robert Bragg
2016-02-03 18:39 ` [PATCH 3/8] drm/i915: Add 'render basic' Haswell OA unit config Robert Bragg
2016-02-03 18:39 ` [PATCH 4/8] drm/i915: Add i915 perf event for Haswell OA unit Robert Bragg
2016-02-03 18:39 ` [PATCH 5/8] drm/i915: advertise available metrics via sysfs Robert Bragg
2016-02-03 18:39 ` [PATCH 6/8] drm/i915: Add dev.i915.perf_event_paranoid sysctl option Robert Bragg
2016-02-03 18:39 ` [PATCH 7/8] drm/i915: add oa_event_min_timer_exponent sysctl Robert Bragg
2016-02-03 18:39 ` [PATCH 8/8] drm/i915: Add more Haswell OA metric sets Robert Bragg
2016-02-04  7:44 ` ✓ Fi.CI.BAT: success for Enable Gen 7 Observation Architecture (rev2) Patchwork
  -- strict thread matches above, loose matches on Subject: below --
2016-02-02 21:30 [PATCH 0/8] Enable Gen 7 Observation Architecture Robert Bragg

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.