All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/13] drm-intel-collector - review request
@ 2013-09-23 20:33 Rodrigo Vivi
  2013-09-23 20:33 ` [PATCH 01/13] drm/i915: check that the i965g/gm 4G limit is really obeyed Rodrigo Vivi
                   ` (13 more replies)
  0 siblings, 14 replies; 28+ messages in thread
From: Rodrigo Vivi @ 2013-09-23 20:33 UTC (permalink / raw)
  To: intel-gfx

This is another drm-intel-collector push for review:
http://cgit.freedesktop.org/~vivijim/drm-intel/log/?h=drm-intel-collector

Here goes the list in order for better reviewers assignment:

Patch 01/13 - 816e102 drm/i915: check that the i965g/gm 4G limit is really obeyed - Reviewer: Damien
Patch 02/13 - 036f5da drm/i915: Move the conditional seqno query into the tracepoint - Reviewer: Ville
Patch 03/13 - f381b05 drm/i915: Add some missing steps to i915_driver_load error path - Reviewer: Ben
Patch 04/13 - f648dba drm/i915: Asynchronously perform the set-base for a simple modeset - Reviewer: 
Patch 05/13 - ec31a06 drm/i915: Align tiled scanouts from stolen memory to 256k in the GTT - Reviewer: 
Patch 06/13 - 3b2a43a drm/i915: Pair seqno completion tracepoint with its dispatch - Reviewer: 
Patch 07/13 - 4d98ddd drm/i915: Initialise min/max frequencies before updating RPS registers - Reviewer: 
Patch 08/13 - 35983bd drm/i915: Delay the relase of the forcewake by a jiffie - Reviewer: 
Patch 09/13 - 14141ee drm/i915: Add a tracepoint for using a semaphore - Reviewer: 
Patch 10/13 - 06d2851 drm/i915: Boost RPS frequency for CPU stalls - Reviewer: 
Patch 11/13 - e64dce9 drm/i915: Tweak RPS thresholds to more aggressively downclock - Reviewer: 
Patch 12/13 - 7b6c68c drm/i915: Allow GT3 Slice Shutdown on Boot. - Reviewer: 
Patch 13/13 - 2f3c359 drm/i915: Allow Dynamically GT3 Slice Shutdown. - Reviewer: 

Overall Process has changed a bit in order to avoid discussion split/duplications and for poking reviewers:

1. Daniel pushs drm-intel-testing (every 2 weeks)
2. I rebase drm-intel-collector onto drm-intel-testing
3. Add Reviewer: tag with voluntered reviewers. If you don't believe you should be assigned on a particular patch please don't get mad just tell you wont review or volunteer someone else.
4. I resubmit remaining patches for review without picking any new (drm-intel-collector - review request)
5. One week later I collect all simple (1-2) patches that wasn't yet reviewed and not queued by Daniel from one testing update until another.
6. Request automated QA's PRTS automated i-g-t tests comparing drm-intel-testing x drm-intel-collector
7. If tests are ok I send the update notification or the patches as a series to intel-gfx mailing list for better tracking and to be reviewed. (drm-intel-collector - updated)
8. Let me know volunteers for review new patches and also let me know if I've picked any patch that I shouldn't.

There are some reasons that some patches can be left behind:
1. It was send so long time ago. I started with patches from Jul 26th.
2. Your patch didn't applied cleanly and I couldn't easily solve the conflicts.
3. Kernel didn't compiled with your patch.
4. I simply missed it. If you believe this is the case please warn me.

Please help me to get these patches reviewed and queued by Daniel.

Also, please let me know if you have further ideas how to improve this process.

Thanks in advance,
Rodrigo.

Chris Wilson (10):
  drm/i915: Move the conditional seqno query into the tracepoint
  drm/i915: Add some missing steps to i915_driver_load error path
  drm/i915: Asynchronously perform the set-base for a simple modeset
  drm/i915: Align tiled scanouts from stolen memory to 256k in the GTT
  drm/i915: Pair seqno completion tracepoint with its dispatch
  drm/i915: Initialise min/max frequencies before updating RPS registers
  drm/i915: Delay the relase of the forcewake by a jiffie
  drm/i915: Add a tracepoint for using a semaphore
  drm/i915: Boost RPS frequency for CPU stalls
  drm/i915: Tweak RPS thresholds to more aggressively downclock

Daniel Vetter (1):
  drm/i915: check that the i965g/gm 4G limit is really obeyed

Rodrigo Vivi (2):
  drm/i915: Allow GT3 Slice Shutdown on Boot.
  drm/i915: Allow Dynamically GT3 Slice Shutdown.

 drivers/gpu/drm/i915/i915_debugfs.c        |   2 +-
 drivers/gpu/drm/i915/i915_dma.c            |  41 ++---
 drivers/gpu/drm/i915/i915_drv.c            |   5 +
 drivers/gpu/drm/i915/i915_drv.h            |  28 +++-
 drivers/gpu/drm/i915/i915_gem.c            | 146 +++++++++++------
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   2 +-
 drivers/gpu/drm/i915/i915_irq.c            |  59 ++++---
 drivers/gpu/drm/i915/i915_reg.h            |  13 +-
 drivers/gpu/drm/i915/i915_sysfs.c          |  64 +++++++-
 drivers/gpu/drm/i915/i915_trace.h          |  52 ++++--
 drivers/gpu/drm/i915/intel_display.c       |  16 +-
 drivers/gpu/drm/i915/intel_drv.h           |   7 +-
 drivers/gpu/drm/i915/intel_pm.c            | 250 +++++++++++++++++++++--------
 drivers/gpu/drm/i915/intel_uncore.c        |  30 +++-
 14 files changed, 525 insertions(+), 190 deletions(-)

--
1.8.1.4

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 01/13] drm/i915: check that the i965g/gm 4G limit is really obeyed
  2013-09-23 20:33 [PATCH 00/13] drm-intel-collector - review request Rodrigo Vivi
@ 2013-09-23 20:33 ` Rodrigo Vivi
  2013-09-23 20:33 ` [PATCH 02/13] drm/i915: Move the conditional seqno query into the tracepoint Rodrigo Vivi
                   ` (12 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Rodrigo Vivi @ 2013-09-23 20:33 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter

From: Daniel Vetter <daniel.vetter@ffwll.ch>

In truly crazy circumstances shmem might give us the wrong type of
page. So be a bit paranoid and double check this.

Reviewer: Damien Lespiau <damien.lespiau@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
References: http://lkml.org/lkml/2011/7/11/238
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c12f44a..d68cc5c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1857,6 +1857,9 @@ i915_gem_object_get_pages_gtt(struct drm_i915_gem_object *obj)
 			sg->length += PAGE_SIZE;
 		}
 		last_pfn = page_to_pfn(page);
+
+		/* Check that the i965g/gm workaround works. */
+		WARN_ON((gfp & __GFP_DMA32) && (last_pfn >= 0x00100000UL));
 	}
 #ifdef CONFIG_SWIOTLB
 	if (!swiotlb_nr_tbl())
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 02/13] drm/i915: Move the conditional seqno query into the tracepoint
  2013-09-23 20:33 [PATCH 00/13] drm-intel-collector - review request Rodrigo Vivi
  2013-09-23 20:33 ` [PATCH 01/13] drm/i915: check that the i965g/gm 4G limit is really obeyed Rodrigo Vivi
@ 2013-09-23 20:33 ` Rodrigo Vivi
  2013-09-24 18:50   ` Ville Syrjälä
  2013-09-23 20:33 ` [PATCH 03/13] drm/i915: Add some missing steps to i915_driver_load error path Rodrigo Vivi
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Rodrigo Vivi @ 2013-09-23 20:33 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

We only wish to know the value of seqno when emitting the tracepoint, so
move the query from a parameter to the macro to inside the conditional
macro body so that the query is only evaluated when required.

Reviewer: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
---
 drivers/gpu/drm/i915/i915_irq.c   |  2 +-
 drivers/gpu/drm/i915/i915_trace.h | 21 ++++++++++++++++++---
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index b356dc1..84b7efc 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -807,7 +807,7 @@ static void notify_ring(struct drm_device *dev,
 	if (ring->obj == NULL)
 		return;
 
-	trace_i915_gem_request_complete(ring, ring->get_seqno(ring, false));
+	trace_i915_gem_request_complete(ring);
 
 	wake_up_all(&ring->irq_queue);
 	i915_queue_hangcheck(dev);
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index e2c5ee6..a1797f6 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -304,9 +304,24 @@ DEFINE_EVENT(i915_gem_request, i915_gem_request_add,
 	    TP_ARGS(ring, seqno)
 );
 
-DEFINE_EVENT(i915_gem_request, i915_gem_request_complete,
-	    TP_PROTO(struct intel_ring_buffer *ring, u32 seqno),
-	    TP_ARGS(ring, seqno)
+TRACE_EVENT(i915_gem_request_complete,
+	    TP_PROTO(struct intel_ring_buffer *ring),
+	    TP_ARGS(ring),
+
+	    TP_STRUCT__entry(
+			     __field(u32, dev)
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->dev = ring->dev->primary->index;
+			   __entry->ring = ring->id;
+			   __entry->seqno = ring->get_seqno(ring, false);
+			   ),
+
+	    TP_printk("dev=%u, ring=%u, seqno=%u",
+		      __entry->dev, __entry->ring, __entry->seqno)
 );
 
 DEFINE_EVENT(i915_gem_request, i915_gem_request_retire,
-- 
1.8.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 03/13] drm/i915: Add some missing steps to i915_driver_load error path
  2013-09-23 20:33 [PATCH 00/13] drm-intel-collector - review request Rodrigo Vivi
  2013-09-23 20:33 ` [PATCH 01/13] drm/i915: check that the i965g/gm 4G limit is really obeyed Rodrigo Vivi
  2013-09-23 20:33 ` [PATCH 02/13] drm/i915: Move the conditional seqno query into the tracepoint Rodrigo Vivi
@ 2013-09-23 20:33 ` Rodrigo Vivi
  2013-10-02 17:00   ` Daniel Vetter
  2013-09-23 20:33 ` [PATCH 04/13] drm/i915: Asynchronously perform the set-base for a simple modeset Rodrigo Vivi
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Rodrigo Vivi @ 2013-09-23 20:33 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

We missed adding a few cleanup steps for recent additions.

Reviewer:  Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
---
 drivers/gpu/drm/i915/i915_dma.c | 19 +++++++++++++++----
 1 file changed, 15 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 553586e..32e8368 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1554,7 +1554,7 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
 
 	ret = i915_gem_gtt_init(dev);
 	if (ret)
-		goto put_bridge;
+		goto out_regs;
 
 	if (drm_core_check_feature(dev, DRIVER_MODESET))
 		i915_kick_out_firmware_fb(dev_priv);
@@ -1583,7 +1583,7 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
 				     aperture_size);
 	if (dev_priv->gtt.mappable == NULL) {
 		ret = -EIO;
-		goto out_rmmap;
+		goto out_gtt;
 	}
 
 	dev_priv->gtt.mtrr = arch_phys_wc_add(dev_priv->gtt.mappable_base,
@@ -1657,7 +1657,7 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
 		ret = i915_load_modeset_init(dev);
 		if (ret < 0) {
 			DRM_ERROR("failed to init modeset\n");
-			goto out_gem_unload;
+			goto out_power_well;
 		}
 	} else {
 		/* Start out suspended in ums mode. */
@@ -1677,6 +1677,10 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
 
 	return 0;
 
+out_power_well:
+	if (HAS_POWER_WELL(dev))
+		i915_remove_power_well(dev);
+	drm_vblank_cleanup(dev);
 out_gem_unload:
 	if (dev_priv->mm.inactive_shrinker.shrink)
 		unregister_shrinker(&dev_priv->mm.inactive_shrinker);
@@ -1690,12 +1694,17 @@ out_gem_unload:
 out_mtrrfree:
 	arch_phys_wc_del(dev_priv->gtt.mtrr);
 	io_mapping_free(dev_priv->gtt.mappable);
+out_gtt:
+	list_del(&dev_priv->gtt.base.global_link);
+	drm_mm_takedown(&dev_priv->gtt.base.mm);
 	dev_priv->gtt.base.cleanup(&dev_priv->gtt.base);
-out_rmmap:
+out_regs:
 	pci_iounmap(dev->pdev, dev_priv->regs);
 put_bridge:
 	pci_dev_put(dev_priv->bridge_dev);
 free_priv:
+	if (dev_priv->slab)
+		kmem_cache_destroy(dev_priv->slab);
 	kfree(dev_priv);
 	return ret;
 }
@@ -1788,6 +1797,8 @@ int i915_driver_unload(struct drm_device *dev)
 	if (dev_priv->regs != NULL)
 		pci_iounmap(dev->pdev, dev_priv->regs);
 
+	drm_vblank_cleanup(dev);
+
 	intel_teardown_gmbus(dev);
 	intel_teardown_mchbar(dev);
 
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 04/13] drm/i915: Asynchronously perform the set-base for a simple modeset
  2013-09-23 20:33 [PATCH 00/13] drm-intel-collector - review request Rodrigo Vivi
                   ` (2 preceding siblings ...)
  2013-09-23 20:33 ` [PATCH 03/13] drm/i915: Add some missing steps to i915_driver_load error path Rodrigo Vivi
@ 2013-09-23 20:33 ` Rodrigo Vivi
  2013-09-23 20:33 ` [PATCH 05/13] drm/i915: Align tiled scanouts from stolen memory to 256k in the GTT Rodrigo Vivi
                   ` (9 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Rodrigo Vivi @ 2013-09-23 20:33 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

A simple modeset, where we only wish to switch over to a new framebuffer
such as the transition from fbcon to X, takes around 30-60ms. This is
due to three factors:

1. We need to make sure the fb->obj is in the display domain, which
incurs a cache flush to ensure no dirt is left on the scanout.

2. We need to flush any pending rendering before performing the mmio
so that the frame is complete before it is shown.

3. We currently wait for the vblank after the mmio to be sure that the
old fb is no longer being shown before releasing it.

(1) can only be eliminated by userspace preparing the fb->obj in advance
to already be in the display domain. This can be done through use of the
create2 ioctl, or by reusing an existing fb->obj.

However, (2) and (3) are already solved by the existing page flip
mechanism, and it is surprisingly trivial to wire them up for use in the
set-base fast path. Though it can be argued that this represents a
subtle ABI break in that the set_config ioctl now returns before the old
framebuffer is unpinned. The danger is that userspace will start to
modify it before it is no longer being shown, however we should be able
to prevent that through proper domain tracking.

By combining all of the above, we can achieve an instaneous set_config:

[     6.601] (II) intel(0): switch to mode 2560x1440@60.0 on pipe 0 using DP2, position (0, 0), rotation normal
[     6.601] (II) intel(0): Setting screen physical size to 677 x 381

v2 (by Vivi): page_flip_flag was added to intel_crtc_page_flip
              in a previous commit. using 0.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
---
 drivers/gpu/drm/i915/intel_display.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 76d1d32..bcf0960 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -9445,10 +9445,13 @@ static int intel_crtc_set_config(struct drm_mode_set *set)
 		ret = intel_set_mode(set->crtc, set->mode,
 				     set->x, set->y, set->fb);
 	} else if (config->fb_changed) {
-		intel_crtc_wait_for_pending_flips(set->crtc);
-
-		ret = intel_pipe_set_base(set->crtc,
-					  set->x, set->y, set->fb);
+		if (to_intel_framebuffer(set->fb)->obj->ring == NULL ||
+		    save_set.x != set->x || save_set.y != set->y ||
+		    intel_crtc_page_flip(set->crtc, set->fb, NULL, 0)) {
+			intel_crtc_wait_for_pending_flips(set->crtc);
+			ret = intel_pipe_set_base(set->crtc,
+						  set->x, set->y, set->fb);
+		}
 	}
 
 	if (ret) {
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 05/13] drm/i915: Align tiled scanouts from stolen memory to 256k in the GTT
  2013-09-23 20:33 [PATCH 00/13] drm-intel-collector - review request Rodrigo Vivi
                   ` (3 preceding siblings ...)
  2013-09-23 20:33 ` [PATCH 04/13] drm/i915: Asynchronously perform the set-base for a simple modeset Rodrigo Vivi
@ 2013-09-23 20:33 ` Rodrigo Vivi
  2013-09-23 20:33 ` [PATCH 06/13] drm/i915: Pair seqno completion tracepoint with its dispatch Rodrigo Vivi
                   ` (8 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Rodrigo Vivi @ 2013-09-23 20:33 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

For unfathomable reasons this alignment appears to be required for tiled
scanouts being read from stolen memory. I can find no reference in the
w/a db to support this requirement, but the evidence of my own eyes says
this prevents many headaches.

Note that I have not tricked anything older than Sandybridge into using
stolen tiled scanouts, so the extra alignment may be required there as
well.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
---
 drivers/gpu/drm/i915/intel_display.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index bcf0960..105ec54 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -1879,6 +1879,8 @@ intel_pin_and_fence_fb_obj(struct drm_device *dev,
 	case I915_TILING_X:
 		/* pin() will align the object as required by fence */
 		alignment = 0;
+		if (obj->stolen && INTEL_INFO(dev)->gen >= 6)
+			alignment = 256 * 1024;
 		break;
 	case I915_TILING_Y:
 		/* Despite that we check this in framebuffer_init userspace can
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 06/13] drm/i915: Pair seqno completion tracepoint with its dispatch
  2013-09-23 20:33 [PATCH 00/13] drm-intel-collector - review request Rodrigo Vivi
                   ` (4 preceding siblings ...)
  2013-09-23 20:33 ` [PATCH 05/13] drm/i915: Align tiled scanouts from stolen memory to 256k in the GTT Rodrigo Vivi
@ 2013-09-23 20:33 ` Rodrigo Vivi
  2013-09-23 20:33 ` [PATCH 07/13] drm/i915: Initialise min/max frequencies before updating RPS registers Rodrigo Vivi
                   ` (7 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Rodrigo Vivi @ 2013-09-23 20:33 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

In order to time how long a seqno is executed by a ring, we need to
measure both its insertion and its completion. (Using the completion of
the previous seqno as an estimate for when the GPU starts, if busy.) In
order to get an exact completion timestamp, we need irqs. This is
enabled by trace_i915_gem_ring_dispatch, so it makes more sens to pair
the completion event with that rather than the request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  2 +-
 drivers/gpu/drm/i915/i915_irq.c            |  2 +-
 drivers/gpu/drm/i915/i915_trace.h          | 48 +++++++++++++++---------------
 3 files changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index ee93357..3976579 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1185,7 +1185,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 			goto err;
 	}
 
-	trace_i915_gem_ring_dispatch(ring, intel_ring_get_seqno(ring), flags);
+	trace_i915_gem_ring_dispatch(ring, flags);
 
 	i915_gem_execbuffer_move_to_active(&eb->vmas, ring);
 	i915_gem_execbuffer_retire_commands(dev, file, ring, batch_obj);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 84b7efc..82bd7f5d 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -807,7 +807,7 @@ static void notify_ring(struct drm_device *dev,
 	if (ring->obj == NULL)
 		return;
 
-	trace_i915_gem_request_complete(ring);
+	trace_i915_gem_ring_complete(ring);
 
 	wake_up_all(&ring->irq_queue);
 	i915_queue_hangcheck(dev);
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index a1797f6..5c8e36a 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -234,8 +234,8 @@ TRACE_EVENT(i915_gem_evict_everything,
 );
 
 TRACE_EVENT(i915_gem_ring_dispatch,
-	    TP_PROTO(struct intel_ring_buffer *ring, u32 seqno, u32 flags),
-	    TP_ARGS(ring, seqno, flags),
+	    TP_PROTO(struct intel_ring_buffer *ring, u32 flags),
+	    TP_ARGS(ring, flags),
 
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
@@ -247,15 +247,35 @@ TRACE_EVENT(i915_gem_ring_dispatch,
 	    TP_fast_assign(
 			   __entry->dev = ring->dev->primary->index;
 			   __entry->ring = ring->id;
-			   __entry->seqno = seqno;
+			   __entry->seqno = intel_ring_get_seqno(ring),
 			   __entry->flags = flags;
-			   i915_trace_irq_get(ring, seqno);
+			   i915_trace_irq_get(ring, __entry->seqno);
 			   ),
 
 	    TP_printk("dev=%u, ring=%u, seqno=%u, flags=%x",
 		      __entry->dev, __entry->ring, __entry->seqno, __entry->flags)
 );
 
+TRACE_EVENT(i915_gem_ring_complete,
+	    TP_PROTO(struct intel_ring_buffer *ring),
+	    TP_ARGS(ring),
+
+	    TP_STRUCT__entry(
+			     __field(u32, dev)
+			     __field(u32, ring)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->dev = ring->dev->primary->index;
+			   __entry->ring = ring->id;
+			   __entry->seqno = ring->get_seqno(ring, false);
+			   ),
+
+	    TP_printk("dev=%u, ring=%u, seqno=%u",
+		      __entry->dev, __entry->ring, __entry->seqno)
+);
+
 TRACE_EVENT(i915_gem_ring_flush,
 	    TP_PROTO(struct intel_ring_buffer *ring, u32 invalidate, u32 flush),
 	    TP_ARGS(ring, invalidate, flush),
@@ -304,26 +324,6 @@ DEFINE_EVENT(i915_gem_request, i915_gem_request_add,
 	    TP_ARGS(ring, seqno)
 );
 
-TRACE_EVENT(i915_gem_request_complete,
-	    TP_PROTO(struct intel_ring_buffer *ring),
-	    TP_ARGS(ring),
-
-	    TP_STRUCT__entry(
-			     __field(u32, dev)
-			     __field(u32, ring)
-			     __field(u32, seqno)
-			     ),
-
-	    TP_fast_assign(
-			   __entry->dev = ring->dev->primary->index;
-			   __entry->ring = ring->id;
-			   __entry->seqno = ring->get_seqno(ring, false);
-			   ),
-
-	    TP_printk("dev=%u, ring=%u, seqno=%u",
-		      __entry->dev, __entry->ring, __entry->seqno)
-);
-
 DEFINE_EVENT(i915_gem_request, i915_gem_request_retire,
 	    TP_PROTO(struct intel_ring_buffer *ring, u32 seqno),
 	    TP_ARGS(ring, seqno)
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 07/13] drm/i915: Initialise min/max frequencies before updating RPS registers
  2013-09-23 20:33 [PATCH 00/13] drm-intel-collector - review request Rodrigo Vivi
                   ` (5 preceding siblings ...)
  2013-09-23 20:33 ` [PATCH 06/13] drm/i915: Pair seqno completion tracepoint with its dispatch Rodrigo Vivi
@ 2013-09-23 20:33 ` Rodrigo Vivi
  2013-09-23 20:33 ` [PATCH 08/13] drm/i915: Delay the relase of the forcewake by a jiffie Rodrigo Vivi
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Rodrigo Vivi @ 2013-09-23 20:33 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

The RPS register writing routines use the current value of min/max to
set certain limits and interrupt gating. If we set those afterwards, we
risk setting up the hw incorrectly and losing power management events,
and worse, trigger some internal assertions.

Reorder the calling sequences to be correct, and remove the then
unrequired clamping from inside set_rps(). And for a bonus, fix the bug
of calling gen6_set_rps() from Valleyview.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c |  2 +-
 drivers/gpu/drm/i915/i915_sysfs.c   | 16 ++++++++--------
 drivers/gpu/drm/i915/intel_pm.c     | 19 +++++--------------
 3 files changed, 14 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 80bed69..c03a2e1 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2019,7 +2019,7 @@ i915_max_freq_set(void *data, u64 val)
 	if (IS_VALLEYVIEW(dev)) {
 		val = vlv_freq_opcode(dev_priv->mem_freq, val);
 		dev_priv->rps.max_delay = val;
-		gen6_set_rps(dev, val);
+		valleyview_set_rps(dev, val);
 	} else {
 		do_div(val, GT_FREQUENCY_MULTIPLIER);
 		dev_priv->rps.max_delay = val;
diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index 44f4c1a..7659dd4 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -316,15 +316,15 @@ static ssize_t gt_max_freq_mhz_store(struct device *kdev,
 		DRM_DEBUG("User requested overclocking to %d\n",
 			  val * GT_FREQUENCY_MULTIPLIER);
 
+	dev_priv->rps.max_delay = val;
+
 	if (dev_priv->rps.cur_delay > val) {
-		if (IS_VALLEYVIEW(dev_priv->dev))
-			valleyview_set_rps(dev_priv->dev, val);
+		if (IS_VALLEYVIEW(dev))
+			valleyview_set_rps(dev, val);
 		else
-			gen6_set_rps(dev_priv->dev, val);
+			gen6_set_rps(dev, val);
 	}
 
-	dev_priv->rps.max_delay = val;
-
 	mutex_unlock(&dev_priv->rps.hw_lock);
 
 	return count;
@@ -381,15 +381,15 @@ static ssize_t gt_min_freq_mhz_store(struct device *kdev,
 		return -EINVAL;
 	}
 
+	dev_priv->rps.min_delay = val;
+
 	if (dev_priv->rps.cur_delay < val) {
 		if (IS_VALLEYVIEW(dev))
 			valleyview_set_rps(dev, val);
 		else
-			gen6_set_rps(dev_priv->dev, val);
+			gen6_set_rps(dev, val);
 	}
 
-	dev_priv->rps.min_delay = val;
-
 	mutex_unlock(&dev_priv->rps.hw_lock);
 
 	return count;
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 4692f8c..d5b8c78 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3278,26 +3278,19 @@ static void ironlake_disable_drps(struct drm_device *dev)
  * ourselves, instead of doing a rmw cycle (which might result in us clearing
  * all limits and the gpu stuck at whatever frequency it is at atm).
  */
-static u32 gen6_rps_limits(struct drm_i915_private *dev_priv, u8 *val)
+static u32 gen6_rps_limits(struct drm_i915_private *dev_priv, u8 val)
 {
 	u32 limits;
 
-	limits = 0;
-
-	if (*val >= dev_priv->rps.max_delay)
-		*val = dev_priv->rps.max_delay;
-	limits |= dev_priv->rps.max_delay << 24;
-
 	/* Only set the down limit when we've reached the lowest level to avoid
 	 * getting more interrupts, otherwise leave this clear. This prevents a
 	 * race in the hw when coming out of rc6: There's a tiny window where
 	 * the hw runs at the minimal clock before selecting the desired
 	 * frequency, if the down threshold expires in that window we will not
 	 * receive a down interrupt. */
-	if (*val <= dev_priv->rps.min_delay) {
-		*val = dev_priv->rps.min_delay;
+	limits = dev_priv->rps.max_delay << 24;
+	if (val <= dev_priv->rps.min_delay)
 		limits |= dev_priv->rps.min_delay << 16;
-	}
 
 	return limits;
 }
@@ -3305,7 +3298,6 @@ static u32 gen6_rps_limits(struct drm_i915_private *dev_priv, u8 *val)
 void gen6_set_rps(struct drm_device *dev, u8 val)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	u32 limits = gen6_rps_limits(dev_priv, &val);
 
 	WARN_ON(!mutex_is_locked(&dev_priv->rps.hw_lock));
 	WARN_ON(val > dev_priv->rps.max_delay);
@@ -3326,7 +3318,8 @@ void gen6_set_rps(struct drm_device *dev, u8 val)
 	/* Make sure we continue to get interrupts
 	 * until we hit the minimum or maximum frequencies.
 	 */
-	I915_WRITE(GEN6_RP_INTERRUPT_LIMITS, limits);
+	I915_WRITE(GEN6_RP_INTERRUPT_LIMITS,
+		   gen6_rps_limits(dev_priv, val));
 
 	POSTING_READ(GEN6_RPNSWREQ);
 
@@ -3364,8 +3357,6 @@ void valleyview_set_rps(struct drm_device *dev, u8 val)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
-	gen6_rps_limits(dev_priv, &val);
-
 	WARN_ON(!mutex_is_locked(&dev_priv->rps.hw_lock));
 	WARN_ON(val > dev_priv->rps.max_delay);
 	WARN_ON(val < dev_priv->rps.min_delay);
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 08/13] drm/i915: Delay the relase of the forcewake by a jiffie
  2013-09-23 20:33 [PATCH 00/13] drm-intel-collector - review request Rodrigo Vivi
                   ` (6 preceding siblings ...)
  2013-09-23 20:33 ` [PATCH 07/13] drm/i915: Initialise min/max frequencies before updating RPS registers Rodrigo Vivi
@ 2013-09-23 20:33 ` Rodrigo Vivi
  2013-09-23 20:33 ` [PATCH 09/13] drm/i915: Add a tracepoint for using a semaphore Rodrigo Vivi
                   ` (5 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Rodrigo Vivi @ 2013-09-23 20:33 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

Obtaining the forcwake requires expensive and time consuming
serialisation. And we often try to obtain the forcewake multiple times
in very quick succession. We can reduce the overhead of these sequences
by delaying the forcewake release, and so not hammer the hw quite so
hard.

I was hoping this would help with the spurious
[drm:__gen6_gt_force_wake_mt_get] *ERROR* Timed out waiting for forcewake old ack to clear.
found on Haswell. Alas not.

v2: Fix teardown ordering - unmap the regs after turning off forcewake,
and make sure we do turn off forcewake - both found by Ville.

Note: I have no claims for improved performance, stablity or power
comsumption for this patch. We should not be hitting the registers often
enough for this to improve benchmarks, but given the nature of our hw it
is likely to improve long term stability.

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
---
 drivers/gpu/drm/i915/i915_dma.c     |  6 ++++--
 drivers/gpu/drm/i915/i915_drv.h     |  3 +++
 drivers/gpu/drm/i915/intel_uncore.c | 30 ++++++++++++++++++++++++++++--
 3 files changed, 35 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 32e8368..a09a9c7 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1794,8 +1794,6 @@ int i915_driver_unload(struct drm_device *dev)
 	list_del(&dev_priv->gtt.base.global_link);
 	WARN_ON(!list_empty(&dev_priv->vm_list));
 	drm_mm_takedown(&dev_priv->gtt.base.mm);
-	if (dev_priv->regs != NULL)
-		pci_iounmap(dev->pdev, dev_priv->regs);
 
 	drm_vblank_cleanup(dev);
 
@@ -1807,6 +1805,10 @@ int i915_driver_unload(struct drm_device *dev)
 
 	dev_priv->gtt.base.cleanup(&dev_priv->gtt.base);
 
+	intel_uncore_fini(dev);
+	if (dev_priv->regs != NULL)
+		pci_iounmap(dev->pdev, dev_priv->regs);
+
 	if (dev_priv->slab)
 		kmem_cache_destroy(dev_priv->slab);
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8c52cbd..c171285 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -408,6 +408,8 @@ struct intel_uncore {
 
 	unsigned fifo_count;
 	unsigned forcewake_count;
+
+	struct delayed_work force_wake_work;
 };
 
 #define DEV_INFO_FOR_EACH_FLAG(func, sep) \
@@ -1791,6 +1793,7 @@ extern void intel_uncore_early_sanitize(struct drm_device *dev);
 extern void intel_uncore_init(struct drm_device *dev);
 extern void intel_uncore_clear_errors(struct drm_device *dev);
 extern void intel_uncore_check_errors(struct drm_device *dev);
+extern void intel_uncore_fini(struct drm_device *dev);
 
 void
 i915_enable_pipestat(drm_i915_private_t *dev_priv, int pipe, u32 mask);
diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
index 8649f1c..462cc7f 100644
--- a/drivers/gpu/drm/i915/intel_uncore.c
+++ b/drivers/gpu/drm/i915/intel_uncore.c
@@ -204,6 +204,18 @@ static void vlv_force_wake_put(struct drm_i915_private *dev_priv)
 	gen6_gt_check_fifodbg(dev_priv);
 }
 
+static void gen6_force_wake_work(struct work_struct *work)
+{
+	struct drm_i915_private *dev_priv =
+		container_of(work, typeof(*dev_priv), uncore.force_wake_work.work);
+	unsigned long irqflags;
+
+	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
+	if (--dev_priv->uncore.forcewake_count == 0)
+		dev_priv->uncore.funcs.force_wake_put(dev_priv);
+	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
+}
+
 void intel_uncore_early_sanitize(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -216,6 +228,9 @@ void intel_uncore_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 
+	INIT_DELAYED_WORK(&dev_priv->uncore.force_wake_work,
+			  gen6_force_wake_work);
+
 	if (IS_VALLEYVIEW(dev)) {
 		dev_priv->uncore.funcs.force_wake_get = vlv_force_wake_get;
 		dev_priv->uncore.funcs.force_wake_put = vlv_force_wake_put;
@@ -261,6 +276,13 @@ void intel_uncore_init(struct drm_device *dev)
 	}
 }
 
+void intel_uncore_fini(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	flush_delayed_work(&dev_priv->uncore.force_wake_work);
+}
+
 static void intel_uncore_forcewake_reset(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -306,8 +328,12 @@ void gen6_gt_force_wake_put(struct drm_i915_private *dev_priv)
 	unsigned long irqflags;
 
 	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
-	if (--dev_priv->uncore.forcewake_count == 0)
-		dev_priv->uncore.funcs.force_wake_put(dev_priv);
+	if (--dev_priv->uncore.forcewake_count == 0) {
+		dev_priv->uncore.forcewake_count++;
+		mod_delayed_work(dev_priv->wq,
+				 &dev_priv->uncore.force_wake_work,
+				 1);
+	}
 	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
 }
 
-- 
1.8.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 09/13] drm/i915: Add a tracepoint for using a semaphore
  2013-09-23 20:33 [PATCH 00/13] drm-intel-collector - review request Rodrigo Vivi
                   ` (7 preceding siblings ...)
  2013-09-23 20:33 ` [PATCH 08/13] drm/i915: Delay the relase of the forcewake by a jiffie Rodrigo Vivi
@ 2013-09-23 20:33 ` Rodrigo Vivi
  2013-09-25  9:34   ` Ville Syrjälä
  2013-09-23 20:33 ` [PATCH 10/13] drm/i915: Boost RPS frequency for CPU stalls Rodrigo Vivi
                   ` (4 subsequent siblings)
  13 siblings, 1 reply; 28+ messages in thread
From: Rodrigo Vivi @ 2013-09-23 20:33 UTC (permalink / raw)
  To: intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

So that we can find the callers who introduce a ring stall. A single
ring stall is not too unwelcome, the right issue becomes when they start
to interlock and prevent any concurrent work. That, however, is a little
tricker to detect with a mere tracepoint!

v2: Rebrand it as a ring event, rather than an object event.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
---
 drivers/gpu/drm/i915/i915_gem.c   |  2 ++
 drivers/gpu/drm/i915/i915_trace.h | 19 +++++++++++++++++++
 2 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d68cc5c..4a16491 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2614,6 +2614,8 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	if (ret)
 		return ret;
 
+	trace_i915_gem_ring_sync_to(from, to);
+
 	ret = to->sync_to(to, from, seqno);
 	if (!ret)
 		/* We use last_read_seqno because sync_to()
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 5c8e36a..48e8f07 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -233,6 +233,25 @@ TRACE_EVENT(i915_gem_evict_everything,
 	    TP_printk("dev=%d", __entry->dev)
 );
 
+TRACE_EVENT(i915_gem_ring_sync_to,
+	    TP_PROTO(struct intel_ring_buffer *from, struct intel_ring_buffer *to),
+	    TP_ARGS(from, to),
+
+	    TP_STRUCT__entry(
+			     __field(u32, dev)
+			     __field(u32, sync_from)
+			     __field(u32, sync_to)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->dev = from->dev->primary->index;
+			   __entry->sync_from = from->id;
+			   __entry->sync_to = to->id;
+			   ),
+
+	    TP_printk("dev=%u, sync-from=%u, sync-to=%u", __entry->dev, __entry->sync_from, __entry->sync_to)
+);
+
 TRACE_EVENT(i915_gem_ring_dispatch,
 	    TP_PROTO(struct intel_ring_buffer *ring, u32 flags),
 	    TP_ARGS(ring, flags),
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 10/13] drm/i915: Boost RPS frequency for CPU stalls
  2013-09-23 20:33 [PATCH 00/13] drm-intel-collector - review request Rodrigo Vivi
                   ` (8 preceding siblings ...)
  2013-09-23 20:33 ` [PATCH 09/13] drm/i915: Add a tracepoint for using a semaphore Rodrigo Vivi
@ 2013-09-23 20:33 ` Rodrigo Vivi
  2013-09-23 20:33 ` [PATCH 11/13] drm/i915: Tweak RPS thresholds to more aggressively downclock Rodrigo Vivi
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Rodrigo Vivi @ 2013-09-23 20:33 UTC (permalink / raw)
  To: intel-gfx; +Cc: Owen Taylor, Stéphane Marchesin, Zhuang, Lena

From: Chris Wilson <chris@chris-wilson.co.uk>

If we encounter a situation where the CPU blocks waiting for results
from the GPU, give the GPU a kick to boost its the frequency.

This should work to reduce user interface stalls and to quickly promote
mesa to high frequencies - but the cost is that our requested frequency
stalls high (as we do not idle for long enough before rc6 to start
reducing frequencies, nor are we aggressive at down clocking an
underused GPU). However, this should be mitigated by rc6 itself powering
off the GPU when idle, and that energy use is dependent upon the workload
of the GPU in addition to its frequency (e.g. the math or sampler
functions only consume power when used). Still, this is likely to
adversely affect light workloads.

In particular, this nearly eliminates the highly noticeable wake-up lag
in animations from idle. For example, expose or workspace transitions.
(However, given the situation where we fail to downclock, our requested
frequency is almost always the maximum, except for Baytrail where we
manually downclock upon idling. This often masks the latency of
upclocking after being idle, so animations are typically smooth - at the
cost of increased power consumption.)

Stéphane raised the concern that this will punish good applications and
reward bad applications - but due to the nature of how mesa performs its
client throttling, I believe all mesa applications will be roughly
equally affected. To address this concern, and to prevent applications
like compositors from regularly boosting the RPS state, we only allow
each client to receive one boost in each period of activity.

Unfortunately, this techinique is ineffective with Ironlake - which also
has dynamic render power states and suffers just as dramatically. For
Ironlake, the thermal/power headroom is shared with the CPU through
Intelligent Power Sharing and the intel-ips module. This leaves us with
no GPU boost frequencies available when coming out of idle, and due to
hardware limitations we cannot change the arbitration between the CPU and
GPU quickly enough to be effective.

v2: Limit each client to receiving a single boost for each active period.
v3: Tidy up. Allow the device to always boost if it waits outside of
client context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
---
 drivers/gpu/drm/i915/i915_dma.c      |  16 ++---
 drivers/gpu/drm/i915/i915_drv.h      |  19 +++--
 drivers/gpu/drm/i915/i915_gem.c      | 136 ++++++++++++++++++++++++-----------
 drivers/gpu/drm/i915/i915_irq.c      |  11 ---
 drivers/gpu/drm/i915/intel_display.c |   3 +
 drivers/gpu/drm/i915/intel_drv.h     |   3 +
 drivers/gpu/drm/i915/intel_pm.c      |  42 ++++++-----
 7 files changed, 139 insertions(+), 91 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index a09a9c7..db4efef 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1820,19 +1820,11 @@ int i915_driver_unload(struct drm_device *dev)
 
 int i915_driver_open(struct drm_device *dev, struct drm_file *file)
 {
-	struct drm_i915_file_private *file_priv;
-
-	DRM_DEBUG_DRIVER("\n");
-	file_priv = kzalloc(sizeof(*file_priv), GFP_KERNEL);
-	if (!file_priv)
-		return -ENOMEM;
-
-	file->driver_priv = file_priv;
-
-	spin_lock_init(&file_priv->mm.lock);
-	INIT_LIST_HEAD(&file_priv->mm.request_list);
+	int ret;
 
-	idr_init(&file_priv->context_idr);
+	ret = i915_gem_open(dev, file);
+	if (ret)
+		return ret;
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c171285..82658f8 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -844,9 +844,6 @@ struct intel_gen6_power_mgmt {
 	struct work_struct work;
 	u32 pm_iir;
 
-	/* On vlv we need to manually drop to Vmin with a delayed work. */
-	struct delayed_work vlv_work;
-
 	/* The below variables an all the rps hw state are protected by
 	 * dev->struct mutext. */
 	u8 cur_delay;
@@ -965,6 +962,15 @@ struct i915_gem_mm {
 	struct delayed_work retire_work;
 
 	/**
+	 * When we detect an idle GPU, we want to turn on
+	 * powersaving features. So once we see that there
+	 * are no more requests outstanding and no more
+	 * arrive within a small period of time, we fire
+	 * off the idle_work.
+	 */
+	struct delayed_work idle_work;
+
+	/**
 	 * Are we in a non-interruptible section of code like
 	 * modesetting?
 	 */
@@ -1581,13 +1587,17 @@ struct drm_i915_gem_request {
 };
 
 struct drm_i915_file_private {
+	struct drm_i915_private *dev_priv;
+
 	struct {
 		spinlock_t lock;
 		struct list_head request_list;
+		struct delayed_work idle_work;
 	} mm;
 	struct idr context_idr;
 
 	struct i915_ctx_hang_stats hang_stats;
+	atomic_t rps_wait_boost;
 };
 
 #define INTEL_INFO(dev)	(to_i915(dev)->info)
@@ -1938,7 +1948,7 @@ i915_gem_object_unpin_fence(struct drm_i915_gem_object *obj)
 	}
 }
 
-void i915_gem_retire_requests(struct drm_device *dev);
+bool i915_gem_retire_requests(struct drm_device *dev);
 void i915_gem_retire_requests_ring(struct intel_ring_buffer *ring);
 int __must_check i915_gem_check_wedge(struct i915_gpu_error *error,
 				      bool interruptible);
@@ -1989,6 +1999,7 @@ int i915_gem_attach_phys_object(struct drm_device *dev,
 void i915_gem_detach_phys_object(struct drm_device *dev,
 				 struct drm_i915_gem_object *obj);
 void i915_gem_free_all_phys_object(struct drm_device *dev);
+int i915_gem_open(struct drm_device *dev, struct drm_file *file);
 void i915_gem_release(struct drm_device *dev, struct drm_file *file);
 
 uint32_t
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4a16491..fe7803f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -969,6 +969,14 @@ i915_gem_check_olr(struct intel_ring_buffer *ring, u32 seqno)
 	return ret;
 }
 
+static bool can_wait_boost(struct drm_i915_file_private *file_priv)
+{
+	if (file_priv == NULL)
+		return true;
+
+	return !atomic_xchg(&file_priv->rps_wait_boost, true);
+}
+
 /**
  * __wait_seqno - wait until execution of seqno has finished
  * @ring: the ring expected to report seqno
@@ -989,7 +997,9 @@ i915_gem_check_olr(struct intel_ring_buffer *ring, u32 seqno)
  */
 static int __wait_seqno(struct intel_ring_buffer *ring, u32 seqno,
 			unsigned reset_counter,
-			bool interruptible, struct timespec *timeout)
+			bool interruptible,
+			struct timespec *timeout,
+			struct drm_i915_file_private *file_priv)
 {
 	drm_i915_private_t *dev_priv = ring->dev->dev_private;
 	struct timespec before, now, wait_time={1,0};
@@ -1012,6 +1022,9 @@ static int __wait_seqno(struct intel_ring_buffer *ring, u32 seqno,
 
 	timeout_jiffies = timespec_to_jiffies_timeout(&wait_time);
 
+	if (dev_priv->info->gen >= 6 && can_wait_boost(file_priv))
+		gen6_rps_boost(dev_priv);
+
 	if (WARN_ON(!ring->irq_get(ring)))
 		return -ENODEV;
 
@@ -1094,7 +1107,7 @@ i915_wait_seqno(struct intel_ring_buffer *ring, uint32_t seqno)
 
 	return __wait_seqno(ring, seqno,
 			    atomic_read(&dev_priv->gpu_error.reset_counter),
-			    interruptible, NULL);
+			    interruptible, NULL, NULL);
 }
 
 static int
@@ -1144,6 +1157,7 @@ i915_gem_object_wait_rendering(struct drm_i915_gem_object *obj,
  */
 static __must_check int
 i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
+					    struct drm_file *file,
 					    bool readonly)
 {
 	struct drm_device *dev = obj->base.dev;
@@ -1170,7 +1184,7 @@ i915_gem_object_wait_rendering__nonblocking(struct drm_i915_gem_object *obj,
 
 	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
 	mutex_unlock(&dev->struct_mutex);
-	ret = __wait_seqno(ring, seqno, reset_counter, true, NULL);
+	ret = __wait_seqno(ring, seqno, reset_counter, true, NULL, file->driver_priv);
 	mutex_lock(&dev->struct_mutex);
 	if (ret)
 		return ret;
@@ -1219,7 +1233,7 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 	 * We will repeat the flush holding the lock in the normal manner
 	 * to catch cases where we are gazumped.
 	 */
-	ret = i915_gem_object_wait_rendering__nonblocking(obj, !write_domain);
+	ret = i915_gem_object_wait_rendering__nonblocking(obj, file, !write_domain);
 	if (ret)
 		goto unref;
 
@@ -2116,6 +2130,8 @@ int __i915_add_request(struct intel_ring_buffer *ring,
 	if (file) {
 		struct drm_i915_file_private *file_priv = file->driver_priv;
 
+		cancel_delayed_work_sync(&file_priv->mm.idle_work);
+
 		spin_lock(&file_priv->mm.lock);
 		request->file_priv = file_priv;
 		list_add_tail(&request->client_list,
@@ -2131,6 +2147,7 @@ int __i915_add_request(struct intel_ring_buffer *ring,
 		i915_queue_hangcheck(ring->dev);
 
 		if (was_empty) {
+			cancel_delayed_work_sync(&dev_priv->mm.idle_work);
 			queue_delayed_work(dev_priv->wq,
 					   &dev_priv->mm.retire_work,
 					   round_jiffies_up_relative(HZ));
@@ -2152,10 +2169,12 @@ i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
 		return;
 
 	spin_lock(&file_priv->mm.lock);
-	if (request->file_priv) {
-		list_del(&request->client_list);
-		request->file_priv = NULL;
-	}
+	list_del(&request->client_list);
+	if (list_empty(&file_priv->mm.request_list))
+		mod_delayed_work(to_i915(request->ring->dev)->wq,
+				 &file_priv->mm.idle_work,
+				 msecs_to_jiffies(100));
+	request->file_priv = NULL;
 	spin_unlock(&file_priv->mm.lock);
 }
 
@@ -2419,57 +2438,53 @@ i915_gem_retire_requests_ring(struct intel_ring_buffer *ring)
 	WARN_ON(i915_verify_lists(ring->dev));
 }
 
-void
+bool
 i915_gem_retire_requests(struct drm_device *dev)
 {
 	drm_i915_private_t *dev_priv = dev->dev_private;
 	struct intel_ring_buffer *ring;
+	bool idle = true;
 	int i;
 
-	for_each_ring(ring, dev_priv, i)
+	for_each_ring(ring, dev_priv, i) {
 		i915_gem_retire_requests_ring(ring);
+		idle &= list_empty(&ring->request_list);
+	}
+
+	if (idle)
+		mod_delayed_work(dev_priv->wq,
+				   &dev_priv->mm.idle_work,
+				   msecs_to_jiffies(100));
+
+	return idle;
 }
 
 static void
 i915_gem_retire_work_handler(struct work_struct *work)
 {
-	drm_i915_private_t *dev_priv;
-	struct drm_device *dev;
-	struct intel_ring_buffer *ring;
+	struct drm_i915_private *dev_priv =
+		container_of(work, typeof(*dev_priv), mm.retire_work.work);
+	struct drm_device *dev = dev_priv->dev;
 	bool idle;
-	int i;
-
-	dev_priv = container_of(work, drm_i915_private_t,
-				mm.retire_work.work);
-	dev = dev_priv->dev;
 
 	/* Come back later if the device is busy... */
-	if (!mutex_trylock(&dev->struct_mutex)) {
-		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work,
-				   round_jiffies_up_relative(HZ));
-		return;
-	}
-
-	i915_gem_retire_requests(dev);
-
-	/* Send a periodic flush down the ring so we don't hold onto GEM
-	 * objects indefinitely.
-	 */
-	idle = true;
-	for_each_ring(ring, dev_priv, i) {
-		if (ring->gpu_caches_dirty)
-			i915_add_request(ring, NULL);
-
-		idle &= list_empty(&ring->request_list);
+	idle = false;
+	if (mutex_trylock(&dev->struct_mutex)) {
+		idle = i915_gem_retire_requests(dev);
+		mutex_unlock(&dev->struct_mutex);
 	}
-
-	if (!dev_priv->ums.mm_suspended && !idle)
+	if (!idle)
 		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work,
 				   round_jiffies_up_relative(HZ));
-	if (idle)
-		intel_mark_idle(dev);
+}
 
-	mutex_unlock(&dev->struct_mutex);
+static void
+i915_gem_idle_work_handler(struct work_struct *work)
+{
+	struct drm_i915_private *dev_priv =
+		container_of(work, typeof(*dev_priv), mm.idle_work.work);
+
+	intel_mark_idle(dev_priv->dev);
 }
 
 /**
@@ -2567,7 +2582,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 	reset_counter = atomic_read(&dev_priv->gpu_error.reset_counter);
 	mutex_unlock(&dev->struct_mutex);
 
-	ret = __wait_seqno(ring, seqno, reset_counter, true, timeout);
+	ret = __wait_seqno(ring, seqno, reset_counter, true, timeout, file->driver_priv);
 	if (timeout)
 		args->timeout_ns = timespec_to_ns(timeout);
 	return ret;
@@ -3780,7 +3795,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (seqno == 0)
 		return 0;
 
-	ret = __wait_seqno(ring, seqno, reset_counter, true, NULL);
+	ret = __wait_seqno(ring, seqno, reset_counter, true, NULL, NULL);
 	if (ret == 0)
 		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
 
@@ -4253,6 +4268,7 @@ i915_gem_idle(struct drm_device *dev)
 
 	/* Cancel the retire work handler, which should be idle now. */
 	cancel_delayed_work_sync(&dev_priv->mm.retire_work);
+	cancel_delayed_work_sync(&dev_priv->mm.idle_work);
 
 	return 0;
 }
@@ -4586,6 +4602,8 @@ i915_gem_load(struct drm_device *dev)
 		INIT_LIST_HEAD(&dev_priv->fence_regs[i].lru_list);
 	INIT_DELAYED_WORK(&dev_priv->mm.retire_work,
 			  i915_gem_retire_work_handler);
+	INIT_DELAYED_WORK(&dev_priv->mm.idle_work,
+			  i915_gem_idle_work_handler);
 	init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
 
 	/* On GEN3 we really need to make sure the ARB C3 LP bit is set */
@@ -4809,6 +4827,8 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file)
 {
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 
+	cancel_delayed_work_sync(&file_priv->mm.idle_work);
+
 	/* Clean up our request list when the client is going away, so that
 	 * later retire_requests won't dereference our soon-to-be-gone
 	 * file_priv.
@@ -4826,6 +4846,38 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file)
 	spin_unlock(&file_priv->mm.lock);
 }
 
+static void
+i915_gem_file_idle_work_handler(struct work_struct *work)
+{
+	struct drm_i915_file_private *file_priv =
+		container_of(work, typeof(*file_priv), mm.idle_work.work);
+
+	atomic_set(&file_priv->rps_wait_boost, false);
+}
+
+int i915_gem_open(struct drm_device *dev, struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv;
+
+	DRM_DEBUG_DRIVER("\n");
+
+	file_priv = kzalloc(sizeof(*file_priv), GFP_KERNEL);
+	if (!file_priv)
+		return -ENOMEM;
+
+	file->driver_priv = file_priv;
+	file_priv->dev_priv = dev->dev_private;
+
+	spin_lock_init(&file_priv->mm.lock);
+	INIT_LIST_HEAD(&file_priv->mm.request_list);
+	INIT_DELAYED_WORK(&file_priv->mm.idle_work,
+			  i915_gem_file_idle_work_handler);
+
+	idr_init(&file_priv->context_idr);
+
+	return 0;
+}
+
 static bool mutex_is_locked_by(struct mutex *mutex, struct task_struct *task)
 {
 	if (!mutex_is_locked(mutex))
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 82bd7f5d..68c0b08 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -859,17 +859,6 @@ static void gen6_pm_rps_work(struct work_struct *work)
 			gen6_set_rps(dev_priv->dev, new_delay);
 	}
 
-	if (IS_VALLEYVIEW(dev_priv->dev)) {
-		/*
-		 * On VLV, when we enter RC6 we may not be at the minimum
-		 * voltage level, so arm a timer to check.  It should only
-		 * fire when there's activity or once after we've entered
-		 * RC6, and then won't be re-armed until the next RPS interrupt.
-		 */
-		mod_delayed_work(dev_priv->wq, &dev_priv->rps.vlv_work,
-				 msecs_to_jiffies(100));
-	}
-
 	mutex_unlock(&dev_priv->rps.hw_lock);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 105ec54..98f712c 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -7647,6 +7647,9 @@ void intel_mark_idle(struct drm_device *dev)
 
 		intel_decrease_pllclock(crtc);
 	}
+
+	if (dev_priv->info->gen >= 6)
+		gen6_rps_idle(dev->dev_private);
 }
 
 void intel_mark_fb_busy(struct drm_i915_gem_object *obj,
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 4b75328..f2ef5cb 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -830,4 +830,7 @@ extern void ironlake_check_encoder_dotclock(const struct intel_crtc_config *pipe
 extern bool intel_crtc_active(struct drm_crtc *crtc);
 extern void i915_disable_vga_mem(struct drm_device *dev);
 
+extern void gen6_rps_idle(struct drm_i915_private *dev_priv);
+extern void gen6_rps_boost(struct drm_i915_private *dev_priv);
+
 #endif /* __INTEL_DRV_H__ */
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index d5b8c78..d78a415 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3328,6 +3328,26 @@ void gen6_set_rps(struct drm_device *dev, u8 val)
 	trace_intel_gpu_freq_change(val * 50);
 }
 
+void gen6_rps_idle(struct drm_i915_private *dev_priv)
+{
+	mutex_lock(&dev_priv->rps.hw_lock);
+	if (dev_priv->info->is_valleyview)
+		valleyview_set_rps(dev_priv->dev, dev_priv->rps.min_delay);
+	else
+		gen6_set_rps(dev_priv->dev, dev_priv->rps.min_delay);
+	mutex_unlock(&dev_priv->rps.hw_lock);
+}
+
+void gen6_rps_boost(struct drm_i915_private *dev_priv)
+{
+	mutex_lock(&dev_priv->rps.hw_lock);
+	if (dev_priv->info->is_valleyview)
+		valleyview_set_rps(dev_priv->dev, dev_priv->rps.max_delay);
+	else
+		gen6_set_rps(dev_priv->dev, dev_priv->rps.max_delay);
+	mutex_unlock(&dev_priv->rps.hw_lock);
+}
+
 /*
  * Wait until the previous freq change has completed,
  * or the timeout elapsed, and then update our notion
@@ -3715,24 +3735,6 @@ int valleyview_rps_min_freq(struct drm_i915_private *dev_priv)
 	return vlv_punit_read(dev_priv, PUNIT_REG_GPU_LFM) & 0xff;
 }
 
-static void vlv_rps_timer_work(struct work_struct *work)
-{
-	drm_i915_private_t *dev_priv = container_of(work, drm_i915_private_t,
-						    rps.vlv_work.work);
-
-	/*
-	 * Timer fired, we must be idle.  Drop to min voltage state.
-	 * Note: we use RPe here since it should match the
-	 * Vmin we were shooting for.  That should give us better
-	 * perf when we come back out of RC6 than if we used the
-	 * min freq available.
-	 */
-	mutex_lock(&dev_priv->rps.hw_lock);
-	if (dev_priv->rps.cur_delay > dev_priv->rps.rpe_delay)
-		valleyview_set_rps(dev_priv->dev, dev_priv->rps.rpe_delay);
-	mutex_unlock(&dev_priv->rps.hw_lock);
-}
-
 static void valleyview_setup_pctx(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -3871,8 +3873,6 @@ static void valleyview_enable_rps(struct drm_device *dev)
 				      dev_priv->rps.rpe_delay),
 			 dev_priv->rps.rpe_delay);
 
-	INIT_DELAYED_WORK(&dev_priv->rps.vlv_work, vlv_rps_timer_work);
-
 	valleyview_set_rps(dev_priv->dev, dev_priv->rps.rpe_delay);
 
 	gen6_enable_rps_interrupts(dev);
@@ -4612,8 +4612,6 @@ void intel_disable_gt_powersave(struct drm_device *dev)
 	} else if (INTEL_INFO(dev)->gen >= 6) {
 		cancel_delayed_work_sync(&dev_priv->rps.delayed_resume_work);
 		cancel_work_sync(&dev_priv->rps.work);
-		if (IS_VALLEYVIEW(dev))
-			cancel_delayed_work_sync(&dev_priv->rps.vlv_work);
 		mutex_lock(&dev_priv->rps.hw_lock);
 		if (IS_VALLEYVIEW(dev))
 			valleyview_disable_rps(dev);
-- 
1.8.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 11/13] drm/i915: Tweak RPS thresholds to more aggressively downclock
  2013-09-23 20:33 [PATCH 00/13] drm-intel-collector - review request Rodrigo Vivi
                   ` (9 preceding siblings ...)
  2013-09-23 20:33 ` [PATCH 10/13] drm/i915: Boost RPS frequency for CPU stalls Rodrigo Vivi
@ 2013-09-23 20:33 ` Rodrigo Vivi
  2013-09-23 20:33 ` [PATCH 12/13] drm/i915: Allow GT3 Slice Shutdown on Boot Rodrigo Vivi
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 28+ messages in thread
From: Rodrigo Vivi @ 2013-09-23 20:33 UTC (permalink / raw)
  To: intel-gfx; +Cc: Owen Taylor, Stéphane Marchesin, Zhuang, Lena

From: Chris Wilson <chris@chris-wilson.co.uk>

After applying wait-boost we often find ourselves stuck at higher clocks
than required. The current threshold value requires the GPU to be
continuously and completely idle for 313ms before it is dropped by one
bin. Conversely, we require the GPU to be busy for an average of 90% over
a 84ms period before we upclock. So the current thresholds almost never
downclock the GPU, and respond very slowly to sudden demands for more
power. It is easy to observe that we currently lock into the wrong bin
and both underperform in benchmarks and consume more power than optimal
(just by repeating the task and measuring the different results).

An alternative approach, as discussed in the bspec, is to use a
continuous threshold for upclocking, and an average value for downclocking.
This is good for quickly detecting and reacting to state changes within a
frame, however it fails with the common throttling method of waiting
upon the outstanding frame - at least it is difficult to choose a
threshold that works well at 15,000fps and at 60fps. So continue to use
average busy/idle loads to determine frequency change.

v2: Use 3 power zones to keep frequencies low in steady-state mostly
idle (e.g. scrolling, interactive 2D drawing), and frequencies high
for demanding games. In between those end-states, we use a
fast-reclocking algorithm to converge more quickly on the desired bin.

v3: Bug fixes - make sure we reset adj after switching power zones.

v4: Tune - drop the continuous busy thresholds as it prevents us from
choosing the right frequency for glxgears style swap benchmarks. Instead
the goal is to be able to find the right clocks irrespective of the
wait-boost.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Kenneth Graunke <kenneth@whitecape.org>
Cc: Stéphane Marchesin <stephane.marchesin@gmail.com>
Cc: Owen Taylor <otaylor@redhat.com>
Cc: "Meng, Mengmeng" <mengmeng.meng@intel.com>
Cc: "Zhuang, Lena" <lena.zhuang@intel.com>
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
---
 drivers/gpu/drm/i915/i915_drv.h |   5 ++
 drivers/gpu/drm/i915/i915_irq.c |  46 ++++++++++----
 drivers/gpu/drm/i915/i915_reg.h |   2 +-
 drivers/gpu/drm/i915/intel_pm.c | 135 ++++++++++++++++++++++++++++++----------
 4 files changed, 141 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 82658f8..0a1d249 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -850,8 +850,13 @@ struct intel_gen6_power_mgmt {
 	u8 min_delay;
 	u8 max_delay;
 	u8 rpe_delay;
+	u8 rp1_delay;
+	u8 rp0_delay;
 	u8 hw_max;
 
+	int last_adj;
+	enum { LOW_POWER, BETWEEN, HIGH_POWER } power;
+
 	struct delayed_work delayed_resume_work;
 
 	/*
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 68c0b08..e930366 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -818,7 +818,7 @@ static void gen6_pm_rps_work(struct work_struct *work)
 	drm_i915_private_t *dev_priv = container_of(work, drm_i915_private_t,
 						    rps.work);
 	u32 pm_iir;
-	u8 new_delay;
+	int new_delay, adj;
 
 	spin_lock_irq(&dev_priv->irq_lock);
 	pm_iir = dev_priv->rps.pm_iir;
@@ -835,29 +835,49 @@ static void gen6_pm_rps_work(struct work_struct *work)
 
 	mutex_lock(&dev_priv->rps.hw_lock);
 
+	adj = dev_priv->rps.last_adj;
 	if (pm_iir & GEN6_PM_RP_UP_THRESHOLD) {
-		new_delay = dev_priv->rps.cur_delay + 1;
+		if (adj > 0)
+			adj *= 2;
+		else
+			adj = 1;
+		new_delay = dev_priv->rps.cur_delay + adj;
 
 		/*
 		 * For better performance, jump directly
 		 * to RPe if we're below it.
 		 */
-		if (IS_VALLEYVIEW(dev_priv->dev) &&
-		    dev_priv->rps.cur_delay < dev_priv->rps.rpe_delay)
+		if (new_delay < dev_priv->rps.rpe_delay)
+			new_delay = dev_priv->rps.rpe_delay;
+	} else if (pm_iir & GEN6_PM_RP_DOWN_TIMEOUT) {
+		if (dev_priv->rps.cur_delay > dev_priv->rps.rpe_delay)
 			new_delay = dev_priv->rps.rpe_delay;
-	} else
-		new_delay = dev_priv->rps.cur_delay - 1;
+		else
+			new_delay = dev_priv->rps.min_delay;
+		adj = 0;
+	} else if (pm_iir & GEN6_PM_RP_DOWN_THRESHOLD) {
+		if (adj < 0)
+			adj *= 2;
+		else
+			adj = -1;
+		new_delay = dev_priv->rps.cur_delay + adj;
+	} else { /* unknown event */
+		new_delay = dev_priv->rps.cur_delay;
+	}
 
 	/* sysfs frequency interfaces may have snuck in while servicing the
 	 * interrupt
 	 */
-	if (new_delay >= dev_priv->rps.min_delay &&
-	    new_delay <= dev_priv->rps.max_delay) {
-		if (IS_VALLEYVIEW(dev_priv->dev))
-			valleyview_set_rps(dev_priv->dev, new_delay);
-		else
-			gen6_set_rps(dev_priv->dev, new_delay);
-	}
+	if (new_delay < (int)dev_priv->rps.min_delay)
+		new_delay = dev_priv->rps.min_delay;
+	if (new_delay > (int)dev_priv->rps.max_delay)
+		new_delay = dev_priv->rps.max_delay;
+	dev_priv->rps.last_adj = new_delay - dev_priv->rps.cur_delay;
+
+	if (IS_VALLEYVIEW(dev_priv->dev))
+		valleyview_set_rps(dev_priv->dev, new_delay);
+	else
+		gen6_set_rps(dev_priv->dev, new_delay);
 
 	mutex_unlock(&dev_priv->rps.hw_lock);
 }
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index c4f9bef..6248bd8 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -4675,7 +4675,7 @@
 #define   GEN6_RP_UP_IDLE_MIN			(0x1<<3)
 #define   GEN6_RP_UP_BUSY_AVG			(0x2<<3)
 #define   GEN6_RP_UP_BUSY_CONT			(0x4<<3)
-#define   GEN7_RP_DOWN_IDLE_AVG			(0x2<<0)
+#define   GEN6_RP_DOWN_IDLE_AVG			(0x2<<0)
 #define   GEN6_RP_DOWN_IDLE_CONT		(0x1<<0)
 #define GEN6_RP_UP_THRESHOLD			0xA02C
 #define GEN6_RP_DOWN_THRESHOLD			0xA030
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index d78a415..12951bc 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3295,6 +3295,98 @@ static u32 gen6_rps_limits(struct drm_i915_private *dev_priv, u8 val)
 	return limits;
 }
 
+static void gen6_set_rps_thresholds(struct drm_i915_private *dev_priv, u8 val)
+{
+	int new_power;
+
+	new_power = dev_priv->rps.power;
+	switch (dev_priv->rps.power) {
+	case LOW_POWER:
+		if (val > dev_priv->rps.rpe_delay + 1 && val > dev_priv->rps.cur_delay)
+			new_power = BETWEEN;
+		break;
+
+	case BETWEEN:
+		if (val <= dev_priv->rps.rpe_delay && val < dev_priv->rps.cur_delay)
+			new_power = LOW_POWER;
+		else if (val >= dev_priv->rps.rp0_delay && val > dev_priv->rps.cur_delay)
+			new_power = HIGH_POWER;
+		break;
+
+	case HIGH_POWER:
+		if (val < (dev_priv->rps.rp1_delay + dev_priv->rps.rp0_delay) >> 1 && val < dev_priv->rps.cur_delay)
+			new_power = BETWEEN;
+		break;
+	}
+	/* Max/min bins are special */
+	if (val == dev_priv->rps.min_delay)
+		new_power = LOW_POWER;
+	if (val == dev_priv->rps.max_delay)
+		new_power = HIGH_POWER;
+	if (new_power == dev_priv->rps.power)
+		return;
+
+	/* Note the units here are not exactly 1us, but 1280ns. */
+	switch (new_power) {
+	case LOW_POWER:
+		/* Upclock if more than 95% busy over 16ms */
+		I915_WRITE(GEN6_RP_UP_EI, 12500);
+		I915_WRITE(GEN6_RP_UP_THRESHOLD, 11800);
+
+		/* Downclock if less than 85% busy over 32ms */
+		I915_WRITE(GEN6_RP_DOWN_EI, 25000);
+		I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 21250);
+
+		I915_WRITE(GEN6_RP_CONTROL,
+			   GEN6_RP_MEDIA_TURBO |
+			   GEN6_RP_MEDIA_HW_NORMAL_MODE |
+			   GEN6_RP_MEDIA_IS_GFX |
+			   GEN6_RP_ENABLE |
+			   GEN6_RP_UP_BUSY_AVG |
+			   GEN6_RP_DOWN_IDLE_AVG);
+		break;
+
+	case BETWEEN:
+		/* Upclock if more than 90% busy over 13ms */
+		I915_WRITE(GEN6_RP_UP_EI, 10250);
+		I915_WRITE(GEN6_RP_UP_THRESHOLD, 9225);
+
+		/* Downclock if less than 75% busy over 32ms */
+		I915_WRITE(GEN6_RP_DOWN_EI, 25000);
+		I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 18750);
+
+		I915_WRITE(GEN6_RP_CONTROL,
+			   GEN6_RP_MEDIA_TURBO |
+			   GEN6_RP_MEDIA_HW_NORMAL_MODE |
+			   GEN6_RP_MEDIA_IS_GFX |
+			   GEN6_RP_ENABLE |
+			   GEN6_RP_UP_BUSY_AVG |
+			   GEN6_RP_DOWN_IDLE_AVG);
+		break;
+
+	case HIGH_POWER:
+		/* Upclock if more than 85% busy over 10ms */
+		I915_WRITE(GEN6_RP_UP_EI, 8000);
+		I915_WRITE(GEN6_RP_UP_THRESHOLD, 6800);
+
+		/* Downclock if less than 60% busy over 32ms */
+		I915_WRITE(GEN6_RP_DOWN_EI, 25000);
+		I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 15000);
+
+		I915_WRITE(GEN6_RP_CONTROL,
+			   GEN6_RP_MEDIA_TURBO |
+			   GEN6_RP_MEDIA_HW_NORMAL_MODE |
+			   GEN6_RP_MEDIA_IS_GFX |
+			   GEN6_RP_ENABLE |
+			   GEN6_RP_UP_BUSY_AVG |
+			   GEN6_RP_DOWN_IDLE_AVG);
+		break;
+	}
+
+	dev_priv->rps.power = new_power;
+	dev_priv->rps.last_adj = 0;
+}
+
 void gen6_set_rps(struct drm_device *dev, u8 val)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -3306,6 +3398,8 @@ void gen6_set_rps(struct drm_device *dev, u8 val)
 	if (val == dev_priv->rps.cur_delay)
 		return;
 
+	gen6_set_rps_thresholds(dev_priv, val);
+
 	if (IS_HASWELL(dev))
 		I915_WRITE(GEN6_RPNSWREQ,
 			   HSW_FREQUENCY(val));
@@ -3527,7 +3621,10 @@ static void gen6_enable_rps(struct drm_device *dev)
 
 	/* In units of 50MHz */
 	dev_priv->rps.hw_max = dev_priv->rps.max_delay = rp_state_cap & 0xff;
-	dev_priv->rps.min_delay = (rp_state_cap & 0xff0000) >> 16;
+	dev_priv->rps.min_delay = (rp_state_cap >> 16) & 0xff;
+	dev_priv->rps.rp1_delay = (rp_state_cap >>  8) & 0xff;
+	dev_priv->rps.rp0_delay = (rp_state_cap >>  0) & 0xff;
+	dev_priv->rps.rpe_delay = dev_priv->rps.rp1_delay;
 	dev_priv->rps.cur_delay = 0;
 
 	/* disable the counters and set deterministic thresholds */
@@ -3575,38 +3672,9 @@ static void gen6_enable_rps(struct drm_device *dev)
 		   GEN6_RC_CTL_EI_MODE(1) |
 		   GEN6_RC_CTL_HW_ENABLE);
 
-	if (IS_HASWELL(dev)) {
-		I915_WRITE(GEN6_RPNSWREQ,
-			   HSW_FREQUENCY(10));
-		I915_WRITE(GEN6_RC_VIDEO_FREQ,
-			   HSW_FREQUENCY(12));
-	} else {
-		I915_WRITE(GEN6_RPNSWREQ,
-			   GEN6_FREQUENCY(10) |
-			   GEN6_OFFSET(0) |
-			   GEN6_AGGRESSIVE_TURBO);
-		I915_WRITE(GEN6_RC_VIDEO_FREQ,
-			   GEN6_FREQUENCY(12));
-	}
-
-	I915_WRITE(GEN6_RP_DOWN_TIMEOUT, 1000000);
-	I915_WRITE(GEN6_RP_INTERRUPT_LIMITS,
-		   dev_priv->rps.max_delay << 24 |
-		   dev_priv->rps.min_delay << 16);
-
-	I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
-	I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
-	I915_WRITE(GEN6_RP_UP_EI, 66000);
-	I915_WRITE(GEN6_RP_DOWN_EI, 350000);
-
+	/* Power down if completely idle for over 50ms */
+	I915_WRITE(GEN6_RP_DOWN_TIMEOUT, 50000);
 	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
-	I915_WRITE(GEN6_RP_CONTROL,
-		   GEN6_RP_MEDIA_TURBO |
-		   GEN6_RP_MEDIA_HW_NORMAL_MODE |
-		   GEN6_RP_MEDIA_IS_GFX |
-		   GEN6_RP_ENABLE |
-		   GEN6_RP_UP_BUSY_AVG |
-		   (IS_HASWELL(dev) ? GEN7_RP_DOWN_IDLE_AVG : GEN6_RP_DOWN_IDLE_CONT));
 
 	ret = sandybridge_pcode_write(dev_priv, GEN6_PCODE_WRITE_MIN_FREQ_TABLE, 0);
 	if (!ret) {
@@ -3622,7 +3690,8 @@ static void gen6_enable_rps(struct drm_device *dev)
 		DRM_DEBUG_DRIVER("Failed to set the min frequency\n");
 	}
 
-	gen6_set_rps(dev_priv->dev, (gt_perf_status & 0xff00) >> 8);
+	dev_priv->rps.power = HIGH_POWER; /* force a reset */
+	gen6_set_rps(dev_priv->dev, dev_priv->rps.min_delay);
 
 	gen6_enable_rps_interrupts(dev);
 
-- 
1.8.1.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 12/13] drm/i915: Allow GT3 Slice Shutdown on Boot.
  2013-09-23 20:33 [PATCH 00/13] drm-intel-collector - review request Rodrigo Vivi
                   ` (10 preceding siblings ...)
  2013-09-23 20:33 ` [PATCH 11/13] drm/i915: Tweak RPS thresholds to more aggressively downclock Rodrigo Vivi
@ 2013-09-23 20:33 ` Rodrigo Vivi
  2013-09-23 20:33 ` [PATCH 13/13] drm/i915: Allow Dynamically GT3 Slice Shutdown Rodrigo Vivi
  2013-09-24 10:15 ` [PATCH 00/13] drm-intel-collector - review request Daniel Vetter
  13 siblings, 0 replies; 28+ messages in thread
From: Rodrigo Vivi @ 2013-09-23 20:33 UTC (permalink / raw)
  To: intel-gfx

Slice shutdown is a power savings feature whereby parts of HW i.e. slice is
shut off on boot or dynamically to save power.

This patch only introduces a way to disable half of Haswell GT3 slices on boot.

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
---
 drivers/gpu/drm/i915/i915_drv.c |  5 +++++
 drivers/gpu/drm/i915/i915_drv.h |  1 +
 drivers/gpu/drm/i915/i915_gem.c |  5 -----
 drivers/gpu/drm/i915/i915_reg.h | 11 +++++++++++
 drivers/gpu/drm/i915/intel_pm.c | 23 +++++++++++++++++++++++
 5 files changed, 40 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 6f385e1..259b75c 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -154,6 +154,11 @@ module_param_named(prefault_disable, i915_prefault_disable, bool, 0600);
 MODULE_PARM_DESC(prefault_disable,
 		"Disable page prefaulting for pread/pwrite/reloc (default:false). For developers only.");
 
+int i915_gt3_policy __read_mostly = 1;
+module_param_named(gt3_policy, i915_gt3_policy, int, 0600);
+MODULE_PARM_DESC(gt3_policy,
+		 "GT3 boot with Full (1) or Half (0) slices enabled. (default:full)");
+
 static struct drm_driver driver;
 extern int intel_agp_enabled;
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0a1d249..5725e06 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1760,6 +1760,7 @@ extern bool i915_fastboot __read_mostly;
 extern int i915_enable_pc8 __read_mostly;
 extern int i915_pc8_timeout __read_mostly;
 extern bool i915_prefault_disable __read_mostly;
+extern int i915_gt3_policy __read_mostly;
 
 extern int i915_suspend(struct drm_device *dev, pm_message_t state);
 extern int i915_resume(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fe7803f..3f17481 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4401,11 +4401,6 @@ i915_gem_init_hw(struct drm_device *dev)
 	if (dev_priv->ellc_size)
 		I915_WRITE(HSW_IDICR, I915_READ(HSW_IDICR) | IDIHASHMSK(0xf));
 
-	if (IS_HSW_GT3(dev))
-		I915_WRITE(MI_PREDICATE_RESULT_2, LOWER_SLICE_ENABLED);
-	else
-		I915_WRITE(MI_PREDICATE_RESULT_2, LOWER_SLICE_DISABLED);
-
 	if (HAS_PCH_NOP(dev)) {
 		u32 temp = I915_READ(GEN7_MSG_CTL);
 		temp &= ~(WAIT_FOR_PCH_FLR_ACK | WAIT_FOR_PCH_RESET_ACK);
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 6248bd8..2c87076 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -269,6 +269,17 @@
 #define  LOWER_SLICE_ENABLED	(1<<0)
 #define  LOWER_SLICE_DISABLED	(0<<0)
 
+#define HSW_GT_SLICE_INFO	0x138064
+#define   SLICE_SEL_BOTH	(1<<3)
+#define   SLICE_AUTOWAKE	(1<<2)
+#define   SLICE_STATUS_MASK	0x3
+#define   SLICE_STATUS_GT_OFF	(0<<0)
+#define   SLICE_STATUS_MAIN_ON	(2<<0)
+#define   SLICE_STATUS_BOTH_ON	(3<<0)
+
+#define HSW_SLICESHUTDOWN	0xA190
+#define   SLICE_SHUTDOWN	(1<<0)
+
 /*
  * 3D instructions used by the kernel
  */
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 12951bc..c03566e 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3712,6 +3712,28 @@ static void gen6_enable_rps(struct drm_device *dev)
 	gen6_gt_force_wake_put(dev_priv);
 }
 
+static void intel_init_gt3_slices(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	if (!IS_HSW_GT3(dev))
+		return;
+
+	if (!i915_gt3_policy) {
+		I915_WRITE(MI_PREDICATE_RESULT_2, LOWER_SLICE_DISABLED);
+		POSTING_READ(MI_PREDICATE_RESULT_2);
+
+		I915_WRITE(HSW_SLICESHUTDOWN, SLICE_SHUTDOWN);
+		POSTING_READ(HSW_SLICESHUTDOWN);
+
+		I915_WRITE(HSW_GT_SLICE_INFO, ~SLICE_SEL_BOTH);
+		POSTING_READ(HSW_GT_SLICE_INFO);
+	} else {
+		I915_WRITE(MI_PREDICATE_RESULT_2, LOWER_SLICE_ENABLED);
+		POSTING_READ(MI_PREDICATE_RESULT_2);
+	}
+}
+
 void gen6_update_ring_freq(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
@@ -4704,6 +4726,7 @@ static void intel_gen6_powersave_work(struct work_struct *work)
 	} else {
 		gen6_enable_rps(dev);
 		gen6_update_ring_freq(dev);
+		intel_init_gt3_slices(dev);
 	}
 	mutex_unlock(&dev_priv->rps.hw_lock);
 }
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 13/13] drm/i915: Allow Dynamically GT3 Slice Shutdown.
  2013-09-23 20:33 [PATCH 00/13] drm-intel-collector - review request Rodrigo Vivi
                   ` (11 preceding siblings ...)
  2013-09-23 20:33 ` [PATCH 12/13] drm/i915: Allow GT3 Slice Shutdown on Boot Rodrigo Vivi
@ 2013-09-23 20:33 ` Rodrigo Vivi
  2013-09-23 21:13   ` Chris Wilson
  2013-09-24 10:15 ` [PATCH 00/13] drm-intel-collector - review request Daniel Vetter
  13 siblings, 1 reply; 28+ messages in thread
From: Rodrigo Vivi @ 2013-09-23 20:33 UTC (permalink / raw)
  To: intel-gfx

Slice shutdown is a power savings feature whereby parts of HW i.e. slice is
shut off on boot or dynamically to save power.

This patch introduces a sysfs interface to easily allow dynamically switch
between full and half GT3 slices.

v2: remove unused variables and fix identation

Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
---
 drivers/gpu/drm/i915/i915_sysfs.c | 48 +++++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_drv.h  |  4 +++-
 drivers/gpu/drm/i915/intel_pm.c   | 31 +++++++++++++++++++++++++
 3 files changed, 82 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
index 7659dd4..40f23a8 100644
--- a/drivers/gpu/drm/i915/i915_sysfs.c
+++ b/drivers/gpu/drm/i915/i915_sysfs.c
@@ -97,6 +97,48 @@ static struct attribute_group rc6_attr_group = {
 	.name = power_group_name,
 	.attrs =  rc6_attrs
 };
+
+static ssize_t gt3_policy_show(struct device *kdev,
+			      struct device_attribute *attr, char *buf)
+{
+	struct drm_minor *minor = container_of(kdev, struct drm_minor, kdev);
+	struct drm_device *dev = minor->dev;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	return sprintf(buf, "%s\n", I915_READ(MI_PREDICATE_RESULT_2) ==
+		       LOWER_SLICE_ENABLED ? "full" : "half");
+}
+
+static ssize_t gt3_policy_store(struct device *kdev,
+			       struct device_attribute *attr,
+			       const char *buf, size_t count)
+{
+	struct drm_minor *minor = container_of(kdev, struct drm_minor, kdev);
+	struct drm_device *dev = minor->dev;
+
+	if (!strncmp(buf, "full", sizeof("full") - 1))
+		intel_set_gt3_full(dev);
+	else if (!strncmp(buf, "half", sizeof("half") - 1))
+		intel_set_gt3_half(dev);
+	else
+		return -EINVAL;
+
+	return count;
+}
+
+static DEVICE_ATTR(gt3_policy, S_IRUGO | S_IWUSR,
+		   gt3_policy_show, gt3_policy_store);
+
+static struct attribute *gt3_policy_attrs[] = {
+	&dev_attr_gt3_policy.attr,
+	NULL
+};
+
+static struct attribute_group gt3_policy_attr_group = {
+	.name = power_group_name,
+	.attrs =  gt3_policy_attrs
+};
+
 #endif
 
 static int l3_access_valid(struct drm_device *dev, loff_t offset)
@@ -528,6 +570,12 @@ void i915_setup_sysfs(struct drm_device *dev)
 		if (ret)
 			DRM_ERROR("RC6 residency sysfs setup failed\n");
 	}
+	if (IS_HSW_GT3(dev)) {
+		ret = sysfs_merge_group(&dev->primary->kdev.kobj,
+					&gt3_policy_attr_group);
+		if (ret)
+			DRM_ERROR("GT3 policy sysfs setup failed\n");
+	}
 #endif
 	if (HAS_L3_DPF(dev)) {
 		ret = device_create_bin_file(&dev->primary->kdev, &dpf_attrs);
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index f2ef5cb..2d28289 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -756,7 +756,9 @@ extern void intel_update_fbc(struct drm_device *dev);
 /* IPS */
 extern void intel_gpu_ips_init(struct drm_i915_private *dev_priv);
 extern void intel_gpu_ips_teardown(void);
-
+/* Slice Shutdown */
+extern void intel_set_gt3_full(struct drm_device *dev);
+extern void intel_set_gt3_half(struct drm_device *dev);
 /* Power well */
 extern int i915_init_power_well(struct drm_device *dev);
 extern void i915_remove_power_well(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index c03566e..e7a987f 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -3712,6 +3712,37 @@ static void gen6_enable_rps(struct drm_device *dev)
 	gen6_gt_force_wake_put(dev_priv);
 }
 
+void intel_set_gt3_full(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	if (!IS_HSW_GT3(dev))
+		return;
+
+	I915_WRITE(HSW_GT_SLICE_INFO, SLICE_SEL_BOTH);
+	POSTING_READ(HSW_GT_SLICE_INFO);
+
+	I915_WRITE(MI_PREDICATE_RESULT_2, LOWER_SLICE_ENABLED);
+	POSTING_READ(MI_PREDICATE_RESULT_2);
+}
+
+void intel_set_gt3_half(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	if (!IS_HSW_GT3(dev))
+		return;
+
+	I915_WRITE(HSW_SLICESHUTDOWN, SLICE_SHUTDOWN);
+	POSTING_READ(HSW_SLICESHUTDOWN);
+
+	I915_WRITE(HSW_GT_SLICE_INFO, ~SLICE_SEL_BOTH);
+	POSTING_READ(HSW_GT_SLICE_INFO);
+
+	I915_WRITE(MI_PREDICATE_RESULT_2, LOWER_SLICE_DISABLED);
+	POSTING_READ(MI_PREDICATE_RESULT_2);
+}
+
 static void intel_init_gt3_slices(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
-- 
1.8.1.4

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 13/13] drm/i915: Allow Dynamically GT3 Slice Shutdown.
  2013-09-23 20:33 ` [PATCH 13/13] drm/i915: Allow Dynamically GT3 Slice Shutdown Rodrigo Vivi
@ 2013-09-23 21:13   ` Chris Wilson
  0 siblings, 0 replies; 28+ messages in thread
From: Chris Wilson @ 2013-09-23 21:13 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-gfx

On Mon, Sep 23, 2013 at 05:33:30PM -0300, Rodrigo Vivi wrote:
> Slice shutdown is a power savings feature whereby parts of HW i.e. slice is
> shut off on boot or dynamically to save power.
> 
> This patch introduces a sysfs interface to easily allow dynamically switch
> between full and half GT3 slices.
> 
> v2: remove unused variables and fix identation

gt3_policy is a misnomer, what you have is more of a "slice config".

I'm not convinced by the boot parameter and want to see how this
interacts with the automatic configuration before declaring any
interface sign.

> 
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
> ---
>  drivers/gpu/drm/i915/i915_sysfs.c | 48 +++++++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_drv.h  |  4 +++-
>  drivers/gpu/drm/i915/intel_pm.c   | 31 +++++++++++++++++++++++++
>  3 files changed, 82 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_sysfs.c b/drivers/gpu/drm/i915/i915_sysfs.c
> index 7659dd4..40f23a8 100644
> --- a/drivers/gpu/drm/i915/i915_sysfs.c
> +++ b/drivers/gpu/drm/i915/i915_sysfs.c
> @@ -97,6 +97,48 @@ static struct attribute_group rc6_attr_group = {
>  	.name = power_group_name,
>  	.attrs =  rc6_attrs
>  };
> +
> +static ssize_t gt3_policy_show(struct device *kdev,
> +			      struct device_attribute *attr, char *buf)
> +{
> +	struct drm_minor *minor = container_of(kdev, struct drm_minor, kdev);
> +	struct drm_device *dev = minor->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +	return sprintf(buf, "%s\n", I915_READ(MI_PREDICATE_RESULT_2) ==
> +		       LOWER_SLICE_ENABLED ? "full" : "half");
> +}
> +
> +static ssize_t gt3_policy_store(struct device *kdev,
> +			       struct device_attribute *attr,
> +			       const char *buf, size_t count)
> +{
> +	struct drm_minor *minor = container_of(kdev, struct drm_minor, kdev);
> +	struct drm_device *dev = minor->dev;
> +
> +	if (!strncmp(buf, "full", sizeof("full") - 1))
> +		intel_set_gt3_full(dev);

Be strict and use strcmp(buf, "full") == 0

> +	else if (!strncmp(buf, "half", sizeof("half") - 1))
> +		intel_set_gt3_half(dev);
> +	else
> +		return -EINVAL;

This probably wants to be synchronous and not return before the change
has taken effect.

> +
> +	return count;
> +}
> +
> +static DEVICE_ATTR(gt3_policy, S_IRUGO | S_IWUSR,
> +		   gt3_policy_show, gt3_policy_store);
> +
> +static struct attribute *gt3_policy_attrs[] = {
> +	&dev_attr_gt3_policy.attr,
> +	NULL
> +};
> +
> +static struct attribute_group gt3_policy_attr_group = {
> +	.name = power_group_name,
> +	.attrs =  gt3_policy_attrs
> +};
> +
>  #endif
>  
>  static int l3_access_valid(struct drm_device *dev, loff_t offset)
> @@ -528,6 +570,12 @@ void i915_setup_sysfs(struct drm_device *dev)
>  		if (ret)
>  			DRM_ERROR("RC6 residency sysfs setup failed\n");
>  	}
> +	if (IS_HSW_GT3(dev)) {
> +		ret = sysfs_merge_group(&dev->primary->kdev.kobj,
> +					&gt3_policy_attr_group);
> +		if (ret)
> +			DRM_ERROR("GT3 policy sysfs setup failed\n");

Shouldn't there be a remove as well?

> +	}
>  #endif
>  	if (HAS_L3_DPF(dev)) {
>  		ret = device_create_bin_file(&dev->primary->kdev, &dpf_attrs);
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index f2ef5cb..2d28289 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -756,7 +756,9 @@ extern void intel_update_fbc(struct drm_device *dev);
>  /* IPS */
>  extern void intel_gpu_ips_init(struct drm_i915_private *dev_priv);
>  extern void intel_gpu_ips_teardown(void);
> -
> +/* Slice Shutdown */
> +extern void intel_set_gt3_full(struct drm_device *dev);
> +extern void intel_set_gt3_half(struct drm_device *dev);
>  /* Power well */
>  extern int i915_init_power_well(struct drm_device *dev);
>  extern void i915_remove_power_well(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index c03566e..e7a987f 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -3712,6 +3712,37 @@ static void gen6_enable_rps(struct drm_device *dev)
>  	gen6_gt_force_wake_put(dev_priv);
>  }
>  
> +void intel_set_gt3_full(struct drm_device *dev)

Just one function will do,

intel_set_slice_config(dev, enum slice_config config);

(and if the boot parameter remains it should call into it)

I don't think you should be changing MI_PREDICATE whilst the (a) the
slice configuration has not taken effect, (b) whilst the GPU is active,
(c) those are indeed the same.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/13] drm-intel-collector - review request
  2013-09-23 20:33 [PATCH 00/13] drm-intel-collector - review request Rodrigo Vivi
                   ` (12 preceding siblings ...)
  2013-09-23 20:33 ` [PATCH 13/13] drm/i915: Allow Dynamically GT3 Slice Shutdown Rodrigo Vivi
@ 2013-09-24 10:15 ` Daniel Vetter
  2013-09-24 10:32   ` Chris Wilson
  13 siblings, 1 reply; 28+ messages in thread
From: Daniel Vetter @ 2013-09-24 10:15 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-gfx

On Mon, Sep 23, 2013 at 05:33:17PM -0300, Rodrigo Vivi wrote:
> This is another drm-intel-collector push for review:
> http://cgit.freedesktop.org/~vivijim/drm-intel/log/?h=drm-intel-collector
> 
> Here goes the list in order for better reviewers assignment:
> 
> Patch 01/13 - 816e102 drm/i915: check that the i965g/gm 4G limit is really obeyed - Reviewer: Damien
> Patch 02/13 - 036f5da drm/i915: Move the conditional seqno query into the tracepoint - Reviewer: Ville
> Patch 03/13 - f381b05 drm/i915: Add some missing steps to i915_driver_load error path - Reviewer: Ben

Lazy reviers, tsk ;-)

> Patch 04/13 - f648dba drm/i915: Asynchronously perform the set-base for a simple modeset - Reviewer: 

Dunno what to do exactly with this, given that the android guys seem to
need more vblank waits in the plane ioctls. I'd drop for now, we need a
more coherent story here ... Or just wait for nuclear flips.

> Patch 05/13 - ec31a06 drm/i915: Align tiled scanouts from stolen memory to 256k in the GTT - Reviewer: 

I think scanouts in stolen have died for now, too many issues. So I guess
we can drop this, since there are other issues.

> Patch 06/13 - 3b2a43a drm/i915: Pair seqno completion tracepoint with its dispatch - Reviewer: 

There's some bikeshed pending from Thomas Gleixner in the same are to move
the i915_ring_get_seqno out of the tracepoint. I think if we frob this we
might as well do it right.

> Patch 07/13 - 4d98ddd drm/i915: Initialise min/max frequencies before updating RPS registers - Reviewer: 

Ville had some bikesheds pending, and there's the issue of the delayed rps
work. I'd drop and wait for a new version.

> Patch 08/13 - 35983bd drm/i915: Delay the relase of the forcewake by a jiffie - Reviewer: 

Already reviewed by Ville and needs a bit work it seems, you can drop and
wait for a new version.

> Patch 09/13 - 14141ee drm/i915: Add a tracepoint for using a semaphore - Reviewer: 

Ville.

> Patch 10/13 - 06d2851 drm/i915: Boost RPS frequency for CPU stalls - Reviewer: 
> Patch 11/13 - e64dce9 drm/i915: Tweak RPS thresholds to more aggressively downclock - Reviewer: 

It sounds like Chris has updated versions of these somewhere. Imo you can
drop these here. Also Jesse should take a look.

> Patch 12/13 - 7b6c68c drm/i915: Allow GT3 Slice Shutdown on Boot. - Reviewer: 
> Patch 13/13 - 2f3c359 drm/i915: Allow Dynamically GT3 Slice Shutdown. - Reviewer: 

Imo this shouldn't be an official interface in sysf until we have a solid interface. So I vote
to drop patch 12 and rework patch 13 to use debugfs. For the real deal
(i.e. a flag in execbuf that userspace has the right commands to support
dynamic switching) we ofc need testcases in igt ;-)

> 
> Overall Process has changed a bit in order to avoid discussion split/duplications and for poking reviewers:
> 
> 1. Daniel pushs drm-intel-testing (every 2 weeks)
> 2. I rebase drm-intel-collector onto drm-intel-testing
> 3. Add Reviewer: tag with voluntered reviewers. If you don't believe you should be assigned on a particular patch please don't get mad just tell you wont review or volunteer someone else.
> 4. I resubmit remaining patches for review without picking any new (drm-intel-collector - review request)

4b) Push out the same patches to drm-intel-collector on git.fd.o - the
current branch is 2 weeks old ;-)

Also can you upload the scrip you're using somewhere, like I update the
maintainer scrip with every push to my rerere-cache branch? And if you
don't have this fully scripted yet, that needs to be fixed asap ;-)

Cheers, Daniel

> 5. One week later I collect all simple (1-2) patches that wasn't yet reviewed and not queued by Daniel from one testing update until another.
> 6. Request automated QA's PRTS automated i-g-t tests comparing drm-intel-testing x drm-intel-collector
> 7. If tests are ok I send the update notification or the patches as a series to intel-gfx mailing list for better tracking and to be reviewed. (drm-intel-collector - updated)
> 8. Let me know volunteers for review new patches and also let me know if I've picked any patch that I shouldn't.
> 
> There are some reasons that some patches can be left behind:
> 1. It was send so long time ago. I started with patches from Jul 26th.
> 2. Your patch didn't applied cleanly and I couldn't easily solve the conflicts.
> 3. Kernel didn't compiled with your patch.
> 4. I simply missed it. If you believe this is the case please warn me.
> 
> Please help me to get these patches reviewed and queued by Daniel.
> 
> Also, please let me know if you have further ideas how to improve this process.
> 
> Thanks in advance,
> Rodrigo.
> 
> Chris Wilson (10):
>   drm/i915: Move the conditional seqno query into the tracepoint
>   drm/i915: Add some missing steps to i915_driver_load error path
>   drm/i915: Asynchronously perform the set-base for a simple modeset
>   drm/i915: Align tiled scanouts from stolen memory to 256k in the GTT
>   drm/i915: Pair seqno completion tracepoint with its dispatch
>   drm/i915: Initialise min/max frequencies before updating RPS registers
>   drm/i915: Delay the relase of the forcewake by a jiffie
>   drm/i915: Add a tracepoint for using a semaphore
>   drm/i915: Boost RPS frequency for CPU stalls
>   drm/i915: Tweak RPS thresholds to more aggressively downclock
> 
> Daniel Vetter (1):
>   drm/i915: check that the i965g/gm 4G limit is really obeyed
> 
> Rodrigo Vivi (2):
>   drm/i915: Allow GT3 Slice Shutdown on Boot.
>   drm/i915: Allow Dynamically GT3 Slice Shutdown.

> 
>  drivers/gpu/drm/i915/i915_debugfs.c        |   2 +-
>  drivers/gpu/drm/i915/i915_dma.c            |  41 ++---
>  drivers/gpu/drm/i915/i915_drv.c            |   5 +
>  drivers/gpu/drm/i915/i915_drv.h            |  28 +++-
>  drivers/gpu/drm/i915/i915_gem.c            | 146 +++++++++++------
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   2 +-
>  drivers/gpu/drm/i915/i915_irq.c            |  59 ++++---
>  drivers/gpu/drm/i915/i915_reg.h            |  13 +-
>  drivers/gpu/drm/i915/i915_sysfs.c          |  64 +++++++-
>  drivers/gpu/drm/i915/i915_trace.h          |  52 ++++--
>  drivers/gpu/drm/i915/intel_display.c       |  16 +-
>  drivers/gpu/drm/i915/intel_drv.h           |   7 +-
>  drivers/gpu/drm/i915/intel_pm.c            | 250 +++++++++++++++++++++--------
>  drivers/gpu/drm/i915/intel_uncore.c        |  30 +++-
>  14 files changed, 525 insertions(+), 190 deletions(-)
> 
> --
> 1.8.1.4
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/13] drm-intel-collector - review request
  2013-09-24 10:15 ` [PATCH 00/13] drm-intel-collector - review request Daniel Vetter
@ 2013-09-24 10:32   ` Chris Wilson
  2013-09-24 10:47     ` Daniel Vetter
  0 siblings, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2013-09-24 10:32 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, Sep 24, 2013 at 12:15:19PM +0200, Daniel Vetter wrote:
> On Mon, Sep 23, 2013 at 05:33:17PM -0300, Rodrigo Vivi wrote:
> > This is another drm-intel-collector push for review:
> > http://cgit.freedesktop.org/~vivijim/drm-intel/log/?h=drm-intel-collector
> > 
> > Here goes the list in order for better reviewers assignment:
> > 
> > Patch 01/13 - 816e102 drm/i915: check that the i965g/gm 4G limit is really obeyed - Reviewer: Damien
> > Patch 02/13 - 036f5da drm/i915: Move the conditional seqno query into the tracepoint - Reviewer: Ville
> > Patch 03/13 - f381b05 drm/i915: Add some missing steps to i915_driver_load error path - Reviewer: Ben
> 
> Lazy reviers, tsk ;-)
> 
> > Patch 04/13 - f648dba drm/i915: Asynchronously perform the set-base for a simple modeset - Reviewer: 
> 
> Dunno what to do exactly with this, given that the android guys seem to
> need more vblank waits in the plane ioctls. I'd drop for now, we need a
> more coherent story here ... Or just wait for nuclear flips.

That is irrevelant for that patch. Try it, see how much it improves
switching between fb/X.

> > Patch 05/13 - ec31a06 drm/i915: Align tiled scanouts from stolen memory to 256k in the GTT - Reviewer: 
> 
> I think scanouts in stolen have died for now, too many issues. So I guess
> we can drop this, since there are other issues.
> 
> > Patch 06/13 - 3b2a43a drm/i915: Pair seqno completion tracepoint with its dispatch - Reviewer: 
> 
> There's some bikeshed pending from Thomas Gleixner in the same are to move
> the i915_ring_get_seqno out of the tracepoint. I think if we frob this we
> might as well do it right.

Hah, disagree completely there since that is pessimising the common
case. There is a reason why the expensive operation should only be done
when the tracepoint is active and if he wants to rewrite the tracepoint
API then he is welcome to submit patches...
 
> > Patch 07/13 - 4d98ddd drm/i915: Initialise min/max frequencies before updating RPS registers - Reviewer: 
> 
> Ville had some bikesheds pending, and there's the issue of the delayed rps
> work. I'd drop and wait for a new version.

No, he didn't. The bikesheds were not on the patch but on the
surrounding code.

> > Patch 08/13 - 35983bd drm/i915: Delay the relase of the forcewake by a jiffie - Reviewer: 
> 
> Already reviewed by Ville and needs a bit work it seems, you can drop and
> wait for a new version.

The version I (thought I) sent last has reviewed-by tags. It shouldn't
be dropped.
 
> > Patch 09/13 - 14141ee drm/i915: Add a tracepoint for using a semaphore - Reviewer: 
> 
> Ville.
> 
> > Patch 10/13 - 06d2851 drm/i915: Boost RPS frequency for CPU stalls - Reviewer: 
> > Patch 11/13 - e64dce9 drm/i915: Tweak RPS thresholds to more aggressively downclock - Reviewer: 
> 
> It sounds like Chris has updated versions of these somewhere. Imo you can
> drop these here. Also Jesse should take a look.

These seriously need review, they make a dramatic improvement to certain
benchmarks, and have a clear impact on user experience.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/13] drm-intel-collector - review request
  2013-09-24 10:32   ` Chris Wilson
@ 2013-09-24 10:47     ` Daniel Vetter
  2013-09-24 11:11       ` Chris Wilson
  0 siblings, 1 reply; 28+ messages in thread
From: Daniel Vetter @ 2013-09-24 10:47 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Rodrigo Vivi, intel-gfx

On Tue, Sep 24, 2013 at 12:32 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>> > Patch 04/13 - f648dba drm/i915: Asynchronously perform the set-base for a simple modeset - Reviewer:
>>
>> Dunno what to do exactly with this, given that the android guys seem to
>> need more vblank waits in the plane ioctls. I'd drop for now, we need a
>> more coherent story here ... Or just wait for nuclear flips.
>
> That is irrevelant for that patch. Try it, see how much it improves
> switching between fb/X.

Well if we end up with some userspace rendering garbage then we can't
do it ... And I somewhat suspect that the vblank waits in the android
patch are there exactly for that reason.

>> > Patch 05/13 - ec31a06 drm/i915: Align tiled scanouts from stolen memory to 256k in the GTT - Reviewer:
>>
>> I think scanouts in stolen have died for now, too many issues. So I guess
>> we can drop this, since there are other issues.
>>
>> > Patch 06/13 - 3b2a43a drm/i915: Pair seqno completion tracepoint with its dispatch - Reviewer:
>>
>> There's some bikeshed pending from Thomas Gleixner in the same are to move
>> the i915_ring_get_seqno out of the tracepoint. I think if we frob this we
>> might as well do it right.
>
> Hah, disagree completely there since that is pessimising the common
> case. There is a reason why the expensive operation should only be done
> when the tracepoint is active and if he wants to rewrite the tracepoint
> API then he is welcome to submit patches...

The idea that was floated was to use the reg/unreg functions of
DEFINE_TRACE_FN to do the irq get/put dance. The problem seems to be
that doing fancy stuff from within the tracepoint itself isn't awesome
for -rt locking. So we don't enable the interrupts when we don't need
them, but only when the tracepoint is active.

The slight problem is that I don't see any users of this stuff, so I
think we can still go meh.

>> > Patch 07/13 - 4d98ddd drm/i915: Initialise min/max frequencies before updating RPS registers - Reviewer:
>>
>> Ville had some bikesheds pending, and there's the issue of the delayed rps
>> work. I'd drop and wait for a new version.
>
> No, he didn't. The bikesheds were not on the patch but on the
> surrounding code.

I'd still like to have an overall solution here ;-)

>> > Patch 08/13 - 35983bd drm/i915: Delay the relase of the forcewake by a jiffie - Reviewer:
>>
>> Already reviewed by Ville and needs a bit work it seems, you can drop and
>> wait for a new version.
>
> The version I (thought I) sent last has reviewed-by tags. It shouldn't
> be dropped.

Oops, indeed. Rodrigo please double-check that you still have the
latest version of patches when resending. Patch is now applied.

>> > Patch 09/13 - 14141ee drm/i915: Add a tracepoint for using a semaphore - Reviewer:
>>
>> Ville.
>>
>> > Patch 10/13 - 06d2851 drm/i915: Boost RPS frequency for CPU stalls - Reviewer:
>> > Patch 11/13 - e64dce9 drm/i915: Tweak RPS thresholds to more aggressively downclock - Reviewer:
>>
>> It sounds like Chris has updated versions of these somewhere. Imo you can
>> drop these here. Also Jesse should take a look.
>
> These seriously need review, they make a dramatic improvement to certain
> benchmarks, and have a clear impact on user experience.

Agreed. I think Jesse is volunteered to play with this stuff ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/13] drm-intel-collector - review request
  2013-09-24 10:47     ` Daniel Vetter
@ 2013-09-24 11:11       ` Chris Wilson
  2013-09-24 11:22         ` Daniel Vetter
  0 siblings, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2013-09-24 11:11 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, Sep 24, 2013 at 12:47:07PM +0200, Daniel Vetter wrote:
> On Tue, Sep 24, 2013 at 12:32 PM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> >> > Patch 04/13 - f648dba drm/i915: Asynchronously perform the set-base for a simple modeset - Reviewer:
> >>
> >> Dunno what to do exactly with this, given that the android guys seem to
> >> need more vblank waits in the plane ioctls. I'd drop for now, we need a
> >> more coherent story here ... Or just wait for nuclear flips.
> >
> > That is irrevelant for that patch. Try it, see how much it improves
> > switching between fb/X.
> 
> Well if we end up with some userspace rendering garbage then we can't
> do it ... And I somewhat suspect that the vblank waits in the android
> patch are there exactly for that reason.

But we shouldn't even be doing a set-base if we have secondary planes
that need interaction if the primary changes, hence my claims for
irrelevance. (set-base does not do any checks for secondary planes.)
Furthermore, we need to review allowing pagefips for the primary if that
causes glitches on secondary planes. Or just line up the hw designers
alongside the bios writers.

> >> > Patch 06/13 - 3b2a43a drm/i915: Pair seqno completion tracepoint with its dispatch - Reviewer:
> >>
> >> There's some bikeshed pending from Thomas Gleixner in the same are to move
> >> the i915_ring_get_seqno out of the tracepoint. I think if we frob this we
> >> might as well do it right.
> >
> > Hah, disagree completely there since that is pessimising the common
> > case. There is a reason why the expensive operation should only be done
> > when the tracepoint is active and if he wants to rewrite the tracepoint
> > API then he is welcome to submit patches...
> 
> The idea that was floated was to use the reg/unreg functions of
> DEFINE_TRACE_FN to do the irq get/put dance. The problem seems to be
> that doing fancy stuff from within the tracepoint itself isn't awesome
> for -rt locking. So we don't enable the interrupts when we don't need
> them, but only when the tracepoint is active.

And besides the complaint was about trace_irq_seqno, was it not?

> The slight problem is that I don't see any users of this stuff, so I
> think we can still go meh.

Indeed, I do not know how to do what he wants under the confines of the
current tracepoint API.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 00/13] drm-intel-collector - review request
  2013-09-24 11:11       ` Chris Wilson
@ 2013-09-24 11:22         ` Daniel Vetter
  0 siblings, 0 replies; 28+ messages in thread
From: Daniel Vetter @ 2013-09-24 11:22 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Rodrigo Vivi, intel-gfx

On Tue, Sep 24, 2013 at 12:11:27PM +0100, Chris Wilson wrote:
> On Tue, Sep 24, 2013 at 12:47:07PM +0200, Daniel Vetter wrote:
> > The idea that was floated was to use the reg/unreg functions of
> > DEFINE_TRACE_FN to do the irq get/put dance. The problem seems to be
> > that doing fancy stuff from within the tracepoint itself isn't awesome
> > for -rt locking. So we don't enable the interrupts when we don't need
> > them, but only when the tracepoint is active.
> 
> And besides the complaint was about trace_irq_seqno, was it not?

It's the ->get_irq call hidden in i915_trace_irq_get within a
TP_fast_assign clause. The racy business around accessing
ring->trace_irq_seqno is a bit orthogonal I think.

> > The slight problem is that I don't see any users of this stuff, so I
> > think we can still go meh.
> 
> Indeed, I do not know how to do what he wants under the confines of the
> current tracepoint API.

Hm, maybe I need to yell at him again.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 02/13] drm/i915: Move the conditional seqno query into the tracepoint
  2013-09-23 20:33 ` [PATCH 02/13] drm/i915: Move the conditional seqno query into the tracepoint Rodrigo Vivi
@ 2013-09-24 18:50   ` Ville Syrjälä
  2013-09-24 19:20     ` Daniel Vetter
  0 siblings, 1 reply; 28+ messages in thread
From: Ville Syrjälä @ 2013-09-24 18:50 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-gfx

On Mon, Sep 23, 2013 at 05:33:19PM -0300, Rodrigo Vivi wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> We only wish to know the value of seqno when emitting the tracepoint, so
> move the query from a parameter to the macro to inside the conditional
> macro body so that the query is only evaluated when required.
> 
> Reviewer: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>

Looks OK to me.

Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_irq.c   |  2 +-
>  drivers/gpu/drm/i915/i915_trace.h | 21 ++++++++++++++++++---
>  2 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index b356dc1..84b7efc 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -807,7 +807,7 @@ static void notify_ring(struct drm_device *dev,
>  	if (ring->obj == NULL)
>  		return;
>  
> -	trace_i915_gem_request_complete(ring, ring->get_seqno(ring, false));
> +	trace_i915_gem_request_complete(ring);
>  
>  	wake_up_all(&ring->irq_queue);
>  	i915_queue_hangcheck(dev);
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index e2c5ee6..a1797f6 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -304,9 +304,24 @@ DEFINE_EVENT(i915_gem_request, i915_gem_request_add,
>  	    TP_ARGS(ring, seqno)
>  );
>  
> -DEFINE_EVENT(i915_gem_request, i915_gem_request_complete,
> -	    TP_PROTO(struct intel_ring_buffer *ring, u32 seqno),
> -	    TP_ARGS(ring, seqno)
> +TRACE_EVENT(i915_gem_request_complete,
> +	    TP_PROTO(struct intel_ring_buffer *ring),
> +	    TP_ARGS(ring),
> +
> +	    TP_STRUCT__entry(
> +			     __field(u32, dev)
> +			     __field(u32, ring)
> +			     __field(u32, seqno)
> +			     ),
> +
> +	    TP_fast_assign(
> +			   __entry->dev = ring->dev->primary->index;
> +			   __entry->ring = ring->id;
> +			   __entry->seqno = ring->get_seqno(ring, false);
> +			   ),
> +
> +	    TP_printk("dev=%u, ring=%u, seqno=%u",
> +		      __entry->dev, __entry->ring, __entry->seqno)
>  );
>  
>  DEFINE_EVENT(i915_gem_request, i915_gem_request_retire,
> -- 
> 1.8.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 02/13] drm/i915: Move the conditional seqno query into the tracepoint
  2013-09-24 18:50   ` Ville Syrjälä
@ 2013-09-24 19:20     ` Daniel Vetter
  0 siblings, 0 replies; 28+ messages in thread
From: Daniel Vetter @ 2013-09-24 19:20 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Tue, Sep 24, 2013 at 09:50:58PM +0300, Ville Syrjälä wrote:
> On Mon, Sep 23, 2013 at 05:33:19PM -0300, Rodrigo Vivi wrote:
> > From: Chris Wilson <chris@chris-wilson.co.uk>
> > 
> > We only wish to know the value of seqno when emitting the tracepoint, so
> > move the query from a parameter to the macro to inside the conditional
> > macro body so that the query is only evaluated when required.
> > 
> > Reviewer: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
> 
> Looks OK to me.
> 
> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

Queued for -next, thanks for the patch, review and herding in -collector.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 09/13] drm/i915: Add a tracepoint for using a semaphore
  2013-09-23 20:33 ` [PATCH 09/13] drm/i915: Add a tracepoint for using a semaphore Rodrigo Vivi
@ 2013-09-25  9:34   ` Ville Syrjälä
  2013-09-25 10:11     ` Daniel Vetter
  0 siblings, 1 reply; 28+ messages in thread
From: Ville Syrjälä @ 2013-09-25  9:34 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-gfx

On Mon, Sep 23, 2013 at 05:33:26PM -0300, Rodrigo Vivi wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> So that we can find the callers who introduce a ring stall. A single
> ring stall is not too unwelcome, the right issue becomes when they start
> to interlock and prevent any concurrent work. That, however, is a little
> tricker to detect with a mere tracepoint!
> 
> v2: Rebrand it as a ring event, rather than an object event.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>

Just wondering if we would want to see the seqno(s) in the trace as well?

But anyway, the patch looks fine.
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/i915_gem.c   |  2 ++
>  drivers/gpu/drm/i915/i915_trace.h | 19 +++++++++++++++++++
>  2 files changed, 21 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d68cc5c..4a16491 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2614,6 +2614,8 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
>  	if (ret)
>  		return ret;
>  
> +	trace_i915_gem_ring_sync_to(from, to);
> +
>  	ret = to->sync_to(to, from, seqno);
>  	if (!ret)
>  		/* We use last_read_seqno because sync_to()
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index 5c8e36a..48e8f07 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -233,6 +233,25 @@ TRACE_EVENT(i915_gem_evict_everything,
>  	    TP_printk("dev=%d", __entry->dev)
>  );
>  
> +TRACE_EVENT(i915_gem_ring_sync_to,
> +	    TP_PROTO(struct intel_ring_buffer *from, struct intel_ring_buffer *to),
> +	    TP_ARGS(from, to),
> +
> +	    TP_STRUCT__entry(
> +			     __field(u32, dev)
> +			     __field(u32, sync_from)
> +			     __field(u32, sync_to)
> +			     ),
> +
> +	    TP_fast_assign(
> +			   __entry->dev = from->dev->primary->index;
> +			   __entry->sync_from = from->id;
> +			   __entry->sync_to = to->id;
> +			   ),
> +
> +	    TP_printk("dev=%u, sync-from=%u, sync-to=%u", __entry->dev, __entry->sync_from, __entry->sync_to)
> +);
> +
>  TRACE_EVENT(i915_gem_ring_dispatch,
>  	    TP_PROTO(struct intel_ring_buffer *ring, u32 flags),
>  	    TP_ARGS(ring, flags),
> -- 
> 1.8.1.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 09/13] drm/i915: Add a tracepoint for using a semaphore
  2013-09-25  9:34   ` Ville Syrjälä
@ 2013-09-25 10:11     ` Daniel Vetter
  2013-09-25 10:43       ` [PATCH] " Chris Wilson
  0 siblings, 1 reply; 28+ messages in thread
From: Daniel Vetter @ 2013-09-25 10:11 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Wed, Sep 25, 2013 at 12:34:37PM +0300, Ville Syrjälä wrote:
> On Mon, Sep 23, 2013 at 05:33:26PM -0300, Rodrigo Vivi wrote:
> > From: Chris Wilson <chris@chris-wilson.co.uk>
> > 
> > So that we can find the callers who introduce a ring stall. A single
> > ring stall is not too unwelcome, the right issue becomes when they start
> > to interlock and prevent any concurrent work. That, however, is a little
> > tricker to detect with a mere tracepoint!
> > 
> > v2: Rebrand it as a ring event, rather than an object event.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>
> 
> Just wondering if we would want to see the seqno(s) in the trace as well?

Hm yeah, I guess the seqno we're syncing on the from ring would be useful
to gauge how much parallelism there really is. Chris, care to respin?
-Daniel

> 
> But anyway, the patch looks fine.
> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c   |  2 ++
> >  drivers/gpu/drm/i915/i915_trace.h | 19 +++++++++++++++++++
> >  2 files changed, 21 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index d68cc5c..4a16491 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -2614,6 +2614,8 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
> >  	if (ret)
> >  		return ret;
> >  
> > +	trace_i915_gem_ring_sync_to(from, to);
> > +
> >  	ret = to->sync_to(to, from, seqno);
> >  	if (!ret)
> >  		/* We use last_read_seqno because sync_to()
> > diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> > index 5c8e36a..48e8f07 100644
> > --- a/drivers/gpu/drm/i915/i915_trace.h
> > +++ b/drivers/gpu/drm/i915/i915_trace.h
> > @@ -233,6 +233,25 @@ TRACE_EVENT(i915_gem_evict_everything,
> >  	    TP_printk("dev=%d", __entry->dev)
> >  );
> >  
> > +TRACE_EVENT(i915_gem_ring_sync_to,
> > +	    TP_PROTO(struct intel_ring_buffer *from, struct intel_ring_buffer *to),
> > +	    TP_ARGS(from, to),
> > +
> > +	    TP_STRUCT__entry(
> > +			     __field(u32, dev)
> > +			     __field(u32, sync_from)
> > +			     __field(u32, sync_to)
> > +			     ),
> > +
> > +	    TP_fast_assign(
> > +			   __entry->dev = from->dev->primary->index;
> > +			   __entry->sync_from = from->id;
> > +			   __entry->sync_to = to->id;
> > +			   ),
> > +
> > +	    TP_printk("dev=%u, sync-from=%u, sync-to=%u", __entry->dev, __entry->sync_from, __entry->sync_to)
> > +);
> > +
> >  TRACE_EVENT(i915_gem_ring_dispatch,
> >  	    TP_PROTO(struct intel_ring_buffer *ring, u32 flags),
> >  	    TP_ARGS(ring, flags),
> > -- 
> > 1.8.1.4
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Ville Syrjälä
> Intel OTC
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH] drm/i915: Add a tracepoint for using a semaphore
  2013-09-25 10:11     ` Daniel Vetter
@ 2013-09-25 10:43       ` Chris Wilson
  2013-09-25 11:29         ` Ville Syrjälä
  0 siblings, 1 reply; 28+ messages in thread
From: Chris Wilson @ 2013-09-25 10:43 UTC (permalink / raw)
  To: intel-gfx

So that we can find the callers who introduce a ring stall. A single
ring stall is not too unwelcome, the right issue becomes when they start
to interlock and prevent any concurrent work. That, however, is a little
tricker to detect with a mere tracepoint!

v2: Rebrand it as a ring event, rather than an object event.
v3: Include the seqno in the tracepoint for posterity or something.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c   |  1 +
 drivers/gpu/drm/i915/i915_trace.h | 26 ++++++++++++++++++++++++++
 2 files changed, 27 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 82bc029..fa3b373 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2625,6 +2625,7 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	if (ret)
 		return ret;
 
+	trace_i915_gem_ring_sync_to(from, to, seqno);
 	ret = to->sync_to(to, from, seqno);
 	if (!ret)
 		/* We use last_read_seqno because sync_to()
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index daa6fdf..6e580c9 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -248,6 +248,32 @@ TRACE_EVENT(i915_gem_evict_vm,
 	    TP_printk("dev=%d, vm=%p", __entry->vm->dev->primary->index, __entry->vm)
 );
 
+TRACE_EVENT(i915_gem_ring_sync_to,
+	    TP_PROTO(struct intel_ring_buffer *from,
+		     struct intel_ring_buffer *to,
+		     u32 seqno),
+	    TP_ARGS(from, to, seqno),
+
+	    TP_STRUCT__entry(
+			     __field(u32, dev)
+			     __field(u32, sync_from)
+			     __field(u32, sync_to)
+			     __field(u32, seqno)
+			     ),
+
+	    TP_fast_assign(
+			   __entry->dev = from->dev->primary->index;
+			   __entry->sync_from = from->id;
+			   __entry->sync_to = to->id;
+			   __entry->seqno = seqno;
+			   ),
+
+	    TP_printk("dev=%u, sync-from=%u, sync-to=%u, seqno=%u",
+		      __entry->dev,
+		      __entry->sync_from, __entry->sync_to,
+		      __entry->seqno)
+);
+
 TRACE_EVENT(i915_gem_ring_dispatch,
 	    TP_PROTO(struct intel_ring_buffer *ring, u32 seqno, u32 flags),
 	    TP_ARGS(ring, seqno, flags),
-- 
1.8.4.rc3

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] drm/i915: Add a tracepoint for using a semaphore
  2013-09-25 10:43       ` [PATCH] " Chris Wilson
@ 2013-09-25 11:29         ` Ville Syrjälä
  2013-09-25 11:31           ` Daniel Vetter
  0 siblings, 1 reply; 28+ messages in thread
From: Ville Syrjälä @ 2013-09-25 11:29 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Wed, Sep 25, 2013 at 11:43:28AM +0100, Chris Wilson wrote:
> So that we can find the callers who introduce a ring stall. A single
> ring stall is not too unwelcome, the right issue becomes when they start
> to interlock and prevent any concurrent work. That, however, is a little
> tricker to detect with a mere tracepoint!
> 
> v2: Rebrand it as a ring event, rather than an object event.
> v3: Include the seqno in the tracepoint for posterity or something.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_gem.c   |  1 +
>  drivers/gpu/drm/i915/i915_trace.h | 26 ++++++++++++++++++++++++++
>  2 files changed, 27 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 82bc029..fa3b373 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2625,6 +2625,7 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
>  	if (ret)
>  		return ret;
>  
> +	trace_i915_gem_ring_sync_to(from, to, seqno);
>  	ret = to->sync_to(to, from, seqno);

Passing the rings in the same order to both might avoid some confusion,
but I don't care enough to withhold my r-b so:
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

>  	if (!ret)
>  		/* We use last_read_seqno because sync_to()
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index daa6fdf..6e580c9 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -248,6 +248,32 @@ TRACE_EVENT(i915_gem_evict_vm,
>  	    TP_printk("dev=%d, vm=%p", __entry->vm->dev->primary->index, __entry->vm)
>  );
>  
> +TRACE_EVENT(i915_gem_ring_sync_to,
> +	    TP_PROTO(struct intel_ring_buffer *from,
> +		     struct intel_ring_buffer *to,
> +		     u32 seqno),
> +	    TP_ARGS(from, to, seqno),
> +
> +	    TP_STRUCT__entry(
> +			     __field(u32, dev)
> +			     __field(u32, sync_from)
> +			     __field(u32, sync_to)
> +			     __field(u32, seqno)
> +			     ),
> +
> +	    TP_fast_assign(
> +			   __entry->dev = from->dev->primary->index;
> +			   __entry->sync_from = from->id;
> +			   __entry->sync_to = to->id;
> +			   __entry->seqno = seqno;
> +			   ),
> +
> +	    TP_printk("dev=%u, sync-from=%u, sync-to=%u, seqno=%u",
> +		      __entry->dev,
> +		      __entry->sync_from, __entry->sync_to,
> +		      __entry->seqno)
> +);
> +
>  TRACE_EVENT(i915_gem_ring_dispatch,
>  	    TP_PROTO(struct intel_ring_buffer *ring, u32 seqno, u32 flags),
>  	    TP_ARGS(ring, seqno, flags),
> -- 
> 1.8.4.rc3
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] drm/i915: Add a tracepoint for using a semaphore
  2013-09-25 11:29         ` Ville Syrjälä
@ 2013-09-25 11:31           ` Daniel Vetter
  0 siblings, 0 replies; 28+ messages in thread
From: Daniel Vetter @ 2013-09-25 11:31 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: intel-gfx

On Wed, Sep 25, 2013 at 02:29:34PM +0300, Ville Syrjälä wrote:
> On Wed, Sep 25, 2013 at 11:43:28AM +0100, Chris Wilson wrote:
> > So that we can find the callers who introduce a ring stall. A single
> > ring stall is not too unwelcome, the right issue becomes when they start
> > to interlock and prevent any concurrent work. That, however, is a little
> > tricker to detect with a mere tracepoint!
> > 
> > v2: Rebrand it as a ring event, rather than an object event.
> > v3: Include the seqno in the tracepoint for posterity or something.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_gem.c   |  1 +
> >  drivers/gpu/drm/i915/i915_trace.h | 26 ++++++++++++++++++++++++++
> >  2 files changed, 27 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 82bc029..fa3b373 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -2625,6 +2625,7 @@ i915_gem_object_sync(struct drm_i915_gem_object *obj,
> >  	if (ret)
> >  		return ret;
> >  
> > +	trace_i915_gem_ring_sync_to(from, to, seqno);
> >  	ret = to->sync_to(to, from, seqno);
> 
> Passing the rings in the same order to both might avoid some confusion,
> but I don't care enough to withhold my r-b so:
> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

Queued for -next, thanks for the patch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 03/13] drm/i915: Add some missing steps to i915_driver_load error path
  2013-09-23 20:33 ` [PATCH 03/13] drm/i915: Add some missing steps to i915_driver_load error path Rodrigo Vivi
@ 2013-10-02 17:00   ` Daniel Vetter
  0 siblings, 0 replies; 28+ messages in thread
From: Daniel Vetter @ 2013-10-02 17:00 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-gfx

On Mon, Sep 23, 2013 at 05:33:20PM -0300, Rodrigo Vivi wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> We missed adding a few cleanup steps for recent additions.
> 
> Reviewer:  Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com>

Queued for -next, thanks for the patch. I've picked Ben's r-b from the
original submission but figured I should keep your sob here to give credit
where it's due.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2013-10-02 17:00 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-09-23 20:33 [PATCH 00/13] drm-intel-collector - review request Rodrigo Vivi
2013-09-23 20:33 ` [PATCH 01/13] drm/i915: check that the i965g/gm 4G limit is really obeyed Rodrigo Vivi
2013-09-23 20:33 ` [PATCH 02/13] drm/i915: Move the conditional seqno query into the tracepoint Rodrigo Vivi
2013-09-24 18:50   ` Ville Syrjälä
2013-09-24 19:20     ` Daniel Vetter
2013-09-23 20:33 ` [PATCH 03/13] drm/i915: Add some missing steps to i915_driver_load error path Rodrigo Vivi
2013-10-02 17:00   ` Daniel Vetter
2013-09-23 20:33 ` [PATCH 04/13] drm/i915: Asynchronously perform the set-base for a simple modeset Rodrigo Vivi
2013-09-23 20:33 ` [PATCH 05/13] drm/i915: Align tiled scanouts from stolen memory to 256k in the GTT Rodrigo Vivi
2013-09-23 20:33 ` [PATCH 06/13] drm/i915: Pair seqno completion tracepoint with its dispatch Rodrigo Vivi
2013-09-23 20:33 ` [PATCH 07/13] drm/i915: Initialise min/max frequencies before updating RPS registers Rodrigo Vivi
2013-09-23 20:33 ` [PATCH 08/13] drm/i915: Delay the relase of the forcewake by a jiffie Rodrigo Vivi
2013-09-23 20:33 ` [PATCH 09/13] drm/i915: Add a tracepoint for using a semaphore Rodrigo Vivi
2013-09-25  9:34   ` Ville Syrjälä
2013-09-25 10:11     ` Daniel Vetter
2013-09-25 10:43       ` [PATCH] " Chris Wilson
2013-09-25 11:29         ` Ville Syrjälä
2013-09-25 11:31           ` Daniel Vetter
2013-09-23 20:33 ` [PATCH 10/13] drm/i915: Boost RPS frequency for CPU stalls Rodrigo Vivi
2013-09-23 20:33 ` [PATCH 11/13] drm/i915: Tweak RPS thresholds to more aggressively downclock Rodrigo Vivi
2013-09-23 20:33 ` [PATCH 12/13] drm/i915: Allow GT3 Slice Shutdown on Boot Rodrigo Vivi
2013-09-23 20:33 ` [PATCH 13/13] drm/i915: Allow Dynamically GT3 Slice Shutdown Rodrigo Vivi
2013-09-23 21:13   ` Chris Wilson
2013-09-24 10:15 ` [PATCH 00/13] drm-intel-collector - review request Daniel Vetter
2013-09-24 10:32   ` Chris Wilson
2013-09-24 10:47     ` Daniel Vetter
2013-09-24 11:11       ` Chris Wilson
2013-09-24 11:22         ` Daniel Vetter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.