* [PATCH 0/3] RFC: force throttling/fairness
@ 2011-10-29  5:55 Ben Widawsky
  2011-10-29  5:55 ` [PATCH 1/3] drm/i915: Keep track of request counts Ben Widawsky
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Ben Widawsky @ 2011-10-29  5:55 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ben Widawsky, Daniel Vetter, Lecluse, Philippe

These patches are pretty raw as I'm hoping to get some comments before
working too hard to clean them up. The goal is GPU fairness for clients
running on i915.

These are simplified versions of some patches which I've been working on
for an internal customer. These patches are designed to keep clients
which are hogging GPU resources from making the system unresponsive.
They've been tested with an app Eric Anholt modified that never calls
swap buffer, and nexuiz in windowed mode. I've not played with the
thresholds much at all (only 2 attempts actually), but I was able to get
nexuiz up from 30fps to 46fps using this code. The app essentially
submits a bunch of GPU work, and then never calls swap.

I'm looking for comments... I think this is a good thing to upstream,
but I hate to add the debugfs interfaces without others agreeing this is
useful.

As an aside, I have hit one of my warnings (once) and have been trying
to track down the reason, so these patches are definitely not finished -
i.e. feel free to bikeshed.

Ben Widawsky (3):
  drm/i915: Keep track of request counts
  drm/i915: Add waitq/wakeups for our per fd requests
  drm/i915: fairness

 drivers/gpu/drm/i915/i915_dma.c            |    1 +
 drivers/gpu/drm/i915/i915_drv.c            |   13 +++++++
 drivers/gpu/drm/i915/i915_drv.h            |    4 ++
 drivers/gpu/drm/i915/i915_gem.c            |   12 ++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   52 ++++++++++++++++++++++++---
 drivers/gpu/drm/i915/intel_drv.h           |   13 ++++---
 6 files changed, 84 insertions(+), 11 deletions(-)

-- 
1.7.7.1


* [PATCH 1/3] drm/i915: Keep track of request counts
  2011-10-29  5:55 [PATCH 0/3] RFC: force throttling/fairness Ben Widawsky
@ 2011-10-29  5:55 ` Ben Widawsky
  2011-10-29  9:35   ` Daniel Vetter
  2011-10-29  5:55 ` [PATCH 2/3] drm/i915: Add waitq/wakeups for our per fd requests Ben Widawsky
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Ben Widawsky @ 2011-10-29  5:55 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ben Widawsky

There is already a list of requests outstanding for a given client.
Keeping a count is easy, and will give some information necessary to
enable a more fair throttling scheme.

For now a client is uniquely identified by its file descriptor, however
this may change in the future with new process APIs.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.h |    1 +
 drivers/gpu/drm/i915/i915_gem.c |    8 ++++++++
 2 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 06a37f4..a251d22 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -919,6 +919,7 @@ struct drm_i915_file_private {
 	struct {
 		struct spinlock lock;
 		struct list_head request_list;
+		atomic_t outstanding_requests;
 	} mm;
 };
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 6651c36..20d8a41 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1678,6 +1678,7 @@ i915_add_request(struct intel_ring_buffer *ring,
 		request->file_priv = file_priv;
 		list_add_tail(&request->client_list,
 			      &file_priv->mm.request_list);
+		atomic_inc(&file_priv->mm.outstanding_requests);
 		spin_unlock(&file_priv->mm.lock);
 	}
 
@@ -1706,6 +1707,10 @@ i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
 
 	spin_lock(&file_priv->mm.lock);
 	if (request->file_priv) {
+		atomic_dec(&file_priv->mm.outstanding_requests);
+		if (WARN_ON(atomic_read(&file_priv->mm.outstanding_requests) < 0)) {
+			atomic_set(&file_priv->mm.outstanding_requests, 0);
+		}
 		list_del(&request->client_list);
 		request->file_priv = NULL;
 	}
@@ -4119,6 +4124,9 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file)
 					   struct drm_i915_gem_request,
 					   client_list);
 		list_del(&request->client_list);
+		atomic_dec(&file_priv->mm.outstanding_requests);
+		if (WARN_ON(atomic_read(&file_priv->mm.outstanding_requests)))
+			atomic_set(&file_priv->mm.outstanding_requests, 0);
 		request->file_priv = NULL;
 	}
 	spin_unlock(&file_priv->mm.lock);
-- 
1.7.7.1


* [PATCH 2/3] drm/i915: Add waitq/wakeups for our per fd requests
  2011-10-29  5:55 [PATCH 0/3] RFC: force throttling/fairness Ben Widawsky
  2011-10-29  5:55 ` [PATCH 1/3] drm/i915: Keep track of request counts Ben Widawsky
@ 2011-10-29  5:55 ` Ben Widawsky
  2011-10-29  5:55 ` [PATCH 3/3] drm/i915: fairness Ben Widawsky
  2011-10-29  8:07 ` [PATCH 0/3] RFC: force throttling/fairness Chris Wilson
  3 siblings, 0 replies; 13+ messages in thread
From: Ben Widawsky @ 2011-10-29  5:55 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ben Widawsky

With the previous patch we got counts of outstanding requests per i915
client. It's now trivial to add a waitqueue and wakeup events for
arbitrary thresholds.

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_dma.c |    1 +
 drivers/gpu/drm/i915/i915_drv.c |    3 +++
 drivers/gpu/drm/i915/i915_drv.h |    3 +++
 drivers/gpu/drm/i915/i915_gem.c |    5 +++++
 4 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 2eac955..92be278 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -2204,6 +2204,7 @@ int i915_driver_open(struct drm_device *dev, struct drm_file *file)
 
 	spin_lock_init(&file_priv->mm.lock);
 	INIT_LIST_HEAD(&file_priv->mm.request_list);
+	init_waitqueue_head(&file_priv->mm.request_waitq);
 
 	return 0;
 }
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 4c8d681..34289ab 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -102,6 +102,9 @@ MODULE_PARM_DESC(enable_hangcheck,
 		"WARNING: Disabling this can cause system wide hangs. "
 		"(default: true)");
 
+/* Default sched options to off */
+unsigned int i915_sched_hog_threshold __read_mostly = -1;
+unsigned int i915_sched_low_watermark __read_mostly = -1;
 static struct drm_driver driver;
 extern int intel_agp_enabled;
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a251d22..a063dc5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -920,6 +920,7 @@ struct drm_i915_file_private {
 		struct spinlock lock;
 		struct list_head request_list;
 		atomic_t outstanding_requests;
+		wait_queue_head_t request_waitq;
 	} mm;
 };
 
@@ -1005,6 +1006,8 @@ extern int i915_vbt_sdvo_panel_type __read_mostly;
 extern unsigned int i915_enable_rc6 __read_mostly;
 extern unsigned int i915_enable_fbc __read_mostly;
 extern bool i915_enable_hangcheck __read_mostly;
+extern unsigned int i915_sched_hog_threshold __read_mostly;
+extern unsigned int i915_sched_low_watermark __read_mostly;
 
 extern int i915_suspend(struct drm_device *dev, pm_message_t state);
 extern int i915_resume(struct drm_device *dev);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 20d8a41..26b4b81 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1711,6 +1711,10 @@ i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
 		if (WARN_ON(atomic_read(&file_priv->mm.outstanding_requests) < 0)) {
 			atomic_set(&file_priv->mm.outstanding_requests, 0);
 		}
+		if (atomic_read(&file_priv->mm.outstanding_requests) <
+				 i915_sched_low_watermark) {
+			wake_up_all(&file_priv->mm.request_waitq);
+		}
 		list_del(&request->client_list);
 		request->file_priv = NULL;
 	}
@@ -4117,6 +4121,7 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file)
 	 * file_priv.
 	 */
 	spin_lock(&file_priv->mm.lock);
+	wake_up_all(&file_priv->mm.request_waitq);
 	while (!list_empty(&file_priv->mm.request_list)) {
 		struct drm_i915_gem_request *request;
 
-- 
1.7.7.1


* [PATCH 3/3] drm/i915: fairness
  2011-10-29  5:55 [PATCH 0/3] RFC: force throttling/fairness Ben Widawsky
  2011-10-29  5:55 ` [PATCH 1/3] drm/i915: Keep track of request counts Ben Widawsky
  2011-10-29  5:55 ` [PATCH 2/3] drm/i915: Add waitq/wakeups for our per fd requests Ben Widawsky
@ 2011-10-29  5:55 ` Ben Widawsky
  2011-10-29  8:07 ` [PATCH 0/3] RFC: force throttling/fairness Chris Wilson
  3 siblings, 0 replies; 13+ messages in thread
From: Ben Widawsky @ 2011-10-29  5:55 UTC (permalink / raw)
  To: intel-gfx; +Cc: Ben Widawsky

We now have enough information to be able to block apps which are
getting greedy.

We don't have HW support for preempting batches, therefore the finest
granularity for which we can schedule is the batch buffer. We can block
clients at the time they submit if they've gone too many batches ahead.
The code does a very basic adaptive-style synchronization, since a few
tests suggested that the cost of putting the process to sleep was too
expensive for smaller limits. Until we get some good values for the
various use cases, this will stay, but I can easily envision it being
gone in the final code (that's why the new wait APIs are in this patch).

Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
---
 drivers/gpu/drm/i915/i915_drv.c            |   16 +++++++--
 drivers/gpu/drm/i915/i915_gem.c            |    7 ++--
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   52 ++++++++++++++++++++++++---
 drivers/gpu/drm/i915/intel_drv.h           |   13 ++++---
 4 files changed, 70 insertions(+), 18 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 34289ab..41e418f 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -102,9 +102,19 @@ MODULE_PARM_DESC(enable_hangcheck,
 		"WARNING: Disabling this can cause system wide hangs. "
 		"(default: true)");
 
-/* Default sched options to off */
-unsigned int i915_sched_hog_threshold __read_mostly = -1;
-unsigned int i915_sched_low_watermark __read_mostly = -1;
+unsigned int i915_sched_hog_threshold __read_mostly = 0xffffffff;
+module_param_named(hog_thresh, i915_sched_hog_threshold, int, 0600);
+MODULE_PARM_DESC(hog_thresh,
+		"Number of consecutive batches after which the driver "
+		"will block the DRM client. (default: 0xffffffff)");
+
+unsigned int i915_sched_low_watermark __read_mostly = 0xffffffff;
+module_param_named(low, i915_sched_low_watermark, int, 0600);
+MODULE_PARM_DESC(low,
+		"Number of batches remaining before unblocking a DRM "
+		"client that had been blocked because of too many "
+		"consecutive submissions. (default: 0xffffffff)");
+
 static struct drm_driver driver;
 extern int intel_agp_enabled;
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 26b4b81..5b7daa1 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4120,8 +4120,10 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file)
 	 * later retire_requests won't dereference our soon-to-be-gone
 	 * file_priv.
 	 */
-	spin_lock(&file_priv->mm.lock);
 	wake_up_all(&file_priv->mm.request_waitq);
+	wait_for(atomic_read(&file_priv->mm.outstanding_requests) == 0, 1000);
+
+	spin_lock(&file_priv->mm.lock);
 	while (!list_empty(&file_priv->mm.request_list)) {
 		struct drm_i915_gem_request *request;
 
@@ -4129,9 +4131,6 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file)
 					   struct drm_i915_gem_request,
 					   client_list);
 		list_del(&request->client_list);
-		atomic_dec(&file_priv->mm.outstanding_requests);
-		if (WARN_ON(atomic_read(&file_priv->mm.outstanding_requests)))
-			atomic_set(&file_priv->mm.outstanding_requests, 0);
 		request->file_priv = NULL;
 	}
 	spin_unlock(&file_priv->mm.lock);
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 3693e83..5f5a63a 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -960,8 +960,9 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		       struct drm_i915_gem_exec_object2 *exec)
 {
 	drm_i915_private_t *dev_priv = dev->dev_private;
+	struct drm_i915_file_private *file_priv = file->driver_priv;
 	struct list_head objects;
-	struct eb_objects *eb;
+	struct eb_objects *eb = NULL;
 	struct drm_i915_gem_object *batch_obj;
 	struct drm_clip_rect *cliprects = NULL;
 	struct intel_ring_buffer *ring;
@@ -1063,6 +1064,7 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		}
 	}
 
+again:
 	ret = i915_mutex_lock_interruptible(dev);
 	if (ret)
 		goto pre_mutex_err;
@@ -1073,15 +1075,53 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		goto pre_mutex_err;
 	}
 
-	eb = eb_create(args->buffer_count);
-	if (eb == NULL) {
-		mutex_unlock(&dev->struct_mutex);
-		ret = -ENOMEM;
-		goto pre_mutex_err;
+	if (!eb) {
+		eb = eb_create(args->buffer_count);
+		if (eb == NULL) {
+			mutex_unlock(&dev->struct_mutex);
+			ret = -ENOMEM;
+			goto pre_mutex_err;
+		}
 	}
 
 	/* Look up object handles */
 	INIT_LIST_HEAD(&objects);
+
+#define BLOCK_CONDITION ( \
+	atomic_read(&file_priv->mm.outstanding_requests) > i915_sched_hog_threshold)
+#define DONE_CONDITION ( \
+	atomic_read(&file_priv->mm.outstanding_requests) <= i915_sched_low_watermark)
+#define OUT_CONDITION (\
+	DONE_CONDITION || \
+	(atomic_read(&dev_priv->mm.wedged)) || \
+	(dev_priv->mm.suspended))
+
+	if (BLOCK_CONDITION) {
+		mutex_unlock(&dev->struct_mutex);
+		ret = wait_for_us(OUT_CONDITION, 200);
+		if (!ret && DONE_CONDITION) {
+			eb_reset(eb);
+			goto again;
+		} else if (ret != -ETIMEDOUT) {
+			eb_destroy(eb);
+			goto pre_mutex_err;
+		}
+
+		ret = wait_event_interruptible(file_priv->mm.request_waitq,
+					       OUT_CONDITION);
+		if (!ret && DONE_CONDITION) {
+			eb_reset(eb);
+			goto again;
+		} else {
+			eb_destroy(eb);
+			goto pre_mutex_err;
+		}
+	}
+
+#undef BLOCK_CONDITION
+#undef DONE_CONDITION
+#undef OUT_CONDITION
+
 	for (i = 0; i < args->buffer_count; i++) {
 		struct drm_i915_gem_object *obj;
 
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index bd9a604..6c5dad9 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -31,21 +31,24 @@
 #include "drm_crtc_helper.h"
 #include "drm_fb_helper.h"
 
-#define _wait_for(COND, MS, W) ({ \
-	unsigned long timeout__ = jiffies + msecs_to_jiffies(MS);	\
+#define _wait_for(COND, XS, W, gran, func) ({ \
+	unsigned long timeout__ = jiffies + gran##_to_jiffies(XS);	\
 	int ret__ = 0;							\
 	while (!(COND)) {						\
 		if (time_after(jiffies, timeout__)) {			\
 			ret__ = -ETIMEDOUT;				\
 			break;						\
 		}							\
-		if (W && !(in_atomic() || in_dbg_master())) msleep(W);	\
+		if (W && !(in_atomic() || in_dbg_master())) func(W);	\
 	}								\
 	ret__;								\
 })
 
-#define wait_for(COND, MS) _wait_for(COND, MS, 1)
-#define wait_for_atomic(COND, MS) _wait_for(COND, MS, 0)
+#define wait_for_us(COND, US) _wait_for(COND, US, 1, usecs, udelay)
+#define wait_for_ms(COND, MS) _wait_for(COND, MS, 1, msecs, msleep)
+#define wait_for_atomic_ms(COND, MS) _wait_for(COND, MS, 0, msecs, msleep)
+#define wait_for(COND, MS) wait_for_ms(COND, MS)
+#define wait_for_atomic(COND, MS) wait_for_atomic_ms(COND, MS)
 
 #define MSLEEP(x) do { \
 	if (in_dbg_master()) \
-- 
1.7.7.1


* Re: [PATCH 0/3] RFC: force throttling/fairness
  2011-10-29  5:55 [PATCH 0/3] RFC: force throttling/fairness Ben Widawsky
                   ` (2 preceding siblings ...)
  2011-10-29  5:55 ` [PATCH 3/3] drm/i915: fairness Ben Widawsky
@ 2011-10-29  8:07 ` Chris Wilson
  2011-10-29 18:45   ` Eric Anholt
  2011-10-29 19:27   ` Ben Widawsky
  3 siblings, 2 replies; 13+ messages in thread
From: Chris Wilson @ 2011-10-29  8:07 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Ben Widawsky, Lecluse, Philippe

On Fri, 28 Oct 2011 22:55:26 -0700, Ben Widawsky <ben@bwidawsk.net> wrote:
> These patches are pretty raw as I'm hoping to get some comments before
> working too hard to clean them up. The goal is GPU fairness for clients
> running on i915.

The biggest danger I see is that num_outstanding is only decremented
in retire_requests, which under the right circumstances (a single hog or
a plurality) will cause latencies of over 1s. Also this unnecessarily
penalises benchmarks, i.e. a single active client.

To start the bikeshedding, we need a debugfs to show the current clients
and their scheduling data.

I would like to see the hog values moved at least into the file_priv and
a gpuprio ioctl so that we can start prioritising by file. It would then
be possible for a compositing render server to open an fd for its own
high-priority composition and an fd per client, and individually tune their
priorities. And before you know what's hit you, someone will request
cgroups...

It would probably be best to overengineer the ioctl interface as a
facsimile of the more recent cpu schedulers.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [PATCH 1/3] drm/i915: Keep track of request counts
  2011-10-29  5:55 ` [PATCH 1/3] drm/i915: Keep track of request counts Ben Widawsky
@ 2011-10-29  9:35   ` Daniel Vetter
  2011-10-29 19:30     ` Ben Widawsky
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Vetter @ 2011-10-29  9:35 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: intel-gfx

On Fri, Oct 28, 2011 at 10:55:27PM -0700, Ben Widawsky wrote:
> There is already a list of requests outstanding for a given client.
> Keeping a count is easy, and will give some information necessary to
> enable a more fair throttling scheme.
> 
> For now a client is uniquely identified by its file descriptor, however
> this may change in the future with new process APIs.
> 
> Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> ---
>  drivers/gpu/drm/i915/i915_drv.h |    1 +
>  drivers/gpu/drm/i915/i915_gem.c |    8 ++++++++
>  2 files changed, 9 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 06a37f4..a251d22 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -919,6 +919,7 @@ struct drm_i915_file_private {
>  	struct {
>  		struct spinlock lock;
>  		struct list_head request_list;
> +		atomic_t outstanding_requests;

Here's your bikeshed:

Is the spinlock not sufficient to protect the count? I'm asking because
atomic_ts are pretty hard to extend to more fancy schemes (e.g. taking
actual gpu time into account or comparing this with other processes'
outstanding_requests to make better decisions).
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48


* Re: [PATCH 0/3] RFC: force throttling/fairness
  2011-10-29  8:07 ` [PATCH 0/3] RFC: force throttling/fairness Chris Wilson
@ 2011-10-29 18:45   ` Eric Anholt
  2011-10-29 19:22     ` Ben Widawsky
  2011-10-29 19:27   ` Ben Widawsky
  1 sibling, 1 reply; 13+ messages in thread
From: Eric Anholt @ 2011-10-29 18:45 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Daniel Vetter, Ben Widawsky, Lecluse, Philippe


On Sat, 29 Oct 2011 09:07:35 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> On Fri, 28 Oct 2011 22:55:26 -0700, Ben Widawsky <ben@bwidawsk.net> wrote:
> > These patches are pretty raw as I'm hoping to get some comments before
> > working too hard to clean them up. The goal is GPU fairness for clients
> > running on i915.
> 
> The biggest danger I see is that num_outstanding is only decremented
> in retire_requests, which under the right circumstances (a single hog or
> a plurality) will cause latencies of over 1s. Also this unnecessarily
> penalises benchmarks, i.e. a single active client.

Just using wait_request interface would require less code and not have
that issue.


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [PATCH 0/3] RFC: force throttling/fairness
  2011-10-29 18:45   ` Eric Anholt
@ 2011-10-29 19:22     ` Ben Widawsky
  2011-10-29 19:40       ` Daniel Vetter
  0 siblings, 1 reply; 13+ messages in thread
From: Ben Widawsky @ 2011-10-29 19:22 UTC (permalink / raw)
  To: Eric Anholt; +Cc: Daniel Vetter, intel-gfx, Lecluse, Philippe

On Sat, 29 Oct 2011 11:45:34 -0700
Eric Anholt <eric@anholt.net> wrote:

> On Sat, 29 Oct 2011 09:07:35 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > On Fri, 28 Oct 2011 22:55:26 -0700, Ben Widawsky <ben@bwidawsk.net> wrote:
> > > These patches are pretty raw as I'm hoping to get some comments before
> > > working too hard to clean them up. The goal is GPU fairness for clients
> > > running on i915.
> > 
> > The biggest danger I see is that num_outstanding is only decremented
> > in retire_requests, which under the right circumstances (a single hog or
> > a plurality) will cause latencies of over 1s. Also this unnecessarily
> > penalises benchmarks, i.e. a single active client.
> 
> Just using wait_request interface would require less code and not have
> that issue.

I must be missing something simple, but I didn't see an easy way to use
that interface to do what I want. Instead of outstanding requests, I'd
need to start tracking per fd seqnos (yes it's done elsewhere, but
didn't seem much different to me).

Can you explain what I'm missing?

Ben


* Re: [PATCH 0/3] RFC: force throttling/fairness
  2011-10-29  8:07 ` [PATCH 0/3] RFC: force throttling/fairness Chris Wilson
  2011-10-29 18:45   ` Eric Anholt
@ 2011-10-29 19:27   ` Ben Widawsky
  1 sibling, 0 replies; 13+ messages in thread
From: Ben Widawsky @ 2011-10-29 19:27 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx, Lecluse, Philippe

On Sat, 29 Oct 2011 09:07:35 +0100
Chris Wilson <chris@chris-wilson.co.uk> wrote:

> On Fri, 28 Oct 2011 22:55:26 -0700, Ben Widawsky <ben@bwidawsk.net> wrote:
> > These patches are pretty raw as I'm hoping to get some comments before
> > working too hard to clean them up. The goal is GPU fairness for clients
> > running on i915.
> 
> The biggest danger I see is that num_outstanding is only decremented
> in retire_requests, which under the right circumstances (a single hog or
> a plurality) will cause latencies of over 1s. Also this unnecessarily
> penalises benchmarks, i.e. a single active client.

Well, single active clients only exist for 2d, right? I was thinking I'd
disable the scheduler, and only start using it when the user forced it,
or the number of open drm fds is > 1.

> 
> To start the bikeshedding, we need a debugfs to show the current clients
> and their scheduling data.

Agreed.

> 
> I would like to see the hog values moved at least into the file_priv and
> a gpuprio ioctl so that we can start prioritising by file. It would then
> be possible for a compositing render server to open an fd for its own
> high-priority composition and an fd per client, and individually tune their
> priorities. And before you know what's hit you, someone will request
> cgroups...

The interface for the internal customer more or less does this. Even more
complicated is that in the long run, we probably desire more than fd
granularity (webGL for instance). I was trying to keep this as general as
possible though as it does seem to solve a problem and can be built upon
easily.

> 
> It would probably be best to overengineer the ioctl interface as a
> facsimile of the more recent cpu schedulers.

Could you give some advice as to what parameters you would like to see?

> -Chris
> 

Ben


* Re: [PATCH 1/3] drm/i915: Keep track of request counts
  2011-10-29  9:35   ` Daniel Vetter
@ 2011-10-29 19:30     ` Ben Widawsky
  0 siblings, 0 replies; 13+ messages in thread
From: Ben Widawsky @ 2011-10-29 19:30 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Sat, 29 Oct 2011 11:35:13 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Fri, Oct 28, 2011 at 10:55:27PM -0700, Ben Widawsky wrote:
> > There is already a list of requests outstanding for a given client.
> > Keeping a count is easy, and will give some information necessary to
> > enable a more fair throttling scheme.
> > 
> > For now a client is uniquely identified by its file descriptor, however
> > this may change in the future with new process APIs.
> > 
> > Signed-off-by: Ben Widawsky <ben@bwidawsk.net>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h |    1 +
> >  drivers/gpu/drm/i915/i915_gem.c |    8 ++++++++
> >  2 files changed, 9 insertions(+), 0 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 06a37f4..a251d22 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -919,6 +919,7 @@ struct drm_i915_file_private {
> >  	struct {
> >  		struct spinlock lock;
> >  		struct list_head request_list;
> > +		atomic_t outstanding_requests;
> 
> Here's your bikeshed:
> 
> Is the spinlock not sufficient to protect the count? I'm asking because
> atomic_ts are pretty hard to extend to more fancy schemes (e.g. taking
> actual gpu time into account or comparing this with other processes'
> outstanding_requests to make better decisions).
> -Daniel

In a previous version, I set the outstanding count to 0 without holding the lock
in i915_gem_release. In what I sent to the list, it seems the atomic isn't
actually needed.


* Re: [PATCH 0/3] RFC: force throttling/fairness
  2011-10-29 19:22     ` Ben Widawsky
@ 2011-10-29 19:40       ` Daniel Vetter
  2011-10-29 19:47         ` Daniel Vetter
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Vetter @ 2011-10-29 19:40 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Daniel Vetter, intel-gfx, Lecluse, Philippe

On Sat, Oct 29, 2011 at 12:22:25PM -0700, Ben Widawsky wrote:
> On Sat, 29 Oct 2011 11:45:34 -0700
> Eric Anholt <eric@anholt.net> wrote:
> 
> > On Sat, 29 Oct 2011 09:07:35 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > On Fri, 28 Oct 2011 22:55:26 -0700, Ben Widawsky <ben@bwidawsk.net> wrote:
> > > > These patches are pretty raw as I'm hoping to get some comments before
> > > > working too hard to clean them up. The goal is GPU fairness for clients
> > > > running on i915.
> > > 
> > > The biggest danger I see is that num_outstanding is only decremented
> > > in retire_requests, which under the right circumstances (a single hog or
> > > a plurality) will cause latencies of over 1s. Also this unnecessarily
> > > penalises benchmarks, i.e. a single active client.
> > 
> > Just using wait_request interface would require less code and not have
> > that issue.
> 
> I must be missing something simple, but I didn't see an easy way to use
> that interface to do what I want. Instead of outstanding requests, I'd
> need to start tracking per fd seqnos (yes it's done elsewhere, but
> didn't seem much different to me).

We already track requests per-fd in the file_priv, so if the outstanding
requests are already at the limit and would go over the top with an
additional request, you can just look at the previously submitted request
for this fd, grab its seqno and wait for that. The ugly part then just
boils down to allowing wait_request without the struct_mutex held. But
that's rather generally useful infrastructure.
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48


* Re: [PATCH 0/3] RFC: force throttling/fairness
  2011-10-29 19:40       ` Daniel Vetter
@ 2011-10-29 19:47         ` Daniel Vetter
  2011-10-29 19:49           ` Ben Widawsky
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Vetter @ 2011-10-29 19:47 UTC (permalink / raw)
  To: Ben Widawsky; +Cc: Daniel Vetter, intel-gfx, Lecluse, Philippe

On Sat, Oct 29, 2011 at 09:40:57PM +0200, Daniel Vetter wrote:
> On Sat, Oct 29, 2011 at 12:22:25PM -0700, Ben Widawsky wrote:
> > On Sat, 29 Oct 2011 11:45:34 -0700
> > Eric Anholt <eric@anholt.net> wrote:
> > 
> > > On Sat, 29 Oct 2011 09:07:35 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > > On Fri, 28 Oct 2011 22:55:26 -0700, Ben Widawsky <ben@bwidawsk.net> wrote:
> > > > > These patches are pretty raw as I'm hoping to get some comments before
> > > > > working too hard to clean them up. The goal is GPU fairness for clients
> > > > > running on i915.
> > > > 
> > > > The biggest danger I see is that num_outstanding is only decremented
> > > > in retire_requests, which under the right circumstances (a single hog or
> > > > a plurality) will cause latencies of over 1s. Also this unnecessarily
> > > > penalises benchmarks, i.e. a single active client.
> > > 
> > > Just using wait_request interface would require less code and not have
> > > that issue.
> > 
> > I must be missing something simple, but I didn't see an easy way to use
> > that interface to do what I want. Instead of outstanding requests, I'd
> > need to start tracking per fd seqnos (yes it's done elsewhere, but
> > didn't seem much different to me).
> 
> We already track requests per-fd in the file_priv, so if the outstanding
> requests are already at the limit and would go over the top with an
> additional request, you can just look at the previously submitted request
> for this fd, grab its seqno and wait for that. The ugly part then just
> boils down to allowing wait_request without the struct_mutex held. But
> that's rather generally useful infrastructure.

Meh, not wait on the previously submitted req, but the first one to retire
next, obviously.
-Daniel
-- 
Daniel Vetter
Mail: daniel@ffwll.ch
Mobile: +41 (0)79 365 57 48


* Re: [PATCH 0/3] RFC: force throttling/fairness
  2011-10-29 19:47         ` Daniel Vetter
@ 2011-10-29 19:49           ` Ben Widawsky
  0 siblings, 0 replies; 13+ messages in thread
From: Ben Widawsky @ 2011-10-29 19:49 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Daniel Vetter, intel-gfx, Lecluse, Philippe

On Sat, 29 Oct 2011 21:47:04 +0200
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Sat, Oct 29, 2011 at 09:40:57PM +0200, Daniel Vetter wrote:
> > On Sat, Oct 29, 2011 at 12:22:25PM -0700, Ben Widawsky wrote:
> > > On Sat, 29 Oct 2011 11:45:34 -0700
> > > Eric Anholt <eric@anholt.net> wrote:
> > > 
> > > > On Sat, 29 Oct 2011 09:07:35 +0100, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > > > On Fri, 28 Oct 2011 22:55:26 -0700, Ben Widawsky <ben@bwidawsk.net> wrote:
> > > > > > These patches are pretty raw as I'm hoping to get some comments before
> > > > > > working too hard to clean them up. The goal is GPU fairness for clients
> > > > > > running on i915.
> > > > > 
> > > > > The biggest danger I see is that num_outstanding is only decremented
> > > > > in retire_requests, which under the right circumstances (a single hog or
> > > > > a plurality) will cause latencies of over 1s. Also this unnecessarily
> > > > > penalises benchmarks, i.e. a single active client.
> > > > 
> > > > Just using wait_request interface would require less code and not have
> > > > that issue.
> > > 
> > > I must be missing something simple, but I didn't see an easy way to use
> > > that interface to do what I want. Instead of outstanding requests, I'd
> > > need to start tracking per fd seqnos (yes it's done elsewhere, but
> > > didn't seem much different to me).
> > 
> > We track request per-fd already in the file_priv, so if the oustanding
> > requests is already at the limit and would go over the top with an
> > additional request, you can just look at the previously submitted request
> > for this fd, grab it's seqno and wait for that. The ugly part then just
> > boils down to allowing wait_request without the struct_mutex held. But
> > that's rather generally usefull infrastructure.
> 
> Meh, not wait on the previously submitted req, but the first one to retire
> next, obviously.
> -Daniel

Yep, I think I got it. Unlocked wait, keep the count, retire at execbuf (if
past the threshold), and that should be it.

Ben

