* r600g: Trying to remove the busywait for fence completion, but hitting inexplicable behaviour
@ 2012-01-31 13:16 Simon Farnsworth
  2012-01-31 13:17 ` [PATCH] [r600g] Use new kernel interface to wait for fences Simon Farnsworth
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Simon Farnsworth @ 2012-01-31 13:16 UTC (permalink / raw)
  To: dri-devel


Hello,

When profiling my workload on an AMD E-350 (PALM GPU) to see why it still
wasn't performing well with Jerome's WIP macrotiling patches, I noticed that
r600_fence_finish was taking 10% of my CPU time. I determined experimentally
that changing from sched_yield() to os_time_sleep(10) fixed this and
resolved my last performance issue on AMD Fusion as compared to Intel Atom,
but felt that this was hacky.

I've therefore tried to use INT_SEL of 0b10 in the EVENT_WRITE_EOP in Mesa,
combined with a new ioctl to wait for a changed value, but it's not working
the way I would expect. I'll be sending patches as replies to this message,
so that you can see exactly what I've done, but in brief, I have an ioctl
that uses wait_event to wait for a chosen offset in a BO to change
value. I've added a suitable waitqueue, and made radeon_fence_process call
wake_up_all.

I'm seeing behaviour from this that I can't explain; as you'll see in the
patches, I've moved some IRQ prints from DRM_DEBUG to printk(KERN_INFO), and
I'm seeing that I don't get the EOP interrupt in a timely fashion - either
because memory is not as coherent between the GPU and CPU as I would like
(so I'm reading stale data when I call wait_event), or because the interrupt
is genuinely delayed.

From dmesg (with commentary):

X11 and my GL compositor start.

[   84.423567] IH: CP EOP
[   84.423600] Woke kernel fences
[   84.423606] Waking up all waiters

This looks like an EOP for a kernel-side fence.

[   84.651320] wait_user_fence offset 4 value 0 timeout -1
[   84.651332] Current data value 0

This is my compositor coming in via my new ioctl, to wait for EOP.

I get bored of waiting for the ioctl to complete, and send the compositor
SIGSTOP then SIGCONT.

[  149.970629] wait_user_fence offset 4 value 0 timeout -1
[  149.970635] Finished data value 1

My new ioctl completes, as the data has changed value. I was expecting an
EOP interrupt before this, which hasn't arrived - why?

[  150.224675] wait_user_fence offset 8 value 0 timeout -1
[  150.224692] Current data value 0

The compositor comes in again, waiting on a different fence.

[  150.250166] IH: CP EOP
[  150.250194] Woke kernel fences
[  150.250198] Waking up all waiters

This looks like an EOP for a kernel-side fence.

[  150.250212] IH: CP EOP
[  150.250216] Waking up all waiters

This looks like an EOP for the fence that completed at time 149.970 - why's
it been delayed?

[  150.250219] IH: CP EOP
[  150.250221] Waking up all waiters

And another EOP that I can't tie up to command buffers.

[  150.250327] IH: CP EOP
[  150.250335] Woke kernel fences
[  150.250337] Waking up all waiters

Kernel fence.

[  150.250559] IH: CP EOP
[  150.250567] Woke kernel fences
[  150.250570] Waking up all waiters

Another kernel fence.

[  150.250581] IH: CP EOP
[  150.250583] Waking up all waiters
[  150.250595] wait_user_fence offset 8 value 0 timeout -1
[  150.250604] IH: CP EOP
[  150.250608] Waking up all waiters
[  150.250615] Finished data value 1

Two user fence EOPs in a row, one of which woke up my process.

[  150.251462] IH: CP EOP
[  150.251477] Woke kernel fences
[  150.251480] Waking up all waiters

Kernel fence.

[  150.256806] wait_user_fence offset 0 value 0 timeout -1
[  150.256828] Current data value 0

Stalled again, waiting for EOP interrupt. Could be because the GPU and CPU
have different views of memory at this point.

There will be two patches in reply to this mail - one is the Mesa patch, one
the kernel patch; I would greatly appreciate help getting this going.
--
Simon Farnsworth
Software Engineer
ONELAN Limited
http://www.onelan.com/

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


* [PATCH] [r600g] Use new kernel interface to wait for fences
  2012-01-31 13:16 r600g: Trying to remove the busywait for fence completion, but hitting inexplicable behaviour Simon Farnsworth
@ 2012-01-31 13:17 ` Simon Farnsworth
  2012-01-31 13:18 ` [PATCH] drm/radeon: Add support for userspace fence waits Simon Farnsworth
  2012-01-31 15:38 ` r600g: Trying to remove the busywait for fence completion, but hitting inexplicable behaviour Alan Swanson
  2 siblings, 0 replies; 6+ messages in thread
From: Simon Farnsworth @ 2012-01-31 13:17 UTC (permalink / raw)
  To: dri-devel

Instead of busywaiting for the GPU to finish a fence, use the new kernel
interface to wait for fence completion.

This code needs completion - in particular, we should fall back to
busywaiting (using the nokernel function that's in radeon_drm_bo.c) if the
kernel doesn't support the new interface.

Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
---
 src/gallium/drivers/r600/r600_hw_context.c    |    2 +-
 src/gallium/drivers/r600/r600_pipe.c          |   12 +++------
 src/gallium/winsys/radeon/drm/radeon_drm_bo.c |   30 +++++++++++++++++++++++++
 src/gallium/winsys/radeon/drm/radeon_winsys.h |   16 +++++++++++++
 4 files changed, 51 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c
index 8eb8e6d..35a57a7 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -1618,7 +1618,7 @@ void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource *fen
 	ctx->pm4[ctx->pm4_cdwords++] = EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_TS_EVENT) | EVENT_INDEX(5);
 	ctx->pm4[ctx->pm4_cdwords++] = va & 0xFFFFFFFFUL;       /* ADDRESS_LO */
 	/* DATA_SEL | INT_EN | ADDRESS_HI */
-	ctx->pm4[ctx->pm4_cdwords++] = (1 << 29) | (0 << 24) | ((va >> 32UL) & 0xFF);
+	ctx->pm4[ctx->pm4_cdwords++] = (1 << 29) | (2 << 24) | ((va >> 32UL) & 0xFF);
 	ctx->pm4[ctx->pm4_cdwords++] = value;                   /* DATA_LO */
 	ctx->pm4[ctx->pm4_cdwords++] = 0;                       /* DATA_HI */
 	ctx->pm4[ctx->pm4_cdwords++] = PKT3(PKT3_NOP, 0, 0);
diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index c38fbc5..12c5bf5 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -595,7 +595,6 @@ static boolean r600_fence_finish(struct pipe_screen *pscreen,
 	struct r600_screen *rscreen = (struct r600_screen *)pscreen;
 	struct r600_fence *rfence = (struct r600_fence*)fence;
 	int64_t start_time = 0;
-	unsigned spins = 0;
 
 	if (timeout != PIPE_TIMEOUT_INFINITE) {
 		start_time = os_time_get();
@@ -605,13 +604,10 @@ static boolean r600_fence_finish(struct pipe_screen *pscreen,
 	}
 
 	while (rscreen->fences.data[rfence->index] == 0) {
-		if (++spins % 256)
-			continue;
-#ifdef PIPE_OS_UNIX
-		sched_yield();
-#else
-		os_time_sleep(10);
-#endif
+		rscreen->ws->buffer_wait_fence(rscreen->fences.bo->buf,
+					       rfence->index << 2,
+					       0,
+					       timeout);
 		if (timeout != PIPE_TIMEOUT_INFINITE &&
 		    os_time_get() - start_time >= timeout) {
 			return FALSE;
diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
index 143dcf9..b552c11 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
@@ -879,6 +879,35 @@ static uint64_t radeon_winsys_bo_va(struct pb_buffer *buffer)
     return bo->va;
 }
 
+/* No kernel support for doing this faster - just spin */
+static void radeon_winsys_bo_wait_fence_nokernel(struct pb_buffer *buf,
+						 unsigned offset,
+						 uint32_t value,
+						 uint64_t timeout)
+{
+#ifdef PIPE_OS_UNIX
+    sched_yield();
+#else
+    os_time_sleep(10);
+#endif
+}
+
+static void radeon_winsys_bo_wait_fence(struct pb_buffer *_buf,
+					unsigned offset,
+					uint32_t value,
+					uint64_t timeout)
+{
+    struct radeon_bo *bo = get_radeon_bo(_buf);
+    struct drm_radeon_gem_wait_user_fence args;
+    memset(&args, 0, sizeof(args));
+    args.handle = bo->handle;
+    args.offset = offset;
+    args.value = value;
+    args.timeout_usec = timeout;
+    while (drmCommandWrite(bo->rws->fd, DRM_RADEON_GEM_WAIT_USER_FENCE,
+                               &args, sizeof(args)) == -EBUSY);
+}
+
 void radeon_bomgr_init_functions(struct radeon_drm_winsys *ws)
 {
     ws->base.buffer_get_cs_handle = radeon_drm_get_cs_handle;
@@ -892,4 +921,5 @@ void radeon_bomgr_init_functions(struct radeon_drm_winsys *ws)
     ws->base.buffer_from_handle = radeon_winsys_bo_from_handle;
     ws->base.buffer_get_handle = radeon_winsys_bo_get_handle;
     ws->base.buffer_get_virtual_address = radeon_winsys_bo_va;
+    ws->base.buffer_wait_fence = radeon_winsys_bo_wait_fence;
 }
diff --git a/src/gallium/winsys/radeon/drm/radeon_winsys.h b/src/gallium/winsys/radeon/drm/radeon_winsys.h
index e462e86..869961f 100644
--- a/src/gallium/winsys/radeon/drm/radeon_winsys.h
+++ b/src/gallium/winsys/radeon/drm/radeon_winsys.h
@@ -264,6 +264,22 @@ struct radeon_winsys {
      */
     uint64_t (*buffer_get_virtual_address)(struct pb_buffer *buf);
 
+    /**
+     * Wait until a fence (EVENT_WRITE_EOP typically) has had a chance to
+     * write to a buffer. NB: there is no guarantee that the GPU has written
+     * to the buffer when this call returns, merely that it has had an
+     * opportunity to do so.
+     *
+     * \param buf       A winsys buffer object
+     * \param offset    Offset in bytes within the buffer that you expect to see changed - must be uint32_t aligned
+     * \param value     The current value stored at offset
+     * \param timeout   The maximum wait time, in microseconds
+     */
+    void (*buffer_wait_fence)(struct pb_buffer *buf,
+                              unsigned offset,
+                              uint32_t value,
+                              uint64_t timeout);
+
     /**************************************************************************
      * Command submission.
      *
-- 
1.7.6.4


* [PATCH] drm/radeon: Add support for userspace fence waits
  2012-01-31 13:16 r600g: Trying to remove the busywait for fence completion, but hitting inexplicable behaviour Simon Farnsworth
  2012-01-31 13:17 ` [PATCH] [r600g] Use new kernel interface to wait for fences Simon Farnsworth
@ 2012-01-31 13:18 ` Simon Farnsworth
  2012-01-31 14:49   ` Simon Farnsworth
  2012-01-31 15:38 ` r600g: Trying to remove the busywait for fence completion, but hitting inexplicable behaviour Alan Swanson
  2 siblings, 1 reply; 6+ messages in thread
From: Simon Farnsworth @ 2012-01-31 13:18 UTC (permalink / raw)
  To: dri-devel

Userspace currently busywaits for fences to complete; on my workload, this
busywait consumes 10% of the available CPU time.

Provide an ioctl so that userspace can wait for an EOP interrupt that
corresponds to a previous EVENT_WRITE_EOP.

This currently doesn't work, hence the debug code piled in.

Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
---
 drivers/gpu/drm/radeon/evergreen.c     |    8 ++--
 drivers/gpu/drm/radeon/radeon.h        |    3 +
 drivers/gpu/drm/radeon/radeon_device.c |    1 +
 drivers/gpu/drm/radeon/radeon_fence.c  |    3 +
 drivers/gpu/drm/radeon/radeon_gem.c    |   70 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/radeon/radeon_kms.c    |    1 +
 include/drm/radeon_drm.h               |   28 +++++++++++++
 7 files changed, 110 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
index 0c5dd78..5b886b0 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -3083,11 +3083,11 @@ restart_ih:
 		case 176: /* CP_INT in ring buffer */
 		case 177: /* CP_INT in IB1 */
 		case 178: /* CP_INT in IB2 */
-			DRM_DEBUG("IH: CP int: 0x%08x\n", src_data);
+			printk(KERN_INFO "IH: CP int: 0x%08x\n", src_data);
 			radeon_fence_process(rdev, RADEON_RING_TYPE_GFX_INDEX);
 			break;
 		case 181: /* CP EOP event */
-			DRM_DEBUG("IH: CP EOP\n");
+			printk(KERN_INFO "IH: CP EOP\n");
 			if (rdev->family >= CHIP_CAYMAN) {
 				switch (src_data) {
 				case 0:
@@ -3104,12 +3104,12 @@ restart_ih:
 				radeon_fence_process(rdev, RADEON_RING_TYPE_GFX_INDEX);
 			break;
 		case 233: /* GUI IDLE */
-			DRM_DEBUG("IH: GUI idle\n");
+			printk(KERN_INFO "IH: GUI idle\n");
 			rdev->pm.gui_idle = true;
 			wake_up(&rdev->irq.idle_queue);
 			break;
 		default:
-			DRM_DEBUG("Unhandled interrupt: %d %d\n", src_id, src_data);
+			printk(KERN_INFO "Unhandled interrupt: %d %d\n", src_id, src_data);
 			break;
 		}
 
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 2859406..fb0eafd 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -1348,6 +1348,8 @@ int radeon_gem_set_tiling_ioctl(struct drm_device *dev, void *data,
 				struct drm_file *filp);
 int radeon_gem_get_tiling_ioctl(struct drm_device *dev, void *data,
 				struct drm_file *filp);
+int radeon_gem_wait_user_fence_ioctl(struct drm_device *dev, void *data,
+				     struct drm_file *filp);
 
 /* VRAM scratch page for HDP bug, default vram page */
 struct r600_vram_scratch {
@@ -1444,6 +1446,7 @@ struct radeon_device {
 	struct radeon_mman		mman;
 	rwlock_t			fence_lock;
 	struct radeon_fence_driver	fence_drv[RADEON_NUM_RINGS];
+	wait_queue_head_t		userspace_fence_wait_queue;
 	struct radeon_semaphore_driver	semaphore_drv;
 	struct radeon_ring		ring[RADEON_NUM_RINGS];
 	struct radeon_ib_pool		ib_pool;
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 0afb13b..dcf11e5 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -730,6 +730,7 @@ int radeon_device_init(struct radeon_device *rdev,
 	mutex_init(&rdev->pm.mutex);
 	mutex_init(&rdev->vram_mutex);
 	rwlock_init(&rdev->fence_lock);
+	init_waitqueue_head(&rdev->userspace_fence_wait_queue);
 	rwlock_init(&rdev->semaphore_drv.lock);
 	INIT_LIST_HEAD(&rdev->gem.objects);
 	init_waitqueue_head(&rdev->irq.vblank_queue);
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 64ea3dd..5b8270f 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -355,7 +355,10 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
 	write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
 	if (wake) {
 		wake_up_all(&rdev->fence_drv[ring].queue);
+		printk( KERN_INFO "Woke kernel fences\n" );
 	}
+	printk( KERN_INFO "Waking up all waiters\n" );
+	wake_up_interruptible_all(&rdev->userspace_fence_wait_queue);
 }
 
 int radeon_fence_count_emitted(struct radeon_device *rdev, int ring)
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
index 7337850..6866f75 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -531,3 +531,73 @@ int radeon_mode_dumb_destroy(struct drm_file *file_priv,
 {
 	return drm_gem_handle_delete(file_priv, handle);
 }
+
+int radeon_gem_wait_user_fence_ioctl(struct drm_device *dev, void *data,
+			      struct drm_file *filp)
+{
+	struct drm_radeon_gem_wait_user_fence *args = data;
+	struct radeon_device *rdev = dev->dev_private;
+	struct drm_gem_object *gobj;
+	struct radeon_bo *robj;
+	void *buffer_data;
+	uint32_t *fence_data;
+	int r = 0;
+	long timeout;
+
+	printk( KERN_INFO "wait_user_fence offset %lld value %d timeout %lld\n", args->offset, args->value, args->timeout_usec );
+
+	gobj = drm_gem_object_lookup(dev, filp, args->handle);
+	if (gobj == NULL) {
+		return -ENOENT;
+	}
+	robj = gem_to_radeon_bo(gobj);
+
+	if (gobj->size < args->offset) {
+		printk( KERN_INFO "Offset too large\n" );
+		r = -EINVAL;
+		goto unreference;
+	}
+
+	r = radeon_bo_reserve(robj, true);
+	if (r) {
+		printk( KERN_INFO "Reserve fail\n" );
+		goto unreference;
+	}
+
+	r = radeon_bo_pin(robj, RADEON_GEM_DOMAIN_GTT, NULL);
+	if (r) {
+		printk( KERN_INFO "Pin fail\n" );
+		goto unreserve;
+	}
+
+	r = radeon_bo_kmap(robj, &buffer_data);
+	if (r) {
+		printk( KERN_INFO "kmap fail\n" );
+		goto unpin;
+	}
+
+	fence_data = (uint32_t*)buffer_data;
+
+	printk( KERN_INFO "Current data value %d\n", fence_data[args->offset >> 2] );
+
+	timeout = wait_event_interruptible_timeout(rdev->userspace_fence_wait_queue,
+						   fence_data[args->offset >> 2] != args->value,
+						   usecs_to_jiffies(args->timeout_usec));
+	if (timeout == 0)
+		r = -ETIMEDOUT;
+	else if (timeout < 0)
+		r = timeout;
+
+	printk( KERN_INFO "wait_user_fence offset %lld value %d timeout %lld\n", args->offset, args->value, args->timeout_usec );
+	printk( KERN_INFO "Finished data value %d\n", fence_data[args->offset >> 2] );
+
+	radeon_bo_kunmap(robj);
+unpin:
+	radeon_bo_unpin(robj);
+unreserve:
+	radeon_bo_unreserve(robj);
+unreference:
+	drm_gem_object_unreference_unlocked(gobj);
+
+	return r;
+}
diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
index d335288..0e552cc 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -496,5 +496,6 @@ struct drm_ioctl_desc radeon_ioctls_kms[] = {
 	DRM_IOCTL_DEF_DRV(RADEON_GEM_GET_TILING, radeon_gem_get_tiling_ioctl, DRM_AUTH|DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(RADEON_GEM_BUSY, radeon_gem_busy_ioctl, DRM_AUTH|DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(RADEON_GEM_VA, radeon_gem_va_ioctl, DRM_AUTH|DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(RADEON_GEM_WAIT_USER_FENCE, radeon_gem_wait_user_fence_ioctl, DRM_AUTH|DRM_UNLOCKED),
 };
 int radeon_max_kms_ioctl = DRM_ARRAY_SIZE(radeon_ioctls_kms);
diff --git a/include/drm/radeon_drm.h b/include/drm/radeon_drm.h
index dd2e9cf..c261c8c 100644
--- a/include/drm/radeon_drm.h
+++ b/include/drm/radeon_drm.h
@@ -510,6 +510,7 @@ typedef struct {
 #define DRM_RADEON_GEM_GET_TILING	0x29
 #define DRM_RADEON_GEM_BUSY		0x2a
 #define DRM_RADEON_GEM_VA		0x2b
+#define DRM_RADEON_GEM_WAIT_USER_FENCE  0x2c
 
 #define DRM_IOCTL_RADEON_CP_INIT    DRM_IOW( DRM_COMMAND_BASE + DRM_RADEON_CP_INIT, drm_radeon_init_t)
 #define DRM_IOCTL_RADEON_CP_START   DRM_IO(  DRM_COMMAND_BASE + DRM_RADEON_CP_START)
@@ -552,6 +553,7 @@ typedef struct {
 #define DRM_IOCTL_RADEON_GEM_GET_TILING	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_GET_TILING, struct drm_radeon_gem_get_tiling)
 #define DRM_IOCTL_RADEON_GEM_BUSY	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_BUSY, struct drm_radeon_gem_busy)
 #define DRM_IOCTL_RADEON_GEM_VA		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_VA, struct drm_radeon_gem_va)
+#define DRM_IOCTL_RADEON_GEM_WAIT_USER_FENCE   DRM_IOW(DRM_COMMAND_BASE + DRM_RADEON_GEM_WAIT_USER_FENCE, struct drm_radeon_gem_wait_user_fence)
 
 typedef struct drm_radeon_init {
 	enum {
@@ -967,4 +969,30 @@ struct drm_radeon_info {
 	uint64_t		value;
 };
 
+/**
+ * struct drm_radeon_gem_wait_user_fence - DRM_RADEON_GEM_WAIT_USER_FENCE ioctl param
+ *
+ * @handle: Handle for the object that the GPU is expected to write
+ * @offset: Offset (in bytes) within that object where the GPU is expected
+ *          to write. Must be DWORD-aligned
+ * @value: The value expected if the GPU has not yet written to this location
+ * @timeout_usec: The maximum time to wait for the GPU, in microseconds
+ *
+ * The DRM_RADEON_GEM_WAIT_USER_FENCE ioctl is meant to allow userspace to
+ * avoid busy-waiting for a EVENT_WRITE_EOP packet to complete (e.g. for
+ * fence sync objects in OpenGL). It expects the EVENT_WRITE_EOP packet to
+ * have requested an interrupt on completion.
+ *
+ * The ioctl will return immediately if the value supplied is not the value
+ * found in the buffer at offset bytes in; otherwise, it will sleep for up
+ * to timeout_usec, waking up when an EVENT_WRITE_EOP packet causes an
+ * interrupt and the value in the buffer might have changed.
+ */
+struct drm_radeon_gem_wait_user_fence {
+	uint32_t                handle;
+	uint64_t                offset;
+	uint32_t                value;
+	uint64_t                timeout_usec;
+};
+
 #endif
-- 
1.7.6.4


* Re: [PATCH] drm/radeon: Add support for userspace fence waits
  2012-01-31 13:18 ` [PATCH] drm/radeon: Add support for userspace fence waits Simon Farnsworth
@ 2012-01-31 14:49   ` Simon Farnsworth
  0 siblings, 0 replies; 6+ messages in thread
From: Simon Farnsworth @ 2012-01-31 14:49 UTC (permalink / raw)
  To: dri-devel



Alex Deucher pointed out an error on IRC; I'm not using
radeon_irq_kms_sw_irq_get and put to manage the IRQ enablement.

I've fixed this up (as per the partial hunk below), and my bug goes away. I
will be cleaning these patches up for proper submission.

Simon

On Tuesday 31 January 2012, Simon Farnsworth <simon.farnsworth@onelan.co.uk> wrote:
<snip>
> +int radeon_gem_wait_user_fence_ioctl(struct drm_device *dev, void *data,
> +			      struct drm_file *filp)
> +{
> +	struct drm_radeon_gem_wait_user_fence *args = data;
> +	struct radeon_device *rdev = dev->dev_private;
> +	struct drm_gem_object *gobj;
> +	struct radeon_bo *robj;
> +	void *buffer_data;
> +	uint32_t *fence_data;
> +	int r = 0;
> +	long timeout;
> +
> +	printk( KERN_INFO "wait_user_fence offset %lld value %d timeout %lld\n", args->offset, args->value, args->timeout_usec );
> +
> +	gobj = drm_gem_object_lookup(dev, filp, args->handle);
> +	if (gobj == NULL) {
> +		return -ENOENT;
> +	}
> +	robj = gem_to_radeon_bo(gobj);
> +
> +	if (gobj->size < args->offset) {
> +		printk( KERN_INFO "Offset too large\n" );
> +		r = -EINVAL;
> +		goto unreference;
> +	}
> +
> +	r = radeon_bo_reserve(robj, true);
> +	if (r) {
> +		printk( KERN_INFO "Reserve fail\n" );
> +		goto unreference;
> +	}
> +
> +	r = radeon_bo_pin(robj, RADEON_GEM_DOMAIN_GTT, NULL);
> +	if (r) {
> +		printk( KERN_INFO "Pin fail\n" );
> +		goto unreserve;
> +	}
> +
> +	r = radeon_bo_kmap(robj, &buffer_data);
> +	if (r) {
> +		printk( KERN_INFO "kmap fail\n" );
> +		goto unpin;
> +	}
> +
Missing radeon_irq_kms_sw_irq_get(rdev, RADEON_RING_TYPE_GFX_INDEX); here

> +	fence_data = (uint32_t*)buffer_data;
> +
> +	printk( KERN_INFO "Current data value %d\n", fence_data[args->offset >> 2] );
> +
> +	timeout = wait_event_interruptible_timeout(rdev->userspace_fence_wait_queue,
> +						   fence_data[args->offset >> 2] != args->value,
> +						   usecs_to_jiffies(args->timeout_usec));

And missing radeon_irq_kms_sw_irq_put(rdev, RADEON_RING_TYPE_GFX_INDEX); here.

> +	if (timeout == 0)
> +		r = -ETIMEDOUT;
> +	else if (timeout < 0)
> +		r = timeout;
> +
> +	printk( KERN_INFO "wait_user_fence offset %lld value %d timeout %lld\n", args->offset, args->value, args->timeout_usec );
> +	printk( KERN_INFO "Finished data value %d\n", fence_data[args->offset >> 2] );
> +
> +	radeon_bo_kunmap(robj);
> +unpin:
> +	radeon_bo_unpin(robj);
> +unreserve:
> +	radeon_bo_unreserve(robj);
> +unreference:
> +	drm_gem_object_unreference_unlocked(gobj);
> +
> +	return r;
> +}

-- 
Simon Farnsworth
Software Engineer
ONELAN Limited
http://www.onelan.com/


* Re: r600g: Trying to remove the busywait for fence completion, but hitting inexplicable behaviour
  2012-01-31 13:16 r600g: Trying to remove the busywait for fence completion, but hitting inexplicable behaviour Simon Farnsworth
  2012-01-31 13:17 ` [PATCH] [r600g] Use new kernel interface to wait for fences Simon Farnsworth
  2012-01-31 13:18 ` [PATCH] drm/radeon: Add support for userspace fence waits Simon Farnsworth
@ 2012-01-31 15:38 ` Alan Swanson
  2012-01-31 15:54   ` Simon Farnsworth
  2 siblings, 1 reply; 6+ messages in thread
From: Alan Swanson @ 2012-01-31 15:38 UTC (permalink / raw)
  To: Simon Farnsworth; +Cc: dri-devel

On Tue, 2012-01-31 at 13:16 +0000, Simon Farnsworth wrote:
> Hello,
> 
> When profiling my workload on an AMD E-350 (PALM GPU) to see why it still
> wasn't performing well with Jerome's WIP macrotiling patches, I noticed that
> r600_fence_finish was taking 10% of my CPU time. I determined experimentally
> that changing from sched_yield() to os_time_sleep(10) fixed this and
> resolved my last performance issue on AMD Fusion as compared to Intel Atom,
> but felt that this was hacky.

No, you were right the first time: sched_yield should definitely not be
used in this busywait under Linux (even with its preceding few spins).

Back in 2003, when Linux 2.5 changed sched_yield to move processes from
the run queue to the expired queue, its use was discussed on the DRI
devel list.

http://lwn.net/Articles/31462/
http://thread.gmane.org/gmane.comp.video.dri.devel/5455/focus=6779

There still seem to be five uses that should be checked:

src/gallium/drivers/r600/r600_pipe.c
src/gallium/drivers/nouveau/nouveau_fence.c
src/gallium/auxiliary/pipebuffer/pb_buffer_fenced.c
src/gallium/winsys/radeon/drm/radeon_drm_bo.c (*2)

-- 
Alan.


* Re: r600g: Trying to remove the busywait for fence completion, but hitting inexplicable behaviour
  2012-01-31 15:38 ` r600g: Trying to remove the busywait for fence completion, but hitting inexplicable behaviour Alan Swanson
@ 2012-01-31 15:54   ` Simon Farnsworth
  0 siblings, 0 replies; 6+ messages in thread
From: Simon Farnsworth @ 2012-01-31 15:54 UTC (permalink / raw)
  To: Alan Swanson; +Cc: dri-devel



On Tuesday 31 January 2012, Alan Swanson <swanson@ukfsn.org> wrote:
> On Tue, 2012-01-31 at 13:16 +0000, Simon Farnsworth wrote:
> > Hello,
> > 
> > When profiling my workload on an AMD E-350 (PALM GPU) to see why it still
> > wasn't performing well with Jerome's WIP macrotiling patches, I noticed that
> > r600_fence_finish was taking 10% of my CPU time. I determined experimentally
> > that changing from sched_yield() to os_time_sleep(10) fixed this and
> > resolved my last performance issue on AMD Fusion as compared to Intel Atom,
> > but felt that this was hacky.
> 
> No, you were right the first time, sched_yield should definitely not be
> being used in this busy wait under Linux (even with its preceding few
> spins).
>
Given that I can get the hardware to raise an interrupt when the fence
finishes, I think the sleep is hacky - I should not be spinning under normal
circumstances.

I'm preparing patches that remove the sleep completely if you have a new
enough kernel, libdrm and Mesa. If someone else prepares patches to remove
the sched_yield usage completely, I'll happily test them on my Radeon hardware.
-- 
Simon Farnsworth
Software Engineer
ONELAN Limited
http://www.onelan.com/

