All of lore.kernel.org
* r600g: Trying to remove the busywait for fence completion, but hitting inexplicable behaviour
@ 2012-01-31 13:16 Simon Farnsworth
  2012-01-31 13:17 ` [PATCH] [r600g] Use new kernel interface to wait for fences Simon Farnsworth
                   ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Simon Farnsworth @ 2012-01-31 13:16 UTC (permalink / raw)
  To: dri-devel


Hello,

When profiling my workload on an AMD E-350 (PALM GPU) to see why it still
wasn't performing well with Jerome's WIP macrotiling patches, I noticed that
r600_fence_finish was taking 10% of my CPU time. I determined experimentally
that changing from sched_yield() to os_time_sleep(10) fixed this and
resolved my last performance issue on AMD Fusion as compared to Intel Atom,
but felt that this was hacky.

I've therefore tried to use INT_SEL of 0b10 in the EVENT_WRITE_EOP in Mesa,
combined with a new ioctl to wait for a changed value, but it's not working
the way I would expect. I'll be sending patches as replies to this message,
so that you can see exactly what I've done, but in brief, I have an ioctl
that uses wait_event to wait for a chosen offset in a BO to change
value. I've added a suitable waitqueue, and made radeon_fence_process call
wake_up_all.

I'm seeing behaviour from this that I can't explain; as you'll see in the
patches, I've moved some IRQ prints from DRM_DEBUG to printk(KERN_INFO), and
I'm seeing that I don't get the EOP interrupt in a timely fashion - either
because memory is not as coherent between the GPU and CPU as I would like
(so I'm reading stale data when I call wait_event), or because the interrupt
is genuinely delayed.
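
The wake/wait scheme described above is the classic condition-variable pattern: the waiter re-checks the shared value on every wakeup rather than trusting the wakeup itself (wait_event() builds this re-check into its condition argument). A userspace pthreads analogue, as a sketch only (fence_signal/fence_wait are illustrative names, not the patch's API):

```c
#include <pthread.h>
#include <stdint.h>

/* Sketch only: fence_signal/fence_wait are illustrative names, and a
 * pthreads condition variable stands in for the kernel wait queue.
 * The key idiom matches wait_event(): the waiter re-checks the fence
 * value on every wakeup, so a spurious wakeup is harmless and a
 * delayed one only costs latency, never correctness. */
static pthread_mutex_t fence_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t fence_cond = PTHREAD_COND_INITIALIZER;
static uint32_t fence_data; /* stands in for the dword in the fence BO */

void fence_signal(uint32_t value) /* the radeon_fence_process side */
{
    pthread_mutex_lock(&fence_lock);
    fence_data = value;
    pthread_cond_broadcast(&fence_cond); /* "wake_up_all()" */
    pthread_mutex_unlock(&fence_lock);
}

uint32_t fence_wait(uint32_t old_value) /* the wait_user_fence side */
{
    uint32_t v;
    pthread_mutex_lock(&fence_lock);
    while (fence_data == old_value) /* condition re-checked per wakeup */
        pthread_cond_wait(&fence_cond, &fence_lock);
    v = fence_data;
    pthread_mutex_unlock(&fence_lock);
    return v;
}
```

One difference from the real patch: here the value is read under a lock, whereas the GPU writes the BO outside any CPU lock, which is exactly where the stale-read question above comes from.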

From dmesg (with commentary):

X11 and my GL compositor start.

[   84.423567] IH: CP EOP
[   84.423600] Woke kernel fences
[   84.423606] Waking up all waiters

This looks like an EOP for a kernel-side fence.

[   84.651320] wait_user_fence offset 4 value 0 timeout -1
[   84.651332] Current data value 0

This is my compositor coming in via my new ioctl, to wait for EOP.

I get bored of waiting for the ioctl to complete, and send the compositor
SIGSTOP then SIGCONT.

[  149.970629] wait_user_fence offset 4 value 0 timeout -1
[  149.970635] Finished data value 1

My new ioctl completes, as the data has changed value. I was expecting an
EOP interrupt before this, which hasn't arrived - why?

[  150.224675] wait_user_fence offset 8 value 0 timeout -1
[  150.224692] Current data value 0

The compositor comes in again, waiting on a different fence.

[  150.250166] IH: CP EOP
[  150.250194] Woke kernel fences
[  150.250198] Waking up all waiters

This looks like an EOP for a kernel-side fence.

[  150.250212] IH: CP EOP
[  150.250216] Waking up all waiters

This looks like an EOP for the fence that completed at time 149.970 - why's
it been delayed?

[  150.250219] IH: CP EOP
[  150.250221] Waking up all waiters

And another EOP that I can't tie up to command buffers.

[  150.250327] IH: CP EOP
[  150.250335] Woke kernel fences
[  150.250337] Waking up all waiters

Kernel fence.

[  150.250559] IH: CP EOP
[  150.250567] Woke kernel fences
[  150.250570] Waking up all waiters

Another kernel fence.

[  150.250581] IH: CP EOP
[  150.250583] Waking up all waiters
[  150.250595] wait_user_fence offset 8 value 0 timeout -1
[  150.250604] IH: CP EOP
[  150.250608] Waking up all waiters
[  150.250615] Finished data value 1

Two user fence EOPs in a row, one of which woke up my process.

[  150.251462] IH: CP EOP
[  150.251477] Woke kernel fences
[  150.251480] Waking up all waiters

Kernel fence.

[  150.256806] wait_user_fence offset 0 value 0 timeout -1
[  150.256828] Current data value 0

Stalled again, waiting for EOP interrupt. Could be because the GPU and CPU
have different views of memory at this point.

There will be two patches in reply to this mail - one is the Mesa patch, one
the kernel patch; I would greatly appreciate help getting this going.
--
Simon Farnsworth
Software Engineer
ONELAN Limited
http://www.onelan.com/


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


* [PATCH] [r600g] Use new kernel interface to wait for fences
  2012-01-31 13:16 r600g: Trying to remove the busywait for fence completion, but hitting inexplicable behaviour Simon Farnsworth
@ 2012-01-31 13:17 ` Simon Farnsworth
  2012-01-31 13:18 ` [PATCH] drm/radeon: Add support for userspace fence waits Simon Farnsworth
  2012-01-31 15:38 ` r600g: Trying to remove the busywait for fence completion, but hitting inexplicable behaviour Alan Swanson
  2 siblings, 0 replies; 18+ messages in thread
From: Simon Farnsworth @ 2012-01-31 13:17 UTC (permalink / raw)
  To: dri-devel

Instead of busywaiting for the GPU to finish a fence, use the new kernel
interface to wait for fence completion.

This code needs completion - in particular, we should fall back to
busywaiting (using the nokernel function that's in radeon_drm_bo.c) if the
kernel doesn't support the new interface.

Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
---
 src/gallium/drivers/r600/r600_hw_context.c    |    2 +-
 src/gallium/drivers/r600/r600_pipe.c          |   12 +++------
 src/gallium/winsys/radeon/drm/radeon_drm_bo.c |   30 +++++++++++++++++++++++++
 src/gallium/winsys/radeon/drm/radeon_winsys.h |   16 +++++++++++++
 4 files changed, 51 insertions(+), 9 deletions(-)

diff --git a/src/gallium/drivers/r600/r600_hw_context.c b/src/gallium/drivers/r600/r600_hw_context.c
index 8eb8e6d..35a57a7 100644
--- a/src/gallium/drivers/r600/r600_hw_context.c
+++ b/src/gallium/drivers/r600/r600_hw_context.c
@@ -1618,7 +1618,7 @@ void r600_context_emit_fence(struct r600_context *ctx, struct r600_resource *fen
 	ctx->pm4[ctx->pm4_cdwords++] = EVENT_TYPE(EVENT_TYPE_CACHE_FLUSH_AND_INV_TS_EVENT) | EVENT_INDEX(5);
 	ctx->pm4[ctx->pm4_cdwords++] = va & 0xFFFFFFFFUL;       /* ADDRESS_LO */
 	/* DATA_SEL | INT_EN | ADDRESS_HI */
-	ctx->pm4[ctx->pm4_cdwords++] = (1 << 29) | (0 << 24) | ((va >> 32UL) & 0xFF);
+	ctx->pm4[ctx->pm4_cdwords++] = (1 << 29) | (2 << 24) | ((va >> 32UL) & 0xFF);
 	ctx->pm4[ctx->pm4_cdwords++] = value;                   /* DATA_LO */
 	ctx->pm4[ctx->pm4_cdwords++] = 0;                       /* DATA_HI */
 	ctx->pm4[ctx->pm4_cdwords++] = PKT3(PKT3_NOP, 0, 0);
diff --git a/src/gallium/drivers/r600/r600_pipe.c b/src/gallium/drivers/r600/r600_pipe.c
index c38fbc5..12c5bf5 100644
--- a/src/gallium/drivers/r600/r600_pipe.c
+++ b/src/gallium/drivers/r600/r600_pipe.c
@@ -595,7 +595,6 @@ static boolean r600_fence_finish(struct pipe_screen *pscreen,
 	struct r600_screen *rscreen = (struct r600_screen *)pscreen;
 	struct r600_fence *rfence = (struct r600_fence*)fence;
 	int64_t start_time = 0;
-	unsigned spins = 0;
 
 	if (timeout != PIPE_TIMEOUT_INFINITE) {
 		start_time = os_time_get();
@@ -605,13 +604,10 @@ static boolean r600_fence_finish(struct pipe_screen *pscreen,
 	}
 
 	while (rscreen->fences.data[rfence->index] == 0) {
-		if (++spins % 256)
-			continue;
-#ifdef PIPE_OS_UNIX
-		sched_yield();
-#else
-		os_time_sleep(10);
-#endif
+		rscreen->ws->buffer_wait_fence(rscreen->fences.bo->buf,
+					       rfence->index << 2,
+					       0,
+					       timeout);
 		if (timeout != PIPE_TIMEOUT_INFINITE &&
 		    os_time_get() - start_time >= timeout) {
 			return FALSE;
diff --git a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
index 143dcf9..b552c11 100644
--- a/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
+++ b/src/gallium/winsys/radeon/drm/radeon_drm_bo.c
@@ -879,6 +879,35 @@ static uint64_t radeon_winsys_bo_va(struct pb_buffer *buffer)
     return bo->va;
 }
 
+/* No kernel support for doing this faster - just spin */
+static void radeon_winsys_bo_wait_fence_nokernel(struct pb_buffer *buf,
+						 unsigned offset,
+						 uint32_t value,
+						 uint64_t timeout)
+{
+#ifdef PIPE_OS_UNIX
+    sched_yield();
+#else
+    os_time_sleep(10);
+#endif
+}
+
+static void radeon_winsys_bo_wait_fence(struct pb_buffer *_buf,
+					unsigned offset,
+					uint32_t value,
+					uint64_t timeout)
+{
+    struct radeon_bo *bo = get_radeon_bo(_buf);
+    struct drm_radeon_gem_wait_user_fence args;
+    memset(&args, 0, sizeof(args));
+    args.handle = bo->handle;
+    args.offset = offset;
+    args.value = value;
+    args.timeout_usec = timeout;
+    while (drmCommandWrite(bo->rws->fd, DRM_RADEON_GEM_WAIT_USER_FENCE,
+                               &args, sizeof(args)) == -EBUSY);
+}
+
 void radeon_bomgr_init_functions(struct radeon_drm_winsys *ws)
 {
     ws->base.buffer_get_cs_handle = radeon_drm_get_cs_handle;
@@ -892,4 +921,5 @@ void radeon_bomgr_init_functions(struct radeon_drm_winsys *ws)
     ws->base.buffer_from_handle = radeon_winsys_bo_from_handle;
     ws->base.buffer_get_handle = radeon_winsys_bo_get_handle;
     ws->base.buffer_get_virtual_address = radeon_winsys_bo_va;
+    ws->base.buffer_wait_fence = radeon_winsys_bo_wait_fence;
 }
diff --git a/src/gallium/winsys/radeon/drm/radeon_winsys.h b/src/gallium/winsys/radeon/drm/radeon_winsys.h
index e462e86..869961f 100644
--- a/src/gallium/winsys/radeon/drm/radeon_winsys.h
+++ b/src/gallium/winsys/radeon/drm/radeon_winsys.h
@@ -264,6 +264,22 @@ struct radeon_winsys {
      */
     uint64_t (*buffer_get_virtual_address)(struct pb_buffer *buf);
 
+    /**
+     * Wait until a fence (EVENT_WRITE_EOP typically) has had a chance to
+     * write to a buffer. NB: there is no guarantee that the GPU has written
+     * to the buffer when this call returns, merely that it has had an
+     * opportunity to do so.
+     *
+     * \param buf       A winsys buffer object
+     * \param offset    Offset in bytes within the buffer that you expect to see changed - must be uint32_t aligned
+     * \param value     The current value stored at offset
+     * \param timeout   The maximum wait time, in microseconds
+     */
+    void (*buffer_wait_fence)(struct pb_buffer *buf,
+                              unsigned offset,
+                              uint32_t value,
+                              uint64_t timeout);
+
     /**************************************************************************
      * Command submission.
      *
-- 
1.7.6.4
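
The fallback the commit message asks for could be wired up once at winsys init time, probing the kernel and installing the matching implementation. A hypothetical sketch (select_wait_fence and the stub bodies are illustrative, not real winsys API; the real code would assign the result to ws->base.buffer_wait_fence and detect kernel support from the DRM version or a probe ioctl):

```c
#include <stdint.h>

/* Hypothetical sketch of the missing fallback: probe once at init and
 * install the right implementation. The names below are illustrative,
 * not real winsys API. */
typedef void (*wait_fence_fn)(unsigned offset, uint32_t value,
                              uint64_t timeout_usec);

void wait_fence_ioctl(unsigned o, uint32_t v, uint64_t t)
{ (void)o; (void)v; (void)t; /* would issue the new wait-fence ioctl */ }

void wait_fence_spin(unsigned o, uint32_t v, uint64_t t)
{ (void)o; (void)v; (void)t; /* would sched_yield()/os_time_sleep(10) */ }

wait_fence_fn select_wait_fence(int kernel_has_wait_user_fence)
{
    return kernel_has_wait_user_fence ? wait_fence_ioctl
                                      : wait_fence_spin;
}
```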


* [PATCH] drm/radeon: Add support for userspace fence waits
  2012-01-31 13:16 r600g: Trying to remove the busywait for fence completion, but hitting inexplicable behaviour Simon Farnsworth
  2012-01-31 13:17 ` [PATCH] [r600g] Use new kernel interface to wait for fences Simon Farnsworth
@ 2012-01-31 13:18 ` Simon Farnsworth
  2012-01-31 14:49   ` Simon Farnsworth
  2012-01-31 15:38 ` r600g: Trying to remove the busywait for fence completion, but hitting inexplicable behaviour Alan Swanson
  2 siblings, 1 reply; 18+ messages in thread
From: Simon Farnsworth @ 2012-01-31 13:18 UTC (permalink / raw)
  To: dri-devel

Userspace currently busywaits for fences to complete; on my workload, this
busywait consumes 10% of the available CPU time.

Provide an ioctl so that userspace can wait for an EOP interrupt that
corresponds to a previous EVENT_WRITE_EOP.

This currently doesn't work, hence the debug code piled in.

Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
---
 drivers/gpu/drm/radeon/evergreen.c     |    8 ++--
 drivers/gpu/drm/radeon/radeon.h        |    3 +
 drivers/gpu/drm/radeon/radeon_device.c |    1 +
 drivers/gpu/drm/radeon/radeon_fence.c  |    3 +
 drivers/gpu/drm/radeon/radeon_gem.c    |   70 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/radeon/radeon_kms.c    |    1 +
 include/drm/radeon_drm.h               |   28 +++++++++++++
 7 files changed, 110 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
index 0c5dd78..5b886b0 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -3083,11 +3083,11 @@ restart_ih:
 		case 176: /* CP_INT in ring buffer */
 		case 177: /* CP_INT in IB1 */
 		case 178: /* CP_INT in IB2 */
-			DRM_DEBUG("IH: CP int: 0x%08x\n", src_data);
+			printk(KERN_INFO "IH: CP int: 0x%08x\n", src_data);
 			radeon_fence_process(rdev, RADEON_RING_TYPE_GFX_INDEX);
 			break;
 		case 181: /* CP EOP event */
-			DRM_DEBUG("IH: CP EOP\n");
+			printk(KERN_INFO "IH: CP EOP\n");
 			if (rdev->family >= CHIP_CAYMAN) {
 				switch (src_data) {
 				case 0:
@@ -3104,12 +3104,12 @@ restart_ih:
 				radeon_fence_process(rdev, RADEON_RING_TYPE_GFX_INDEX);
 			break;
 		case 233: /* GUI IDLE */
-			DRM_DEBUG("IH: GUI idle\n");
+			printk(KERN_INFO "IH: GUI idle\n");
 			rdev->pm.gui_idle = true;
 			wake_up(&rdev->irq.idle_queue);
 			break;
 		default:
-			DRM_DEBUG("Unhandled interrupt: %d %d\n", src_id, src_data);
+			printk(KERN_INFO "Unhandled interrupt: %d %d\n", src_id, src_data);
 			break;
 		}
 
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 2859406..fb0eafd 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -1348,6 +1348,8 @@ int radeon_gem_set_tiling_ioctl(struct drm_device *dev, void *data,
 				struct drm_file *filp);
 int radeon_gem_get_tiling_ioctl(struct drm_device *dev, void *data,
 				struct drm_file *filp);
+int radeon_gem_wait_user_fence_ioctl(struct drm_device *dev, void *data,
+				     struct drm_file *filp);
 
 /* VRAM scratch page for HDP bug, default vram page */
 struct r600_vram_scratch {
@@ -1444,6 +1446,7 @@ struct radeon_device {
 	struct radeon_mman		mman;
 	rwlock_t			fence_lock;
 	struct radeon_fence_driver	fence_drv[RADEON_NUM_RINGS];
+	wait_queue_head_t		userspace_fence_wait_queue;
 	struct radeon_semaphore_driver	semaphore_drv;
 	struct radeon_ring		ring[RADEON_NUM_RINGS];
 	struct radeon_ib_pool		ib_pool;
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 0afb13b..dcf11e5 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -730,6 +730,7 @@ int radeon_device_init(struct radeon_device *rdev,
 	mutex_init(&rdev->pm.mutex);
 	mutex_init(&rdev->vram_mutex);
 	rwlock_init(&rdev->fence_lock);
+	init_waitqueue_head(&rdev->userspace_fence_wait_queue);
 	rwlock_init(&rdev->semaphore_drv.lock);
 	INIT_LIST_HEAD(&rdev->gem.objects);
 	init_waitqueue_head(&rdev->irq.vblank_queue);
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 64ea3dd..5b8270f 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -355,7 +355,10 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
 	write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
 	if (wake) {
 		wake_up_all(&rdev->fence_drv[ring].queue);
+		printk( KERN_INFO "Woke kernel fences\n" );
 	}
+	printk( KERN_INFO "Waking up all waiters\n" );
+	wake_up_interruptible_all(&rdev->userspace_fence_wait_queue);
 }
 
 int radeon_fence_count_emitted(struct radeon_device *rdev, int ring)
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
index 7337850..6866f75 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -531,3 +531,73 @@ int radeon_mode_dumb_destroy(struct drm_file *file_priv,
 {
 	return drm_gem_handle_delete(file_priv, handle);
 }
+
+int radeon_gem_wait_user_fence_ioctl(struct drm_device *dev, void *data,
+			      struct drm_file *filp)
+{
+	struct drm_radeon_gem_wait_user_fence *args = data;
+	struct radeon_device *rdev = dev->dev_private;
+	struct drm_gem_object *gobj;
+	struct radeon_bo *robj;
+	void *buffer_data;
+	uint32_t *fence_data;
+	int r = 0;
+	long timeout;
+
+	printk( KERN_INFO "wait_user_fence offset %lld value %d timeout %lld\n", args->offset, args->value, args->timeout_usec );
+
+	gobj = drm_gem_object_lookup(dev, filp, args->handle);
+	if (gobj == NULL) {
+		return -ENOENT;
+	}
+	robj = gem_to_radeon_bo(gobj);
+
+	if (gobj->size < args->offset) {
+		printk( KERN_INFO "Offset too large\n" );
+		r = -EINVAL;
+		goto unreference;
+	}
+
+	r = radeon_bo_reserve(robj, true);
+	if (r) {
+		printk( KERN_INFO "Reserve fail\n" );
+		goto unreference;
+	}
+
+	r = radeon_bo_pin(robj, RADEON_GEM_DOMAIN_GTT, NULL);
+	if (r) {
+		printk( KERN_INFO "Pin fail\n" );
+		goto unreserve;
+	}
+
+	r = radeon_bo_kmap(robj, &buffer_data);
+	if (r) {
+		printk( KERN_INFO "kmap fail\n" );
+		goto unpin;
+	}
+
+	fence_data = (uint32_t*)buffer_data;
+
+	printk( KERN_INFO "Current data value %d\n", fence_data[args->offset >> 2] );
+
+	timeout = wait_event_interruptible_timeout(rdev->userspace_fence_wait_queue,
+						   fence_data[args->offset >> 2] != args->value,
+						   usecs_to_jiffies(args->timeout_usec));
+	if (timeout == 0)
+		r = -ETIMEDOUT;
+	else if (timeout < 0)
+		r = timeout;
+
+	printk( KERN_INFO "wait_user_fence offset %lld value %d timeout %lld\n", args->offset, args->value, args->timeout_usec );
+	printk( KERN_INFO "Finished data value %d\n", fence_data[args->offset >> 2] );
+
+	radeon_bo_kunmap(robj);
+unpin:
+	radeon_bo_unpin(robj);
+unreserve:
+	radeon_bo_unreserve(robj);
+unreference:
+	drm_gem_object_unreference_unlocked(gobj);
+
+	return r;
+}
diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
index d335288..0e552cc 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -496,5 +496,6 @@ struct drm_ioctl_desc radeon_ioctls_kms[] = {
 	DRM_IOCTL_DEF_DRV(RADEON_GEM_GET_TILING, radeon_gem_get_tiling_ioctl, DRM_AUTH|DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(RADEON_GEM_BUSY, radeon_gem_busy_ioctl, DRM_AUTH|DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(RADEON_GEM_VA, radeon_gem_va_ioctl, DRM_AUTH|DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(RADEON_GEM_WAIT_USER_FENCE, radeon_gem_wait_user_fence_ioctl, DRM_AUTH|DRM_UNLOCKED),
 };
 int radeon_max_kms_ioctl = DRM_ARRAY_SIZE(radeon_ioctls_kms);
diff --git a/include/drm/radeon_drm.h b/include/drm/radeon_drm.h
index dd2e9cf..c261c8c 100644
--- a/include/drm/radeon_drm.h
+++ b/include/drm/radeon_drm.h
@@ -510,6 +510,7 @@ typedef struct {
 #define DRM_RADEON_GEM_GET_TILING	0x29
 #define DRM_RADEON_GEM_BUSY		0x2a
 #define DRM_RADEON_GEM_VA		0x2b
+#define DRM_RADEON_GEM_WAIT_USER_FENCE  0x2c
 
 #define DRM_IOCTL_RADEON_CP_INIT    DRM_IOW( DRM_COMMAND_BASE + DRM_RADEON_CP_INIT, drm_radeon_init_t)
 #define DRM_IOCTL_RADEON_CP_START   DRM_IO(  DRM_COMMAND_BASE + DRM_RADEON_CP_START)
@@ -552,6 +553,7 @@ typedef struct {
 #define DRM_IOCTL_RADEON_GEM_GET_TILING	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_GET_TILING, struct drm_radeon_gem_get_tiling)
 #define DRM_IOCTL_RADEON_GEM_BUSY	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_BUSY, struct drm_radeon_gem_busy)
 #define DRM_IOCTL_RADEON_GEM_VA		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_VA, struct drm_radeon_gem_va)
+#define DRM_IOCTL_RADEON_GEM_WAIT_USER_FENCE   DRM_IOW(DRM_COMMAND_BASE + DRM_RADEON_GEM_WAIT_USER_FENCE, struct drm_radeon_gem_wait_user_fence)
 
 typedef struct drm_radeon_init {
 	enum {
@@ -967,4 +969,30 @@ struct drm_radeon_info {
 	uint64_t		value;
 };
 
+/**
+ * struct drm_radeon_gem_wait_user_fence - DRM_RADEON_GEM_WAIT_USER_FENCE ioctl param
+ *
+ * @handle: Handle for the object that the GPU is expected to write
+ * @offset: Offset (in bytes) within that object where the GPU is expected
+ *          to write. Must be DWORD-aligned
+ * @value: The value expected if the GPU has not yet written to this location
+ * @timeout_usec: The maximum time to wait for the GPU, in microseconds
+ *
+ * The DRM_RADEON_GEM_WAIT_USER_FENCE ioctl is meant to allow userspace to
+ * avoid busy-waiting for a EVENT_WRITE_EOP packet to complete (e.g. for
+ * fence sync objects in OpenGL). It expects the EVENT_WRITE_EOP packet to
+ * have requested an interrupt on completion.
+ *
+ * The ioctl will return immediately if the value supplied is not the value
+ * found in the buffer at offset bytes in; otherwise, it will sleep for up
+ * to timeout_usec, waking up when an EVENT_WRITE_EOP packet causes an
+ * interrupt and the value in the buffer might have changed.
+ */
+struct drm_radeon_gem_wait_user_fence {
+	uint32_t                handle;
+	uint64_t                offset;
+	uint32_t                value;
+	uint64_t                timeout_usec;
+};
+
 #endif
-- 
1.7.6.4


* Re: [PATCH] drm/radeon: Add support for userspace fence waits
  2012-01-31 13:18 ` [PATCH] drm/radeon: Add support for userspace fence waits Simon Farnsworth
@ 2012-01-31 14:49   ` Simon Farnsworth
  0 siblings, 0 replies; 18+ messages in thread
From: Simon Farnsworth @ 2012-01-31 14:49 UTC (permalink / raw)
  To: dri-devel



Alex Deucher pointed out an error on IRC; I'm not using
radeon_irq_kms_sw_irq_get and put to manage the IRQ enablement.

I've fixed this up (as per the partial hunk below), and my bug goes away. I
will be cleaning these patches up for proper submission.

Simon

On Tuesday 31 January 2012, Simon Farnsworth <simon.farnsworth@onelan.co.uk> wrote:
<snip>
> +int radeon_gem_wait_user_fence_ioctl(struct drm_device *dev, void *data,
> +			      struct drm_file *filp)
> +{
> +	struct drm_radeon_gem_wait_user_fence *args = data;
> +	struct radeon_device *rdev = dev->dev_private;
> +	struct drm_gem_object *gobj;
> +	struct radeon_bo *robj;
> +	void *buffer_data;
> +	uint32_t *fence_data;
> +	int r = 0;
> +	long timeout;
> +
> +	printk( KERN_INFO "wait_user_fence offset %lld value %d timeout %lld\n", args->offset, args->value, args->timeout_usec );
> +
> +	gobj = drm_gem_object_lookup(dev, filp, args->handle);
> +	if (gobj == NULL) {
> +		return -ENOENT;
> +	}
> +	robj = gem_to_radeon_bo(gobj);
> +
> +	if (gobj->size < args->offset) {
> +		printk( KERN_INFO "Offset too large\n" );
> +		r = -EINVAL;
> +		goto unreference;
> +	}
> +
> +	r = radeon_bo_reserve(robj, true);
> +	if (r) {
> +		printk( KERN_INFO "Reserve fail\n" );
> +		goto unreference;
> +	}
> +
> +	r = radeon_bo_pin(robj, RADEON_GEM_DOMAIN_GTT, NULL);
> +	if (r) {
> +		printk( KERN_INFO "Pin fail\n" );
> +		goto unreserve;
> +	}
> +
> +	r = radeon_bo_kmap(robj, &buffer_data);
> +	if (r) {
> +		printk( KERN_INFO "kmap fail\n" );
> +		goto unpin;
> +	}
> +
Missing radeon_irq_kms_sw_irq_get(rdev, RADEON_RING_TYPE_GFX_INDEX); here

> +	fence_data = (uint32_t*)buffer_data;
> +
> +	printk( KERN_INFO "Current data value %d\n", fence_data[args->offset >> 2] );
> +
> +	timeout = wait_event_interruptible_timeout(rdev->userspace_fence_wait_queue,
> +						   fence_data[args->offset >> 2] != args->value,
> +						   usecs_to_jiffies(args->timeout_usec));

And missing radeon_irq_kms_sw_irq_put(rdev, RADEON_RING_TYPE_GFX_INDEX); here.

> +	if (timeout == 0)
> +		r = -ETIMEDOUT;
> +	else if (timeout < 0)
> +		r = timeout;
> +
> +	printk( KERN_INFO "wait_user_fence offset %lld value %d timeout %lld\n", args->offset, args->value, args->timeout_usec );
> +	printk( KERN_INFO "Finished data value %d\n", fence_data[args->offset >> 2] );
> +
> +	radeon_bo_kunmap(robj);
> +unpin:
> +	radeon_bo_unpin(robj);
> +unreserve:
> +	radeon_bo_unreserve(robj);
> +unreference:
> +	drm_gem_object_unreference_unlocked(gobj);
> +
> +	return r;
> +}
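
Taken together, the fix brackets the sleep with a reference-counted get/put pair, so the EOP interrupt source is unmasked exactly while at least one waiter exists. Since the kernel calls can't run standalone, here is a plain-C model of that bracketing (the *_model names are stand-ins for radeon_irq_kms_sw_irq_get/put around wait_event_interruptible_timeout):

```c
/* Plain-C model of the fix: sw_irq_get/put_model stand in for
 * radeon_irq_kms_sw_irq_get/put, and the comment marks where
 * wait_event_interruptible_timeout() would sleep. The point is the
 * bracketing: the interrupt source is referenced for exactly the
 * duration of the wait, and released on every exit path. */
static int sw_irq_refcount;

static void sw_irq_get_model(void) { sw_irq_refcount++; }
static void sw_irq_put_model(void) { sw_irq_refcount--; }

int wait_user_fence_model(void)
{
    int enabled_during_wait;

    sw_irq_get_model();
    enabled_during_wait = sw_irq_refcount > 0;
    /* ... wait_event_interruptible_timeout(...) would sleep here ... */
    sw_irq_put_model();

    /* 1 if the IRQ was enabled while waiting and released afterwards */
    return enabled_during_wait && sw_irq_refcount == 0;
}
```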

-- 
Simon Farnsworth
Software Engineer
ONELAN Limited
http://www.onelan.com/



* Re: r600g: Trying to remove the busywait for fence completion, but hitting inexplicable behaviour
  2012-01-31 13:16 r600g: Trying to remove the busywait for fence completion, but hitting inexplicable behaviour Simon Farnsworth
  2012-01-31 13:17 ` [PATCH] [r600g] Use new kernel interface to wait for fences Simon Farnsworth
  2012-01-31 13:18 ` [PATCH] drm/radeon: Add support for userspace fence waits Simon Farnsworth
@ 2012-01-31 15:38 ` Alan Swanson
  2012-01-31 15:54   ` Simon Farnsworth
  2 siblings, 1 reply; 18+ messages in thread
From: Alan Swanson @ 2012-01-31 15:38 UTC (permalink / raw)
  To: Simon Farnsworth; +Cc: dri-devel

On Tue, 2012-01-31 at 13:16 +0000, Simon Farnsworth wrote:
> Hello,
> 
> When profiling my workload on an AMD E-350 (PALM GPU) to see why it still
> wasn't performing well with Jerome's WIP macrotiling patches, I noticed that
> r600_fence_finish was taking 10% of my CPU time. I determined experimentally
> that changing from sched_yield() to os_time_sleep(10) fixed this and
> resolved my last performance issue on AMD Fusion as compared to Intel Atom,
> but felt that this was hacky.

No, you were right the first time: sched_yield should definitely not be
used in this busy wait under Linux (even with its preceding few spins).

Back in 2003, when Linux 2.5 changed sched_yield to move processes from
the run queue to the expired queue, its use was discussed on the
dri-devel list.

http://lwn.net/Articles/31462/
http://thread.gmane.org/gmane.comp.video.dri.devel/5455/focus=6779

There still seem to be five uses that should be checked.

src/gallium/drivers/r600/r600_pipe.c
src/gallium/drivers/nouveau/nouveau_fence.c
src/gallium/auxiliary/pipebuffer/pb_buffer_fenced.c
src/gallium/winsys/radeon/drm/radeon_drm_bo.c (*2)
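
For illustration, the cost of the sched_yield() variant is easy to reproduce in userspace: the waiter never blocks, it just keeps getting handed the CPU back until the flag flips, which is why the loop shows up as CPU time in a profile. A self-contained sketch (fake_gpu and busywait_with_yield are illustrative names, not Mesa code):

```c
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <time.h>

/* Illustrative reproduction of the polling strategy: the waiter stays
 * runnable the whole time and burns a core until the "fence" flips.
 * The alternative Simon measured, os_time_sleep(10), instead blocks
 * for 10 us per iteration, trading latency for idle CPU. */
static atomic_uint fake_fence;
unsigned long yield_spins; /* how many times the waiter was rescheduled */

static void *fake_gpu(void *arg)
{
    (void)arg;
    struct timespec ts = { 0, 2 * 1000 * 1000 }; /* 2 ms of "GPU work" */
    nanosleep(&ts, NULL);
    atomic_store(&fake_fence, 1); /* the EVENT_WRITE_EOP moment */
    return NULL;
}

unsigned busywait_with_yield(void)
{
    pthread_t t;
    atomic_store(&fake_fence, 0);
    yield_spins = 0;
    pthread_create(&t, NULL, fake_gpu, NULL);
    while (atomic_load(&fake_fence) == 0) { /* the r600_fence_finish loop */
        sched_yield();
        yield_spins++;
    }
    pthread_join(t, NULL);
    return atomic_load(&fake_fence);
}
```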

-- 
Alan.


* Re: r600g: Trying to remove the busywait for fence completion, but hitting inexplicable behaviour
  2012-01-31 15:38 ` r600g: Trying to remove the busywait for fence completion, but hitting inexplicable behaviour Alan Swanson
@ 2012-01-31 15:54   ` Simon Farnsworth
  0 siblings, 0 replies; 18+ messages in thread
From: Simon Farnsworth @ 2012-01-31 15:54 UTC (permalink / raw)
  To: Alan Swanson; +Cc: dri-devel



On Tuesday 31 January 2012, Alan Swanson <swanson@ukfsn.org> wrote:
> On Tue, 2012-01-31 at 13:16 +0000, Simon Farnsworth wrote:
> > Hello,
> > 
> > When profiling my workload on an AMD E-350 (PALM GPU) to see why it still
> > wasn't performing well with Jerome's WIP macrotiling patches, I noticed that
> > r600_fence_finish was taking 10% of my CPU time. I determined experimentally
> > that changing from sched_yield() to os_time_sleep(10) fixed this and
> > resolved my last performance issue on AMD Fusion as compared to Intel Atom,
> > but felt that this was hacky.
> 
> No, you were right the first time: sched_yield should definitely not be
> used in this busy wait under Linux (even with its preceding few spins).
>
Given that I can get the hardware to raise an interrupt when the fence
finishes, I think the sleep is hacky - I should not be spinning under normal
circumstances.

I'm preparing patches that remove the sleep completely if you have a new
enough kernel, libdrm and Mesa. If someone else prepares patches to remove
the sched_yield usage completely, I'll happily test them on my Radeon
hardware.
-- 
Simon Farnsworth
Software Engineer
ONELAN Limited
http://www.onelan.com/



* Re: [PATCH] drm/radeon: Add support for userspace fence waits
  2012-02-01 11:31     ` Simon Farnsworth
@ 2012-02-01 18:21       ` Christian König
  0 siblings, 0 replies; 18+ messages in thread
From: Christian König @ 2012-02-01 18:21 UTC (permalink / raw)
  To: Simon Farnsworth; +Cc: David Airlie, dri-devel


On 01.02.2012 12:31, Simon Farnsworth wrote:
> Christian,
>
> You said elsewhere that you have half-finished patches that illustrate the
> interface Jerome prefers - if you send them to me, I can work on finishing
> them.
>
> Responding to the rest of the message:
Well, it's probably easier to type that up again than to search through a
two-month-old reflog...

Patches are attached, but be aware that they are WIP and, aside from
compile testing, completely untested.

What's missing is the kernel interface, but that shouldn't be too
complicated to implement.

Christian.

>
> On Tuesday 31 January 2012, Jerome Glisse<j.glisse@gmail.com>  wrote:
>> On Tue, Jan 31, 2012 at 01:55:43PM -0500, David Airlie wrote:
>>> Some comments below.
>>>
>>>> +	r = radeon_bo_pin(robj, RADEON_GEM_DOMAIN_GTT, NULL);
>>>> +	if (r) {
>>>> +		goto unreserve;
>>>> +	}
>>>> +
>>>> +	r = radeon_bo_kmap(robj,&buffer_data);
>>>> +	if (r) {
>>>> +		goto unpin;
>>>> +	}
>>>> +
>>>
>>> Do you need to pin it? I think if you have it reserved and you are in here you shouldn't need to (unless kmapping requires a pin).
>> No, you don't need to pin; actually it's bad to pin, since the buffer
>> you want to wait on might be in VRAM.
>>
> I'll remove the pin/unpin calls - I was copying use from elsewhere in the
> kernel, when trying to work out why it didn't work as expected.
>
>> Also, you never allow for things going wrong: your value might never
>> be written to the buffer, so the wait could block in the kernel
>> forever.
>>
> I'm using wait_event_interruptible_timeout - userspace can pass me a timeout
> if it knows how long it expects to wait (and recover if it takes too long),
> and will be interrupted by signals. In an earlier version of the code, I
> didn't enable the EOP interrupts, and was able to recover userspace simply
> by sending it a SIGSTOP/SIGCONT pair (until the next fence, when it stalled
> again).
>
> In that respect, this is no different to the existing situation in userspace
> - if the GPU is reset before the EVENT_WRITE_EOP executes, userspace will
> spin until a signal interrupts it. All that changes if userspace uses this
> ioctl is that userspace will sleep instead of burning a hole in my CPU
> budget - the recovery strategy is unchanged.
>
>> As I said in the other mail, I would rather have a wait-on-submitted-CS
>> ioctl and add some return value to the CS ioctl. I can't remember the
>> outcome of the discussion we had when we were doing virtual memory
>> support. Alex? Michel? Better memory than I do? :)
>>
> If someone has half-complete patches illustrating what you mean, I can take
> on the work of finishing them off. I've cc'd Christian on this mail, as it
> looks like he has a starting point.
>
> One slight advantage this interface has over a "wait on submitted CS ioctl"
> interface is that, given suitable userspace changes, it becomes possible to
> submit multiple fences in one IB, and wait for them individually (or only
> wait for a subset of them). However, I'm not sure that that's enough of an
> advantage to outweigh the disadvantages, especially given that the userspace
> changes required are fairly intrusive.
>
>> Cheers,
>> Jerome
>>
>>>
>>>> +	radeon_irq_kms_sw_irq_get(rdev, ring);
>>>> +
>>>> +	fence_data = (uint32_t*)buffer_data;
>>>> +	timeout =
>>>> an
>>>> + * interrupt and the value in the buffer might have changed.
>>>> + */
>>>> +struct drm_radeon_gem_wait_user_fence {
>>>> +	uint32_t                handle;
>>>> +	uint32_t                ring;
>>>> +	uint64_t                offset;
>>>> +	uint32_t                value;
>>>> +	uint64_t                timeout_usec;
>>> Align things here, 64 then 64 then 32 32 32 and a 32 pad.
>>>
> Will do in v2.
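
The reordering being requested can be sketched as follows. This is a hypothetical v2 layout, not the actual respin - the "pad" member is new, the other fields come from the patch under review - with both 64-bit members first so the struct has no implicit padding and packs identically for 32-bit and 64-bit userspace:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical v2 layout following the review comment: both 64-bit
 * members first, then the 32-bit ones, with an explicit trailing pad
 * so the struct has no implicit padding and packs identically for
 * 32-bit and 64-bit userspace.  The "pad" member is new; the rest
 * are the fields from the patch under review. */
struct drm_radeon_gem_wait_user_fence {
	uint64_t offset;
	uint64_t timeout_usec;
	uint32_t handle;
	uint32_t ring;
	uint32_t value;
	uint32_t pad;
};
```

Keeping the layout free of implicit padding avoids the need for a compat ioctl wrapper on mixed 32/64-bit systems.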


[-- Attachment #2: 0001-drm-radeon-expand-fence-sequence-values-to-64-bit.patch --]
[-- Type: text/x-patch; name="0001-drm-radeon-expand-fence-sequence-values-to-64-bit.patch", Size: 6166 bytes --]

From 99f526793e869749209cedc60eb9f109e7046bb3 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= <deathsimple@vodafone.de>
Date: Wed, 1 Feb 2012 15:58:20 +0100
Subject: [PATCH 1/2] drm/radeon: expand fence sequence values to 64 bit
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

They are protected by a read/write lock anyway, so
we actually don't need to use the atomic type.

Signed-off-by: Christian König <deathsimple@vodafone.de>
---
 drivers/gpu/drm/radeon/radeon.h       |    6 +++---
 drivers/gpu/drm/radeon/radeon_fence.c |   29 +++++++++++++++++------------
 2 files changed, 20 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 73e05cb..2f4fb4d 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -209,8 +209,8 @@ struct radeon_fence_driver {
 	uint32_t			scratch_reg;
 	uint64_t			gpu_addr;
 	volatile uint32_t		*cpu_addr;
-	atomic_t			seq;
-	uint32_t			last_seq;
+	uint64_t			seq;
+	uint64_t			last_seq;
 	unsigned long			last_jiffies;
 	unsigned long			last_timeout;
 	wait_queue_head_t		queue;
@@ -225,7 +225,7 @@ struct radeon_fence {
 	struct kref			kref;
 	struct list_head		list;
 	/* protected by radeon_fence.lock */
-	uint32_t			seq;
+	uint64_t			seq;
 	bool				emitted;
 	bool				signaled;
 	/* RB, DMA, etc. */
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 64ea3dd..ac177c5 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -29,7 +29,6 @@
  *    Dave Airlie
  */
 #include <linux/seq_file.h>
-#include <linux/atomic.h>
 #include <linux/wait.h>
 #include <linux/list.h>
 #include <linux/kref.h>
@@ -70,7 +69,7 @@ int radeon_fence_emit(struct radeon_device *rdev, struct radeon_fence *fence)
 		write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
 		return 0;
 	}
-	fence->seq = atomic_add_return(1, &rdev->fence_drv[fence->ring].seq);
+	fence->seq = ++rdev->fence_drv[fence->ring].seq;
 	if (!rdev->ring[fence->ring].ready)
 		/* FIXME: cp is not running assume everythings is done right
 		 * away
@@ -90,12 +89,18 @@ static bool radeon_fence_poll_locked(struct radeon_device *rdev, int ring)
 {
 	struct radeon_fence *fence;
 	struct list_head *i, *n;
-	uint32_t seq;
+	uint64_t seq;
 	bool wake = false;
 	unsigned long cjiffies;
 
 	seq = radeon_fence_read(rdev, ring);
+	seq |= rdev->fence_drv[ring].last_seq & 0xFFFFFFFF00000000;
 	if (seq != rdev->fence_drv[ring].last_seq) {
+		if (seq < rdev->fence_drv[ring].last_seq) {
+			/* sequence wrapped around */
+			seq = (seq & 0xFFFFFFFF) | 
+			      (rdev->fence_drv[ring].seq & 0xFFFFFFFF00000000);
+		}
 		rdev->fence_drv[ring].last_seq = seq;
 		rdev->fence_drv[ring].last_jiffies = jiffies;
 		rdev->fence_drv[ring].last_timeout = RADEON_FENCE_JIFFIES_TIMEOUT;
@@ -216,7 +221,7 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 {
 	struct radeon_device *rdev;
 	unsigned long irq_flags, timeout;
-	u32 seq;
+	uint64_t last_seq;
 	int r;
 
 	if (fence == NULL) {
@@ -230,8 +235,8 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 	timeout = rdev->fence_drv[fence->ring].last_timeout;
 retry:
 	/* save current sequence used to check for GPU lockup */
-	seq = rdev->fence_drv[fence->ring].last_seq;
-	trace_radeon_fence_wait_begin(rdev->ddev, seq);
+	last_seq = rdev->fence_drv[fence->ring].last_seq;
+	trace_radeon_fence_wait_begin(rdev->ddev, last_seq);
 	if (intr) {
 		radeon_irq_kms_sw_irq_get(rdev, fence->ring);
 		r = wait_event_interruptible_timeout(rdev->fence_drv[fence->ring].queue,
@@ -246,7 +251,7 @@ retry:
 			 radeon_fence_signaled(fence), timeout);
 		radeon_irq_kms_sw_irq_put(rdev, fence->ring);
 	}
-	trace_radeon_fence_wait_end(rdev->ddev, seq);
+	trace_radeon_fence_wait_end(rdev->ddev, last_seq);
 	if (unlikely(!radeon_fence_signaled(fence))) {
 		/* we were interrupted for some reason and fence isn't
 		 * isn't signaled yet, resume wait
@@ -258,11 +263,11 @@ retry:
 		/* don't protect read access to rdev->fence_drv[t].last_seq
 		 * if we experiencing a lockup the value doesn't change
 		 */
-		if (seq == rdev->fence_drv[fence->ring].last_seq &&
+		if (last_seq == rdev->fence_drv[fence->ring].last_seq &&
 		    radeon_gpu_is_lockup(rdev, &rdev->ring[fence->ring])) {
 			/* good news we believe it's a lockup */
 			printk(KERN_WARNING "GPU lockup (waiting for 0x%08X last fence id 0x%08X)\n",
-			     fence->seq, seq);
+			     (uint32_t)fence->seq, (uint32_t)last_seq);
 			/* FIXME: what should we do ? marking everyone
 			 * as signaled for now
 			 */
@@ -403,7 +408,7 @@ int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring)
 	}
 	rdev->fence_drv[ring].cpu_addr = &rdev->wb.wb[index/4];
 	rdev->fence_drv[ring].gpu_addr = rdev->wb.gpu_addr + index;
-	radeon_fence_write(rdev, atomic_read(&rdev->fence_drv[ring].seq), ring);
+	radeon_fence_write(rdev, rdev->fence_drv[ring].seq, ring);
 	rdev->fence_drv[ring].initialized = true;
 	DRM_INFO("fence driver on ring %d use gpu addr 0x%08Lx and cpu addr 0x%p\n",
 		 ring, rdev->fence_drv[ring].gpu_addr, rdev->fence_drv[ring].cpu_addr);
@@ -416,7 +421,7 @@ static void radeon_fence_driver_init_ring(struct radeon_device *rdev, int ring)
 	rdev->fence_drv[ring].scratch_reg = -1;
 	rdev->fence_drv[ring].cpu_addr = NULL;
 	rdev->fence_drv[ring].gpu_addr = 0;
-	atomic_set(&rdev->fence_drv[ring].seq, 0);
+	rdev->fence_drv[ring].seq = 0;
 	INIT_LIST_HEAD(&rdev->fence_drv[ring].created);
 	INIT_LIST_HEAD(&rdev->fence_drv[ring].emitted);
 	INIT_LIST_HEAD(&rdev->fence_drv[ring].signaled);
@@ -481,7 +486,7 @@ static int radeon_debugfs_fence_info(struct seq_file *m, void *data)
 			fence = list_entry(rdev->fence_drv[i].emitted.prev,
 					   struct radeon_fence, list);
 			seq_printf(m, "Last emitted fence %p with 0x%08X\n",
-				   fence,  fence->seq);
+				   fence,  (uint32_t)fence->seq);
 		}
 	}
 	return 0;
-- 
1.7.5.4
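
The wraparound handling the patch adds to radeon_fence_poll_locked() can be exercised outside the kernel. The sketch below models the same arithmetic in plain C; extend_seq() and its parameter names are illustrative, not kernel symbols. The hardware only ever writes the low 32 bits, the upper half is carried over from the last reconstructed value, and a backwards jump signals a wrap, in which case the upper half comes from the 64-bit emit counter instead:

```c
#include <assert.h>
#include <stdint.h>

/* Userspace model of the 64-bit extension in radeon_fence_poll_locked().
 *
 *   last_seq   - last 64-bit value the driver reconstructed
 *   driver_seq - 64-bit software emit counter (fence_drv[ring].seq)
 *   hw_seq     - 32-bit value read back from the ring's fence location
 */
static uint64_t extend_seq(uint64_t last_seq, uint64_t driver_seq,
			   uint32_t hw_seq)
{
	uint64_t seq = hw_seq;

	/* carry the upper half over from the last reconstructed value */
	seq |= last_seq & 0xFFFFFFFF00000000ULL;
	if (seq < last_seq) {
		/* the low 32 bits wrapped: take the upper half from the
		 * 64-bit emit counter instead */
		seq = (seq & 0xFFFFFFFFULL) |
		      (driver_seq & 0xFFFFFFFF00000000ULL);
	}
	return seq;
}
```

This relies on the emit counter being at most one 32-bit wrap ahead of the last value read back, which holds as long as the poll runs more often than once per 2^32 fences.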


[-- Attachment #3: 0002-drm-radeon-interface-waiting-for-fence-values.patch --]
[-- Type: text/x-patch; name="0002-drm-radeon-interface-waiting-for-fence-values.patch", Size: 1728 bytes --]

From 2d3d429627d94143b0a664388496dad06a48426a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= <deathsimple@vodafone.de>
Date: Wed, 1 Feb 2012 19:16:34 +0100
Subject: [PATCH 2/2] drm/radeon: interface waiting for fence values
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

To support waiting for fence values from usermode.

Signed-off-by: Christian König <deathsimple@vodafone.de>
---
 drivers/gpu/drm/radeon/radeon_fence.c |   22 ++++++++++++++++++++++
 1 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index ac177c5..85893c3 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -334,6 +334,28 @@ int radeon_fence_wait_last(struct radeon_device *rdev, int ring)
 	return r;
 }
 
+static bool radeon_fence_value_reached(struct radeon_device *rdev,
+				       int ring, uint64_t value)
+{
+	unsigned long irq_flags;
+	bool result;
+	read_lock_irqsave(&rdev->fence_lock, irq_flags);
+	result = rdev->fence_drv[ring].last_seq >= value;
+	read_unlock_irqrestore(&rdev->fence_lock, irq_flags);
+	return result;
+}
+
+int radeon_fence_wait_value(struct radeon_device *rdev, int ring,
+			    uint64_t value, unsigned long timeout)
+{
+	int r;
+	radeon_irq_kms_sw_irq_get(rdev, ring);
+	r = wait_event_interruptible_timeout(rdev->fence_drv[ring].queue,
+			radeon_fence_value_reached(rdev, ring, value), timeout);
+	radeon_irq_kms_sw_irq_put(rdev, ring);
+	return r;
+}
+
 struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
 {
 	kref_get(&fence->kref);
-- 
1.7.5.4


[-- Attachment #4: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH] drm/radeon: Add support for userspace fence waits
  2012-02-01  8:39       ` Michel Dänzer
@ 2012-02-01 13:58         ` Marek Olšák
  0 siblings, 0 replies; 18+ messages in thread
From: Marek Olšák @ 2012-02-01 13:58 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: dri-devel

2012/2/1 Michel Dänzer <michel@daenzer.net>:
> On Die, 2012-01-31 at 22:08 +0100, Marek Olšák wrote:
>> 2012/1/31 Jerome Glisse <j.glisse@gmail.com>:
>> > On Tue, Jan 31, 2012 at 06:56:01PM +0100, Michel Dänzer wrote:
>> >> On Die, 2012-01-31 at 16:59 +0000, Simon Farnsworth wrote:
>> >> > Userspace currently busywaits for fences to complete; on my workload, this
>> >> > busywait consumes 10% of the available CPU time.
>> >> >
>> >> > Provide an ioctl so that userspace can wait for an EOP interrupt that
>> >> > corresponds to a previous EVENT_WRITE_EOP.
>> >> >
>> >> > Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
>> >> > ---
>> >> > I've been working on top of Jerome's tiling patches, so this doesn't apply
>> >> > directly on top of current upstream kernels. I can easily rebase to another
>> >> > version upon request - just point me to a git tree.
>> >> >
>> >> > My goal is to remove the sched_yield in Mesa's r600_fence_finish given up to
>> >> > date enough kernel; I hope, though, that the interface is clean enough for
>> >> > other users to extend it in the future (e.g. using compute rings).
>> >>
>> >> I'm afraid not: Unless I'm missing something, userspace can't know which
>> >> ring the kernel submitted the CS to, and the kernel can't guess which
>> >> ring userspace needs to wait for.
>> >
>> > IIRC the plan was to add a return value to the CS ioctl and add an ioctl
>> > to allow waiting on this return value, i.e. allowing userspace to wait on
>> > a specific submitted CS.
>>
>> You don't need a new API for that, r300g already does that. It adds a
>> dummy relocation and later uses GEM_WAIT_IDLE to wait for it. r600g
>> can be updated to do the same thing without kernel changes (besides,
>> we must support the old kernels as well, so this is a no-brainer).
>
> One minor problem being that this doesn't support a timeout without
> spinning. Shouldn't be relevant for Simon's problem though.
>
>
>> What would be much more useful is to be able to wait for a fence,
>> which can be in the middle of a CS. Now that's something that would
>> justify changes in the kernel interface.
>
> To take advantage of that, one would also need to change Gallium such
> that it's possible to get a fence without a flush.

There is PIPE_QUERY_GPU_FINISHED, which should be used for that
purpose. We already agreed in another discussion that this is the way
fences should be done in Gallium. It's just a matter of updating
st/mesa to use it.

Marek

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] drm/radeon: Add support for userspace fence waits
  2012-01-31 19:07   ` Jerome Glisse
  2012-01-31 19:13     ` Alex Deucher
@ 2012-02-01 11:31     ` Simon Farnsworth
  2012-02-01 18:21       ` Christian König
  1 sibling, 1 reply; 18+ messages in thread
From: Simon Farnsworth @ 2012-02-01 11:31 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: David Airlie, König, Christian, dri-devel


[-- Attachment #1.1: Type: Text/Plain, Size: 3504 bytes --]

Christian,

You said elsewhere that you have half-finished patches that illustrate the
interface Jerome prefers - if you send them to me, I can work on finishing
them.

Responding to the rest of the message:

On Tuesday 31 January 2012, Jerome Glisse <j.glisse@gmail.com> wrote:
> On Tue, Jan 31, 2012 at 01:55:43PM -0500, David Airlie wrote:
> > 
> > Some comments below.
> > 
> > > +	r = radeon_bo_pin(robj, RADEON_GEM_DOMAIN_GTT, NULL);
> > > +	if (r) {
> > > +		goto unreserve;
> > > +	}
> > > +
> > > +	r = radeon_bo_kmap(robj, &buffer_data);
> > > +	if (r) {
> > > +		goto unpin;
> > > +	}
> > > +
> > 
> > 
> > Do you need to pin it? I think if you have it reserved and you are in here you shouldn't need to. (unless kmapping requires a pin)/
> 
> No, you don't need to pin; in fact pinning is bad, as the buffer you
> might want to wait on could be in VRAM.
>
I'll remove the pin/unpin calls - I was copying usage from elsewhere in the
kernel while trying to work out why it didn't work as expected.

> Also, you assume that nothing can go wrong and that your value will
> always be written to the buffer; if it never is, the wait will block
> forever in the kernel.
>
I'm using wait_event_interruptible_timeout - userspace can pass me a timeout
if it knows how long it expects to wait (and recover if it takes too long),
and will be interrupted by signals. In an earlier version of the code, I
didn't enable the EOP interrupts, and was able to recover userspace simply
by sending it a SIGSTOP/SIGCONT pair (until the next fence, when it stalled
again).

In that respect, this is no different to the existing situation in userspace
- if the GPU is reset before the EVENT_WRITE_EOP executes, userspace will
spin until a signal interrupts it. All that changes if userspace uses this
ioctl is that userspace will sleep instead of burning a hole in my CPU
budget - the recovery strategy is unchanged.

> As I said in the other mail, I would rather have a wait-on-submitted-CS
> ioctl and add some return value from the CS ioctl. I can't remember the
> outcome of this discussion we had when we were doing virtual memory
> support. Alex? Michel? Better memory than I do? :)
>
If someone has half-complete patches illustrating what you mean, I can take
on the work of finishing them off. I've cc'd Christian on this mail, as it
looks like he has a starting point.

One slight advantage this interface has over a "wait on submitted CS ioctl"
interface is that, given suitable userspace changes, it becomes possible to
submit multiple fences in one IB, and wait for them individually (or only
wait for a subset of them). However, I'm not sure that that's enough of an
advantage to outweigh the disadvantages, especially given that the userspace
changes required are fairly intrusive.

> Cheers,
> Jerome
> 
> > 
> > 
> > > +	radeon_irq_kms_sw_irq_get(rdev, ring);
> > > +
> > > +	fence_data = (uint32_t*)buffer_data;
> > > +	timeout =
> > > an
> > > + * interrupt and the value in the buffer might have changed.
> > > + */
> > > +struct drm_radeon_gem_wait_user_fence {
> > > +	uint32_t                handle;
> > > +	uint32_t                ring;
> > > +	uint64_t                offset;
> > > +	uint32_t                value;
> > > +	uint64_t                timeout_usec;
> > 
> > Align things here, 64 then 64 then 32 32 32 and a 32 pad.
> >
Will do in v2.
-- 
Simon Farnsworth
Software Engineer
ONELAN Limited
http://www.onelan.com/

[-- Attachment #1.2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 490 bytes --]


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] drm/radeon: Add support for userspace fence waits
  2012-01-31 21:08     ` Marek Olšák
@ 2012-02-01  8:39       ` Michel Dänzer
  2012-02-01 13:58         ` Marek Olšák
  0 siblings, 1 reply; 18+ messages in thread
From: Michel Dänzer @ 2012-02-01  8:39 UTC (permalink / raw)
  To: Marek Olšák; +Cc: dri-devel

On Die, 2012-01-31 at 22:08 +0100, Marek Olšák wrote: 
> 2012/1/31 Jerome Glisse <j.glisse@gmail.com>:
> > On Tue, Jan 31, 2012 at 06:56:01PM +0100, Michel Dänzer wrote:
> >> On Die, 2012-01-31 at 16:59 +0000, Simon Farnsworth wrote:
> >> > Userspace currently busywaits for fences to complete; on my workload, this
> >> > busywait consumes 10% of the available CPU time.
> >> >
> >> > Provide an ioctl so that userspace can wait for an EOP interrupt that
> >> > corresponds to a previous EVENT_WRITE_EOP.
> >> >
> >> > Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
> >> > ---
> >> > I've been working on top of Jerome's tiling patches, so this doesn't apply
> >> > directly on top of current upstream kernels. I can easily rebase to another
> >> > version upon request - just point me to a git tree.
> >> >
> >> > My goal is to remove the sched_yield in Mesa's r600_fence_finish given up to
> >> > date enough kernel; I hope, though, that the interface is clean enough for
> >> > other users to extend it in the future (e.g. using compute rings).
> >>
> >> I'm afraid not: Unless I'm missing something, userspace can't know which
> >> ring the kernel submitted the CS to, and the kernel can't guess which
> >> ring userspace needs to wait for.
> >
> > IIRC the plan was to add a return value to the CS ioctl and add an ioctl
> > to allow waiting on this return value, i.e. allowing userspace to wait on
> > a specific submitted CS.
> 
> You don't need a new API for that, r300g already does that. It adds a
> dummy relocation and later uses GEM_WAIT_IDLE to wait for it. r600g
> can be updated to do the same thing without kernel changes (besides,
> we must support the old kernels as well, so this is a no-brainer).

One minor problem being that this doesn't support a timeout without
spinning. Shouldn't be relevant for Simon's problem though.


> What would be much more useful is to be able to wait for a fence,
> which can be in the middle of a CS. Now that's something that would
> justify changes in the kernel interface.

To take advantage of that, one would also need to change Gallium such
that it's possible to get a fence without a flush.


-- 
Earthling Michel Dänzer           |                   http://www.amd.com
Libre software enthusiast         |          Debian, X and DRI developer

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] drm/radeon: Add support for userspace fence waits
  2012-01-31 18:56   ` Jerome Glisse
@ 2012-01-31 21:08     ` Marek Olšák
  2012-02-01  8:39       ` Michel Dänzer
  0 siblings, 1 reply; 18+ messages in thread
From: Marek Olšák @ 2012-01-31 21:08 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: Michel Dänzer, dri-devel

2012/1/31 Jerome Glisse <j.glisse@gmail.com>:
> On Tue, Jan 31, 2012 at 06:56:01PM +0100, Michel Dänzer wrote:
>> On Die, 2012-01-31 at 16:59 +0000, Simon Farnsworth wrote:
>> > Userspace currently busywaits for fences to complete; on my workload, this
>> > busywait consumes 10% of the available CPU time.
>> >
>> > Provide an ioctl so that userspace can wait for an EOP interrupt that
>> > corresponds to a previous EVENT_WRITE_EOP.
>> >
>> > Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
>> > ---
>> > I've been working on top of Jerome's tiling patches, so this doesn't apply
>> > directly on top of current upstream kernels. I can easily rebase to another
>> > version upon request - just point me to a git tree.
>> >
>> > My goal is to remove the sched_yield in Mesa's r600_fence_finish given up to
>> > date enough kernel; I hope, though, that the interface is clean enough for
>> > other users to extend it in the future (e.g. using compute rings).
>>
>> I'm afraid not: Unless I'm missing something, userspace can't know which
>> ring the kernel submitted the CS to, and the kernel can't guess which
>> ring userspace needs to wait for.
>
IIRC the plan was to add a return value to the CS ioctl and add an ioctl
to allow waiting on this return value, i.e. allowing userspace to wait on
a specific submitted CS.

You don't need a new API for that, r300g already does that. It adds a
dummy relocation and later uses GEM_WAIT_IDLE to wait for it. r600g
can be updated to do the same thing without kernel changes (besides,
we must support the old kernels as well, so this is a no-brainer).

What would be much more useful is to be able to wait for a fence,
which can be in the middle of a CS. Now that's something that would
justify changes in the kernel interface.

Marek

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] drm/radeon: Add support for userspace fence waits
  2012-01-31 19:13     ` Alex Deucher
@ 2012-01-31 19:36       ` Jerome Glisse
  0 siblings, 0 replies; 18+ messages in thread
From: Jerome Glisse @ 2012-01-31 19:36 UTC (permalink / raw)
  To: Alex Deucher; +Cc: David Airlie, dri-devel

On Tue, Jan 31, 2012 at 02:13:00PM -0500, Alex Deucher wrote:
> On Tue, Jan 31, 2012 at 2:07 PM, Jerome Glisse <j.glisse@gmail.com> wrote:
> > On Tue, Jan 31, 2012 at 01:55:43PM -0500, David Airlie wrote:
> >>
> >>
> >> Some comments below.
> >>
> >> > +   struct radeon_device *rdev = dev->dev_private;
> >> > +   struct drm_gem_object *gobj;
> >> > +   struct radeon_bo *robj;
> >> > +   void *buffer_data;
> >> > +   uint32_t *fence_data;
> >> > +   int r = 0;
> >> > +   long timeout;
> >> > +   int ring = RADEON_RING_TYPE_GFX_INDEX;
> >> > +
> >> > +   /* If you're implementing this for other rings, you'll want to
> >> > share
> >> > +      code with radeon_cs_get_ring in radeon_cs.c */
> >> > +   if (args->ring != RADEON_CS_RING_GFX) {
> >> > +           return -EINVAL;
> >> > +   }
> >> > +
> >> > +   gobj = drm_gem_object_lookup(dev, filp, args->handle);
> >> > +   if (gobj == NULL) {
> >> > +           return -ENOENT;
> >> > +   }
> >> > +   robj = gem_to_radeon_bo(gobj);
> >> > +
> >> > +   if (gobj->size < args->offset) {
> >> > +           r = -EINVAL;
> >> > +           goto unreference;
> >> > +   }
> >> > +
> >> > +   r = radeon_bo_reserve(robj, true);
> >> > +   if (r) {
> >> > +           goto unreference;
> >> > +   }
> >> > +
> >> > +   r = radeon_bo_pin(robj, RADEON_GEM_DOMAIN_GTT, NULL);
> >> > +   if (r) {
> >> > +           goto unreserve;
> >> > +   }
> >> > +
> >> > +   r = radeon_bo_kmap(robj, &buffer_data);
> >> > +   if (r) {
> >> > +           goto unpin;
> >> > +   }
> >> > +
> >>
> >>
> >> Do you need to pin it? I think if you have it reserved and you are in here you shouldn't need to. (unless kmapping requires a pin)/
> >
> > No, you don't need to pin; in fact pinning is bad, as the buffer you
> > might want to wait on could be in VRAM.
> >
> > Also, you assume that nothing can go wrong and that your value will
> > always be written to the buffer; if it never is, the wait will block
> > forever in the kernel.
> >
> > As I said in the other mail, I would rather have a wait-on-submitted-CS
> > ioctl and add some return value from the CS ioctl. I can't remember the
> > outcome of this discussion we had when we were doing virtual memory
> > support. Alex? Michel? Better memory than I do? :)
> >
> 
> I vaguely recall the discussion, but I don't remember the details,
> I'll have to look through my old mail.  Maybe a new CS ioctl flag
> requesting EOP irqs for the CS would be a better approach than a
> separate ioctl.
> 
> Alex

I think the idea was to return the u64 fence value for the specific ring,
and only enable the IRQ if the new wait-CS ioctl is called, a little bit
like BO wait.

Will try to dig through my mail too.

Cheers,
Jerome


> >
> >>
> >>
> >> > +   radeon_irq_kms_sw_irq_get(rdev, ring);
> >> > +
> >> > +   fence_data = (uint32_t*)buffer_data;
> >> > +   timeout =
> >> > an
> >> > + * interrupt and the value in the buffer might have changed.
> >> > + */
> >> > +struct drm_radeon_gem_wait_user_fence {
> >> > +   uint32_t                handle;
> >> > +   uint32_t                ring;
> >> > +   uint64_t                offset;
> >> > +   uint32_t                value;
> >> > +   uint64_t                timeout_usec;
> >>
> >> Align things here, 64 then 64 then 32 32 32 and a 32 pad.
> >>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] drm/radeon: Add support for userspace fence waits
  2012-01-31 19:07   ` Jerome Glisse
@ 2012-01-31 19:13     ` Alex Deucher
  2012-01-31 19:36       ` Jerome Glisse
  2012-02-01 11:31     ` Simon Farnsworth
  1 sibling, 1 reply; 18+ messages in thread
From: Alex Deucher @ 2012-01-31 19:13 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: David Airlie, dri-devel

On Tue, Jan 31, 2012 at 2:07 PM, Jerome Glisse <j.glisse@gmail.com> wrote:
> On Tue, Jan 31, 2012 at 01:55:43PM -0500, David Airlie wrote:
>>
>>
>> Some comments below.
>>
>> > +   struct radeon_device *rdev = dev->dev_private;
>> > +   struct drm_gem_object *gobj;
>> > +   struct radeon_bo *robj;
>> > +   void *buffer_data;
>> > +   uint32_t *fence_data;
>> > +   int r = 0;
>> > +   long timeout;
>> > +   int ring = RADEON_RING_TYPE_GFX_INDEX;
>> > +
>> > +   /* If you're implementing this for other rings, you'll want to
>> > share
>> > +      code with radeon_cs_get_ring in radeon_cs.c */
>> > +   if (args->ring != RADEON_CS_RING_GFX) {
>> > +           return -EINVAL;
>> > +   }
>> > +
>> > +   gobj = drm_gem_object_lookup(dev, filp, args->handle);
>> > +   if (gobj == NULL) {
>> > +           return -ENOENT;
>> > +   }
>> > +   robj = gem_to_radeon_bo(gobj);
>> > +
>> > +   if (gobj->size < args->offset) {
>> > +           r = -EINVAL;
>> > +           goto unreference;
>> > +   }
>> > +
>> > +   r = radeon_bo_reserve(robj, true);
>> > +   if (r) {
>> > +           goto unreference;
>> > +   }
>> > +
>> > +   r = radeon_bo_pin(robj, RADEON_GEM_DOMAIN_GTT, NULL);
>> > +   if (r) {
>> > +           goto unreserve;
>> > +   }
>> > +
>> > +   r = radeon_bo_kmap(robj, &buffer_data);
>> > +   if (r) {
>> > +           goto unpin;
>> > +   }
>> > +
>>
>>
>> Do you need to pin it? I think if you have it reserved and you are in here you shouldn't need to. (unless kmapping requires a pin)/
>
> No, you don't need to pin; in fact pinning is bad, as the buffer you
> might want to wait on could be in VRAM.
>
> Also, you assume that nothing can go wrong and that your value will
> always be written to the buffer; if it never is, the wait will block
> forever in the kernel.
>
> As I said in the other mail, I would rather have a wait-on-submitted-CS
> ioctl and add some return value from the CS ioctl. I can't remember the
> outcome of this discussion we had when we were doing virtual memory
> support. Alex? Michel? Better memory than I do? :)
>

I vaguely recall the discussion, but I don't remember the details,
I'll have to look through my old mail.  Maybe a new CS ioctl flag
requesting EOP irqs for the CS would be a better approach than a
separate ioctl.

Alex

> Cheers,
> Jerome
>
>>
>>
>> > +   radeon_irq_kms_sw_irq_get(rdev, ring);
>> > +
>> > +   fence_data = (uint32_t*)buffer_data;
>> > +   timeout =
>> > an
>> > + * interrupt and the value in the buffer might have changed.
>> > + */
>> > +struct drm_radeon_gem_wait_user_fence {
>> > +   uint32_t                handle;
>> > +   uint32_t                ring;
>> > +   uint64_t                offset;
>> > +   uint32_t                value;
>> > +   uint64_t                timeout_usec;
>>
>> Align things here, 64 then 64 then 32 32 32 and a 32 pad.
>>

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] drm/radeon: Add support for userspace fence waits
  2012-01-31 18:55 ` David Airlie
@ 2012-01-31 19:07   ` Jerome Glisse
  2012-01-31 19:13     ` Alex Deucher
  2012-02-01 11:31     ` Simon Farnsworth
  0 siblings, 2 replies; 18+ messages in thread
From: Jerome Glisse @ 2012-01-31 19:07 UTC (permalink / raw)
  To: David Airlie; +Cc: dri-devel

On Tue, Jan 31, 2012 at 01:55:43PM -0500, David Airlie wrote:
> 
> 
> Some comments below.
> 
> > +	struct radeon_device *rdev = dev->dev_private;
> > +	struct drm_gem_object *gobj;
> > +	struct radeon_bo *robj;
> > +	void *buffer_data;
> > +	uint32_t *fence_data;
> > +	int r = 0;
> > +	long timeout;
> > +	int ring = RADEON_RING_TYPE_GFX_INDEX;
> > +
> > +	/* If you're implementing this for other rings, you'll want to
> > share
> > +	   code with radeon_cs_get_ring in radeon_cs.c */
> > +	if (args->ring != RADEON_CS_RING_GFX) {
> > +		return -EINVAL;
> > +	}
> > +
> > +	gobj = drm_gem_object_lookup(dev, filp, args->handle);
> > +	if (gobj == NULL) {
> > +		return -ENOENT;
> > +	}
> > +	robj = gem_to_radeon_bo(gobj);
> > +
> > +	if (gobj->size < args->offset) {
> > +		r = -EINVAL;
> > +		goto unreference;
> > +	}
> > +
> > +	r = radeon_bo_reserve(robj, true);
> > +	if (r) {
> > +		goto unreference;
> > +	}
> > +
> > +	r = radeon_bo_pin(robj, RADEON_GEM_DOMAIN_GTT, NULL);
> > +	if (r) {
> > +		goto unreserve;
> > +	}
> > +
> > +	r = radeon_bo_kmap(robj, &buffer_data);
> > +	if (r) {
> > +		goto unpin;
> > +	}
> > +
> 
> 
> Do you need to pin it? I think if you have it reserved and you are in here you shouldn't need to. (unless kmapping requires a pin)/

No, you don't need to pin; in fact pinning is bad, as the buffer you
might want to wait on could be in VRAM.

Also, you assume that nothing can go wrong and that your value will
always be written to the buffer; if it never is, the wait will block
forever in the kernel.

As I said in the other mail, I would rather have a wait-on-submitted-CS
ioctl and add some return value from the CS ioctl. I can't remember the
outcome of this discussion we had when we were doing virtual memory
support. Alex? Michel? Better memory than I do? :)

Cheers,
Jerome

> 
> 
> > +	radeon_irq_kms_sw_irq_get(rdev, ring);
> > +
> > +	fence_data = (uint32_t*)buffer_data;
> > +	timeout =
> > an
> > + * interrupt and the value in the buffer might have changed.
> > + */
> > +struct drm_radeon_gem_wait_user_fence {
> > +	uint32_t                handle;
> > +	uint32_t                ring;
> > +	uint64_t                offset;
> > +	uint32_t                value;
> > +	uint64_t                timeout_usec;
> 
> Align things here, 64 then 64 then 32 32 32 and a 32 pad.
> 

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH] drm/radeon: Add support for userspace fence waits
  2012-01-31 17:56 ` Michel Dänzer
@ 2012-01-31 18:56   ` Jerome Glisse
  2012-01-31 21:08     ` Marek Olšák
  0 siblings, 1 reply; 18+ messages in thread
From: Jerome Glisse @ 2012-01-31 18:56 UTC (permalink / raw)
  To: Michel Dänzer; +Cc: dri-devel

On Tue, Jan 31, 2012 at 06:56:01PM +0100, Michel Dänzer wrote:
> On Die, 2012-01-31 at 16:59 +0000, Simon Farnsworth wrote: 
> > Userspace currently busywaits for fences to complete; on my workload, this
> > busywait consumes 10% of the available CPU time.
> > 
> > Provide an ioctl so that userspace can wait for an EOP interrupt that
> > corresponds to a previous EVENT_WRITE_EOP.
> > 
> > Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
> > ---
> > I've been working on top of Jerome's tiling patches, so this doesn't apply
> > directly on top of current upstream kernels. I can easily rebase to another
> > version upon request - just point me to a git tree.
> > 
> > My goal is to remove the sched_yield() in Mesa's r600_fence_finish() given
> > an up-to-date enough kernel; I hope, though, that the interface is clean
> > enough for other users to extend it in the future (e.g. using compute rings).
> 
> I'm afraid not: Unless I'm missing something, userspace can't know which
> ring the kernel submitted the CS to, and the kernel can't guess which
> ring userspace needs to wait for.

IIRC the plan was to add a return value to the CS ioctl and add an ioctl
that allows waiting on this return value, i.e. allowing userspace to
wait on a specific submitted CS.

Cheers,
Jerome


* Re: [PATCH] drm/radeon: Add support for userspace fence waits
  2012-01-31 16:59 [PATCH] drm/radeon: Add support for userspace fence waits Simon Farnsworth
  2012-01-31 17:56 ` Michel Dänzer
@ 2012-01-31 18:55 ` David Airlie
  2012-01-31 19:07   ` Jerome Glisse
  1 sibling, 1 reply; 18+ messages in thread
From: David Airlie @ 2012-01-31 18:55 UTC (permalink / raw)
  To: Simon Farnsworth; +Cc: dri-devel



Some comments below.

> +	struct radeon_device *rdev = dev->dev_private;
> +	struct drm_gem_object *gobj;
> +	struct radeon_bo *robj;
> +	void *buffer_data;
> +	uint32_t *fence_data;
> +	int r = 0;
> +	long timeout;
> +	int ring = RADEON_RING_TYPE_GFX_INDEX;
> +
> +	/* If you're implementing this for other rings, you'll want to share
> +	   code with radeon_cs_get_ring in radeon_cs.c */
> +	if (args->ring != RADEON_CS_RING_GFX) {
> +		return -EINVAL;
> +	}
> +
> +	gobj = drm_gem_object_lookup(dev, filp, args->handle);
> +	if (gobj == NULL) {
> +		return -ENOENT;
> +	}
> +	robj = gem_to_radeon_bo(gobj);
> +
> +	if (gobj->size < args->offset) {
> +		r = -EINVAL;
> +		goto unreference;
> +	}
> +
> +	r = radeon_bo_reserve(robj, true);
> +	if (r) {
> +		goto unreference;
> +	}
> +
> +	r = radeon_bo_pin(robj, RADEON_GEM_DOMAIN_GTT, NULL);
> +	if (r) {
> +		goto unreserve;
> +	}
> +
> +	r = radeon_bo_kmap(robj, &buffer_data);
> +	if (r) {
> +		goto unpin;
> +	}
> +


Do you need to pin it? I think if you have it reserved and you are in here, you shouldn't need to (unless kmapping requires a pin).


> +	radeon_irq_kms_sw_irq_get(rdev, ring);
> +
> +	fence_data = (uint32_t*)buffer_data;
> +	timeout = wait_event_interruptible_timeout(rdev->fence_drv[ring].userspace_queue,
> [...]
> + * to timeout_usec, waking up when an EVENT_WRITE_EOP packet causes an
> + * interrupt and the value in the buffer might have changed.
> + */
> +struct drm_radeon_gem_wait_user_fence {
> +	uint32_t                handle;
> +	uint32_t                ring;
> +	uint64_t                offset;
> +	uint32_t                value;
> +	uint64_t                timeout_usec;

Align things here, 64 then 64 then 32 32 32 and a 32 pad.


* Re: [PATCH] drm/radeon: Add support for userspace fence waits
  2012-01-31 16:59 [PATCH] drm/radeon: Add support for userspace fence waits Simon Farnsworth
@ 2012-01-31 17:56 ` Michel Dänzer
  2012-01-31 18:56   ` Jerome Glisse
  2012-01-31 18:55 ` David Airlie
  1 sibling, 1 reply; 18+ messages in thread
From: Michel Dänzer @ 2012-01-31 17:56 UTC (permalink / raw)
  To: Simon Farnsworth; +Cc: dri-devel

On Tue, 2012-01-31 at 16:59 +0000, Simon Farnsworth wrote:
> Userspace currently busywaits for fences to complete; on my workload, this
> busywait consumes 10% of the available CPU time.
> 
> Provide an ioctl so that userspace can wait for an EOP interrupt that
> corresponds to a previous EVENT_WRITE_EOP.
> 
> Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
> ---
> I've been working on top of Jerome's tiling patches, so this doesn't apply
> directly on top of current upstream kernels. I can easily rebase to another
> version upon request - just point me to a git tree.
> 
> My goal is to remove the sched_yield() in Mesa's r600_fence_finish() given
> an up-to-date enough kernel; I hope, though, that the interface is clean
> enough for other users to extend it in the future (e.g. using compute rings).

I'm afraid not: Unless I'm missing something, userspace can't know which
ring the kernel submitted the CS to, and the kernel can't guess which
ring userspace needs to wait for.


-- 
Earthling Michel Dänzer           |                   http://www.amd.com
Libre software enthusiast         |          Debian, X and DRI developer


* [PATCH] drm/radeon: Add support for userspace fence waits
@ 2012-01-31 16:59 Simon Farnsworth
  2012-01-31 17:56 ` Michel Dänzer
  2012-01-31 18:55 ` David Airlie
  0 siblings, 2 replies; 18+ messages in thread
From: Simon Farnsworth @ 2012-01-31 16:59 UTC (permalink / raw)
  To: dri-devel

Userspace currently busywaits for fences to complete; on my workload, this
busywait consumes 10% of the available CPU time.

Provide an ioctl so that userspace can wait for an EOP interrupt that
corresponds to a previous EVENT_WRITE_EOP.

Signed-off-by: Simon Farnsworth <simon.farnsworth@onelan.co.uk>
---
I've been working on top of Jerome's tiling patches, so this doesn't apply
directly on top of current upstream kernels. I can easily rebase to another
version upon request - just point me to a git tree.

My goal is to remove the sched_yield() in Mesa's r600_fence_finish() given
an up-to-date enough kernel; I hope, though, that the interface is clean
enough for other users to extend it in the future (e.g. using compute rings).

 drivers/gpu/drm/radeon/radeon.h       |    3 +
 drivers/gpu/drm/radeon/radeon_drv.c   |    3 +-
 drivers/gpu/drm/radeon/radeon_fence.c |    2 +
 drivers/gpu/drm/radeon/radeon_gem.c   |   70 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/radeon/radeon_kms.c   |    1 +
 include/drm/radeon_drm.h              |   30 ++++++++++++++
 6 files changed, 108 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 2859406..00c187b 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -217,6 +217,7 @@ struct radeon_fence_driver {
 	unsigned long			last_jiffies;
 	unsigned long			last_timeout;
 	wait_queue_head_t		queue;
+	wait_queue_head_t		userspace_queue;
 	struct list_head		created;
 	struct list_head		emitted;
 	struct list_head		signaled;
@@ -1348,6 +1349,8 @@ int radeon_gem_set_tiling_ioctl(struct drm_device *dev, void *data,
 				struct drm_file *filp);
 int radeon_gem_get_tiling_ioctl(struct drm_device *dev, void *data,
 				struct drm_file *filp);
+int radeon_gem_wait_user_fence_ioctl(struct drm_device *dev, void *data,
+				     struct drm_file *filp);
 
 /* VRAM scratch page for HDP bug, default vram page */
 struct r600_vram_scratch {
diff --git a/drivers/gpu/drm/radeon/radeon_drv.c b/drivers/gpu/drm/radeon/radeon_drv.c
index 4ae2e1d..9f82fa9 100644
--- a/drivers/gpu/drm/radeon/radeon_drv.c
+++ b/drivers/gpu/drm/radeon/radeon_drv.c
@@ -56,9 +56,10 @@
  *   2.12.0 - RADEON_CS_KEEP_TILING_FLAGS
  *   2.13.0 - virtual memory support
  *   2.14.0 - add evergreen tiling informations
+ *   2.15.0 - gem_wait_user_fence ioctl
  */
 #define KMS_DRIVER_MAJOR	2
-#define KMS_DRIVER_MINOR	14
+#define KMS_DRIVER_MINOR	15
 #define KMS_DRIVER_PATCHLEVEL	0
 int radeon_driver_load_kms(struct drm_device *dev, unsigned long flags);
 int radeon_driver_unload_kms(struct drm_device *dev);
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 64ea3dd..d86bc28 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -356,6 +356,7 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
 	if (wake) {
 		wake_up_all(&rdev->fence_drv[ring].queue);
 	}
+	wake_up_interruptible_all(&rdev->fence_drv[ring].userspace_queue);
 }
 
 int radeon_fence_count_emitted(struct radeon_device *rdev, int ring)
@@ -421,6 +422,7 @@ static void radeon_fence_driver_init_ring(struct radeon_device *rdev, int ring)
 	INIT_LIST_HEAD(&rdev->fence_drv[ring].emitted);
 	INIT_LIST_HEAD(&rdev->fence_drv[ring].signaled);
 	init_waitqueue_head(&rdev->fence_drv[ring].queue);
+	init_waitqueue_head(&rdev->fence_drv[ring].userspace_queue);
 	rdev->fence_drv[ring].initialized = false;
 }
 
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
index 7337850..602274f 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -531,3 +531,73 @@ int radeon_mode_dumb_destroy(struct drm_file *file_priv,
 {
 	return drm_gem_handle_delete(file_priv, handle);
 }
+
+int radeon_gem_wait_user_fence_ioctl(struct drm_device *dev, void *data,
+			      struct drm_file *filp)
+{
+	struct drm_radeon_gem_wait_user_fence *args = data;
+	struct radeon_device *rdev = dev->dev_private;
+	struct drm_gem_object *gobj;
+	struct radeon_bo *robj;
+	void *buffer_data;
+	uint32_t *fence_data;
+	int r = 0;
+	long timeout;
+	int ring = RADEON_RING_TYPE_GFX_INDEX;
+
+	/* If you're implementing this for other rings, you'll want to share
+	   code with radeon_cs_get_ring in radeon_cs.c */
+	if (args->ring != RADEON_CS_RING_GFX) {
+		return -EINVAL;
+	}
+
+	gobj = drm_gem_object_lookup(dev, filp, args->handle);
+	if (gobj == NULL) {
+		return -ENOENT;
+	}
+	robj = gem_to_radeon_bo(gobj);
+
+	if (gobj->size < args->offset) {
+		r = -EINVAL;
+		goto unreference;
+	}
+
+	r = radeon_bo_reserve(robj, true);
+	if (r) {
+		goto unreference;
+	}
+
+	r = radeon_bo_pin(robj, RADEON_GEM_DOMAIN_GTT, NULL);
+	if (r) {
+		goto unreserve;
+	}
+
+	r = radeon_bo_kmap(robj, &buffer_data);
+	if (r) {
+		goto unpin;
+	}
+
+	radeon_irq_kms_sw_irq_get(rdev, ring);
+
+	fence_data = (uint32_t*)buffer_data;
+	timeout = wait_event_interruptible_timeout(rdev->fence_drv[ring].userspace_queue,
+						   fence_data[args->offset >> 2] != args->value,
+						   usecs_to_jiffies(args->timeout_usec));
+
+	radeon_irq_kms_sw_irq_put(rdev, ring);
+
+	if (timeout == 0)
+		r = -ETIMEDOUT;
+	else if (timeout < 0)
+		r = timeout;
+
+	radeon_bo_kunmap(robj);
+unpin:
+	radeon_bo_unpin(robj);
+unreserve:
+	radeon_bo_unreserve(robj);
+unreference:
+	drm_gem_object_unreference_unlocked(gobj);
+
+	return r;
+}
diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
index d335288..0e552cc 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -496,5 +496,6 @@ struct drm_ioctl_desc radeon_ioctls_kms[] = {
 	DRM_IOCTL_DEF_DRV(RADEON_GEM_GET_TILING, radeon_gem_get_tiling_ioctl, DRM_AUTH|DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(RADEON_GEM_BUSY, radeon_gem_busy_ioctl, DRM_AUTH|DRM_UNLOCKED),
 	DRM_IOCTL_DEF_DRV(RADEON_GEM_VA, radeon_gem_va_ioctl, DRM_AUTH|DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(RADEON_GEM_WAIT_USER_FENCE, radeon_gem_wait_user_fence_ioctl, DRM_AUTH|DRM_UNLOCKED),
 };
 int radeon_max_kms_ioctl = DRM_ARRAY_SIZE(radeon_ioctls_kms);
diff --git a/include/drm/radeon_drm.h b/include/drm/radeon_drm.h
index dd2e9cf..3d4ae93 100644
--- a/include/drm/radeon_drm.h
+++ b/include/drm/radeon_drm.h
@@ -510,6 +510,7 @@ typedef struct {
 #define DRM_RADEON_GEM_GET_TILING	0x29
 #define DRM_RADEON_GEM_BUSY		0x2a
 #define DRM_RADEON_GEM_VA		0x2b
+#define DRM_RADEON_GEM_WAIT_USER_FENCE  0x2c
 
 #define DRM_IOCTL_RADEON_CP_INIT    DRM_IOW( DRM_COMMAND_BASE + DRM_RADEON_CP_INIT, drm_radeon_init_t)
 #define DRM_IOCTL_RADEON_CP_START   DRM_IO(  DRM_COMMAND_BASE + DRM_RADEON_CP_START)
@@ -552,6 +553,7 @@ typedef struct {
 #define DRM_IOCTL_RADEON_GEM_GET_TILING	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_GET_TILING, struct drm_radeon_gem_get_tiling)
 #define DRM_IOCTL_RADEON_GEM_BUSY	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_BUSY, struct drm_radeon_gem_busy)
 #define DRM_IOCTL_RADEON_GEM_VA		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_VA, struct drm_radeon_gem_va)
+#define DRM_IOCTL_RADEON_GEM_WAIT_USER_FENCE   DRM_IOW(DRM_COMMAND_BASE + DRM_RADEON_GEM_WAIT_USER_FENCE, struct drm_radeon_gem_wait_user_fence)
 
 typedef struct drm_radeon_init {
 	enum {
@@ -967,4 +969,32 @@ struct drm_radeon_info {
 	uint64_t		value;
 };
 
+/**
+ * struct drm_radeon_gem_wait_user_fence - DRM_RADEON_GEM_WAIT_USER_FENCE ioctl param
+ *
+ * @handle: Handle for the object that the GPU is expected to write
+ * @ring: The ring on which the fence packet was issued
+ * @offset: Offset (in bytes) within that object where the GPU is expected
+ *          to write. Must be DWORD-aligned
+ * @value: The value expected if the GPU has not yet written to this location
+ * @timeout_usec: The maximum time to wait for the GPU, in microseconds
+ *
+ * The DRM_RADEON_GEM_WAIT_USER_FENCE ioctl is meant to allow userspace to
+ * avoid busy-waiting for a EVENT_WRITE_EOP packet to complete (e.g. for
+ * fence sync objects in OpenGL). It expects the EVENT_WRITE_EOP packet to
+ * have requested an interrupt on completion.
+ *
+ * The ioctl will return immediately if the value supplied is not the value
+ * found in the buffer at offset bytes in; otherwise, it will sleep for up
+ * to timeout_usec, waking up when an EVENT_WRITE_EOP packet causes an
+ * interrupt and the value in the buffer might have changed.
+ */
+struct drm_radeon_gem_wait_user_fence {
+	uint32_t                handle;
+	uint32_t                ring;
+	uint64_t                offset;
+	uint32_t                value;
+	uint64_t                timeout_usec;
+};
+
 #endif
-- 
1.7.6.4


end of thread

Thread overview: 18+ messages
2012-01-31 13:16 r600g: Trying to remove the busywait for fence completion, but hitting inexplicable behaviour Simon Farnsworth
2012-01-31 13:17 ` [PATCH] [r600g] Use new kernel interface to wait for fences Simon Farnsworth
2012-01-31 13:18 ` [PATCH] drm/radeon: Add support for userspace fence waits Simon Farnsworth
2012-01-31 14:49   ` Simon Farnsworth
2012-01-31 15:38 ` r600g: Trying to remove the busywait for fence completion, but hitting inexplicable behaviour Alan Swanson
2012-01-31 15:54   ` Simon Farnsworth
2012-01-31 16:59 [PATCH] drm/radeon: Add support for userspace fence waits Simon Farnsworth
2012-01-31 17:56 ` Michel Dänzer
2012-01-31 18:56   ` Jerome Glisse
2012-01-31 21:08     ` Marek Olšák
2012-02-01  8:39       ` Michel Dänzer
2012-02-01 13:58         ` Marek Olšák
2012-01-31 18:55 ` David Airlie
2012-01-31 19:07   ` Jerome Glisse
2012-01-31 19:13     ` Alex Deucher
2012-01-31 19:36       ` Jerome Glisse
2012-02-01 11:31     ` Simon Farnsworth
2012-02-01 18:21       ` Christian König
