* [RFC] drm/radeon: userfence IOCTL
@ 2015-04-13 14:52 Christian König
  2015-04-13 14:52 ` [PATCH 1/3] drm/radeon: wait for BO move to finish on kmap Christian König
                   ` (4 more replies)
  0 siblings, 5 replies; 23+ messages in thread
From: Christian König @ 2015-04-13 14:52 UTC (permalink / raw)
  To: dri-devel; +Cc: Serguei.Sagalovitch

Hello everyone,

we have a requirement for a somewhat different kind of fence handling. Currently we handle fences completely inside the kernel, but in the future we would like to emit multiple fences inside the same IB as well.

This works by adding multiple fence commands into an IB which just write their value to a specific location inside a BO and trigger the appropriate hardware interrupt.

The user part of the driver stack should then be able to call an IOCTL to wait for the interrupt and block for the value (or something larger) to be written to the specific location.

This has the advantage that you can have multiple synchronization points in the same IB and don't need to split up your draw commands over several IBs so that the kernel can insert kernel fences in between.
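The check itself is simple: each fence location acts as a monotonically increasing sequence counter, and a waiter considers its fence signaled once the stored value reaches the requested sequence number. A minimal model in plain C (hypothetical names; the actual value layout is up to the driver stack):

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one user fence slot inside a BO. */
struct user_fence {
	volatile uint64_t value;	/* written by the fence command in the IB */
};

/* The wait condition: the location holds the requested sequence
 * number or something larger. */
static bool user_fence_signaled(const struct user_fence *f, uint64_t seq)
{
	return f->value >= seq;
}
```

Because the comparison is `>=`, a waiter that shows up after several later fences have already been written still sees its own fence as signaled.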

The following set of patches tries to implement exactly this IOCTL. The big problem with that IOCTL is that TTM needs the BO to be kept in the same place while it is mapped inside the kernel page table. So this requires that we pin down the BO for the duration of the wait IOCTL.

This practically gives userspace a way of pinning down BOs for as long as it wants, without the kernel being able to intervene.

Any ideas how to avoid those problems? Or better ideas how to handle the new requirements?

Please note that the patches are only hacked together quick&dirty to demonstrate the problem and are not very well tested.

Best regards,
Christian.
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


* [PATCH 1/3] drm/radeon: wait for BO move to finish on kmap
  2015-04-13 14:52 [RFC] drm/radeon: userfence IOCTL Christian König
@ 2015-04-13 14:52 ` Christian König
  2015-04-13 14:52 ` [PATCH 2/3] drm/radeon: cleanup radeon_cs_get_ring Christian König
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Christian König @ 2015-04-13 14:52 UTC (permalink / raw)
  To: dri-devel; +Cc: Serguei.Sagalovitch

From: Christian König <christian.koenig@amd.com>

After resume, buffers need to be moved back into VRAM. We already had this
problem with UVD, but solve it more generally to be on the safe side.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/radeon/radeon.h        |  2 ++
 drivers/gpu/drm/radeon/radeon_object.c | 11 +++++++++++
 drivers/gpu/drm/radeon/radeon_ttm.c    |  3 +++
 drivers/gpu/drm/radeon/radeon_uvd.c    | 10 ----------
 4 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 33d5a4f..6e6b49a 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -506,6 +506,8 @@ struct radeon_bo {
 
 	struct radeon_mn		*mn;
 	struct interval_tree_node	mn_it;
+
+	struct radeon_fence		*last_move;
 };
 #define gem_to_radeon_bo(gobj) container_of((gobj), struct radeon_bo, gem_base)
 
diff --git a/drivers/gpu/drm/radeon/radeon_object.c b/drivers/gpu/drm/radeon/radeon_object.c
index 318165d..14a0f87 100644
--- a/drivers/gpu/drm/radeon/radeon_object.c
+++ b/drivers/gpu/drm/radeon/radeon_object.c
@@ -263,6 +263,17 @@ int radeon_bo_kmap(struct radeon_bo *bo, void **ptr)
 	bool is_iomem;
 	int r;
 
+	if (bo->last_move) {
+		r = radeon_fence_wait(bo->last_move, false);
+		if (r) {
+			radeon_bo_kunmap(bo);
+			DRM_ERROR("Failed waiting BO move (%d)!\n", r);
+			return r;
+		}
+
+		radeon_fence_unref(&bo->last_move);
+	}
+
 	if (bo->kptr) {
 		if (ptr) {
 			*ptr = bo->kptr;
diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index b292aca..1fa4f2d 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -253,6 +253,7 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
 			struct ttm_mem_reg *new_mem,
 			struct ttm_mem_reg *old_mem)
 {
+	struct radeon_bo *rbo = container_of(bo, struct radeon_bo, tbo);
 	struct radeon_device *rdev;
 	uint64_t old_start, new_start;
 	struct radeon_fence *fence;
@@ -300,6 +301,8 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
 
 	r = ttm_bo_move_accel_cleanup(bo, &fence->base,
 				      evict, no_wait_gpu, new_mem);
+	radeon_fence_unref(&rbo->last_move);
+	rbo->last_move = radeon_fence_ref(fence);
 	radeon_fence_unref(&fence);
 	return r;
 }
diff --git a/drivers/gpu/drm/radeon/radeon_uvd.c b/drivers/gpu/drm/radeon/radeon_uvd.c
index c10b2ae..60f96a3 100644
--- a/drivers/gpu/drm/radeon/radeon_uvd.c
+++ b/drivers/gpu/drm/radeon/radeon_uvd.c
@@ -401,7 +401,6 @@ static int radeon_uvd_cs_msg(struct radeon_cs_parser *p, struct radeon_bo *bo,
 {
 	int32_t *msg, msg_type, handle;
 	unsigned img_size = 0;
-	struct fence *f;
 	void *ptr;
 
 	int i, r;
@@ -411,15 +410,6 @@ static int radeon_uvd_cs_msg(struct radeon_cs_parser *p, struct radeon_bo *bo,
 		return -EINVAL;
 	}
 
-	f = reservation_object_get_excl(bo->tbo.resv);
-	if (f) {
-		r = radeon_fence_wait((struct radeon_fence *)f, false);
-		if (r) {
-			DRM_ERROR("Failed waiting for UVD message (%d)!\n", r);
-			return r;
-		}
-	}
-
 	r = radeon_bo_kmap(bo, &ptr);
 	if (r) {
 		DRM_ERROR("Failed mapping the UVD message (%d)!\n", r);
-- 
1.9.1
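Stripped of the driver specifics, the idea of the patch is: every accelerated move stores a reference to its fence on the BO, and a kmap waits for that fence (then drops the reference) before handing out a CPU pointer. A toy model of the pattern (hypothetical names, not the radeon API):

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the fence and buffer object, for illustration only. */
struct toy_fence { bool signaled; };

struct toy_bo {
	struct toy_fence *last_move;	/* fence of the most recent blit move */
	void *kptr;			/* CPU mapping */
};

/* In the kernel this can block and fail; the toy just completes the move. */
static int toy_fence_wait(struct toy_fence *f)
{
	f->signaled = true;
	return 0;
}

/* kmap-style helper: make sure any pending move has finished before the
 * CPU looks at the buffer contents. */
static int toy_bo_kmap(struct toy_bo *bo, void **ptr)
{
	if (bo->last_move) {
		int r = toy_fence_wait(bo->last_move);
		if (r)
			return r;
		bo->last_move = NULL;	/* reference dropped */
	}
	*ptr = bo->kptr;
	return 0;
}
```

The real patch additionally has to take the radeon_fence reference in the move path and unreference it once the wait succeeded.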



* [PATCH 2/3] drm/radeon: cleanup radeon_cs_get_ring
  2015-04-13 14:52 [RFC] drm/radeon: userfence IOCTL Christian König
  2015-04-13 14:52 ` [PATCH 1/3] drm/radeon: wait for BO move to finish on kmap Christian König
@ 2015-04-13 14:52 ` Christian König
  2015-04-13 14:52 ` [PATCH 3/3] drm/radeon: add userfence IOCTL Christian König
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 23+ messages in thread
From: Christian König @ 2015-04-13 14:52 UTC (permalink / raw)
  To: dri-devel; +Cc: Serguei.Sagalovitch

From: Christian König <christian.koenig@amd.com>

That makes it possible to call it from elsewhere as well.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/radeon/radeon.h    |  2 +-
 drivers/gpu/drm/radeon/radeon_cs.c | 55 ++++++++++++++++++++------------------
 2 files changed, 30 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 6e6b49a..57d63c4 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -1093,7 +1093,6 @@ struct radeon_cs_parser {
 	int			parser_error;
 	u32			cs_flags;
 	u32			ring;
-	s32			priority;
 	struct ww_acquire_ctx	ticket;
 };
 
@@ -1122,6 +1121,7 @@ typedef int (*radeon_packet0_check_t)(struct radeon_cs_parser *p,
 typedef int (*radeon_packet3_check_t)(struct radeon_cs_parser *p,
 				      struct radeon_cs_packet *pkt);
 
+int radeon_cs_get_ring(struct radeon_device *rdev, u32 ring, s32 priority);
 
 /*
  * AGP
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 4d0f96c..4b92762 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -187,47 +187,46 @@ static int radeon_cs_parser_relocs(struct radeon_cs_parser *p)
 	return r;
 }
 
-static int radeon_cs_get_ring(struct radeon_cs_parser *p, u32 ring, s32 priority)
+int radeon_cs_get_ring(struct radeon_device *rdev, u32 ring, s32 priority)
 {
-	p->priority = priority;
-
 	switch (ring) {
 	default:
 		DRM_ERROR("unknown ring id: %d\n", ring);
-		return -EINVAL;
-	case RADEON_CS_RING_GFX:
-		p->ring = RADEON_RING_TYPE_GFX_INDEX;
 		break;
+
+	case RADEON_CS_RING_GFX:
+		return RADEON_RING_TYPE_GFX_INDEX;
+
 	case RADEON_CS_RING_COMPUTE:
-		if (p->rdev->family >= CHIP_TAHITI) {
-			if (p->priority > 0)
-				p->ring = CAYMAN_RING_TYPE_CP1_INDEX;
+		if (rdev->family >= CHIP_TAHITI) {
+			if (priority > 0)
+				return CAYMAN_RING_TYPE_CP1_INDEX;
 			else
-				p->ring = CAYMAN_RING_TYPE_CP2_INDEX;
+				return CAYMAN_RING_TYPE_CP2_INDEX;
 		} else
-			p->ring = RADEON_RING_TYPE_GFX_INDEX;
-		break;
+			return RADEON_RING_TYPE_GFX_INDEX;
+
 	case RADEON_CS_RING_DMA:
-		if (p->rdev->family >= CHIP_CAYMAN) {
-			if (p->priority > 0)
-				p->ring = R600_RING_TYPE_DMA_INDEX;
+		if (rdev->family >= CHIP_CAYMAN) {
+			if (priority > 0)
+				return R600_RING_TYPE_DMA_INDEX;
 			else
-				p->ring = CAYMAN_RING_TYPE_DMA1_INDEX;
-		} else if (p->rdev->family >= CHIP_RV770) {
-			p->ring = R600_RING_TYPE_DMA_INDEX;
+				return CAYMAN_RING_TYPE_DMA1_INDEX;
+		} else if (rdev->family >= CHIP_RV770) {
+			return R600_RING_TYPE_DMA_INDEX;
 		} else {
 			return -EINVAL;
 		}
-		break;
+
 	case RADEON_CS_RING_UVD:
-		p->ring = R600_RING_TYPE_UVD_INDEX;
-		break;
+		return R600_RING_TYPE_UVD_INDEX;
+
 	case RADEON_CS_RING_VCE:
 		/* TODO: only use the low priority ring for now */
-		p->ring = TN_RING_TYPE_VCE1_INDEX;
-		break;
+		return TN_RING_TYPE_VCE1_INDEX;
+
 	}
-	return 0;
+	return -EINVAL;
 }
 
 static int radeon_cs_sync_rings(struct radeon_cs_parser *p)
@@ -348,14 +347,18 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 
 	/* these are KMS only */
 	if (p->rdev) {
+		int r;
+
 		if ((p->cs_flags & RADEON_CS_USE_VM) &&
 		    !p->rdev->vm_manager.enabled) {
 			DRM_ERROR("VM not active on asic!\n");
 			return -EINVAL;
 		}
 
-		if (radeon_cs_get_ring(p, ring, priority))
-			return -EINVAL;
+		r = radeon_cs_get_ring(p->rdev, ring, priority);
+		if (r < 0)
+			return r;
+		p->ring = r;
 
 		/* we only support VM on some SI+ rings */
 		if ((p->cs_flags & RADEON_CS_USE_VM) == 0) {
-- 
1.9.1
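The cleanup switches the function to the usual kernel convention of returning either a non-negative index or a negative errno, which the caller checks with `r < 0`. A minimal illustration of that calling convention with made-up ring constants (not the real radeon indices):

```c
#include <errno.h>

enum { TOY_RING_GFX = 0, TOY_RING_DMA = 3 };		/* hw ring indices */
enum { TOY_CS_RING_GFX = 0, TOY_CS_RING_DMA = 1 };	/* userspace ids */

/* Map a userspace ring id to a hardware ring index, or -EINVAL. */
static int toy_get_ring(unsigned int ring)
{
	switch (ring) {
	case TOY_CS_RING_GFX:
		return TOY_RING_GFX;
	case TOY_CS_RING_DMA:
		return TOY_RING_DMA;
	default:
		return -EINVAL;
	}
}
```

A caller then does `r = toy_get_ring(ring); if (r < 0) return r;` and only afterwards stores the index, which is exactly what the reworked radeon_cs_parser_init does.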



* [PATCH 3/3] drm/radeon: add userfence IOCTL
  2015-04-13 14:52 [RFC] drm/radeon: userfence IOCTL Christian König
  2015-04-13 14:52 ` [PATCH 1/3] drm/radeon: wait for BO move to finish on kmap Christian König
  2015-04-13 14:52 ` [PATCH 2/3] drm/radeon: cleanup radeon_cs_get_ring Christian König
@ 2015-04-13 14:52 ` Christian König
  2015-04-13 17:23   ` Daniel Vetter
  2015-04-13 17:27   ` Daniel Vetter
  2015-04-13 15:25 ` [RFC] drm/radeon: " Serguei Sagalovitch
  2015-04-13 15:31 ` Jerome Glisse
  4 siblings, 2 replies; 23+ messages in thread
From: Christian König @ 2015-04-13 14:52 UTC (permalink / raw)
  To: dri-devel; +Cc: Serguei.Sagalovitch

From: Christian König <christian.koenig@amd.com>

WIP patch which adds a user fence IOCTL.

Signed-off-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/radeon/radeon.h     |  2 ++
 drivers/gpu/drm/radeon/radeon_gem.c | 71 +++++++++++++++++++++++++++++++++++++
 drivers/gpu/drm/radeon/radeon_kms.c |  1 +
 include/uapi/drm/radeon_drm.h       | 41 +++++++++++++--------
 4 files changed, 100 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 57d63c4..110baae 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -2215,6 +2215,8 @@ int radeon_gem_create_ioctl(struct drm_device *dev, void *data,
 			    struct drm_file *filp);
 int radeon_gem_userptr_ioctl(struct drm_device *dev, void *data,
 			     struct drm_file *filp);
+int radeon_gem_wait_userfence_ioctl(struct drm_device *dev, void *data,
+				    struct drm_file *filp);
 int radeon_gem_pin_ioctl(struct drm_device *dev, void *data,
 			 struct drm_file *file_priv);
 int radeon_gem_unpin_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
index ac3c131..094b3d5 100644
--- a/drivers/gpu/drm/radeon/radeon_gem.c
+++ b/drivers/gpu/drm/radeon/radeon_gem.c
@@ -365,6 +365,77 @@ handle_lockup:
 	return r;
 }
 
+int radeon_gem_wait_userfence_ioctl(struct drm_device *dev, void *data,
+				    struct drm_file *filp)
+{
+	struct radeon_device *rdev = dev->dev_private;
+	struct drm_radeon_gem_wait_userfence *args = data;
+	struct drm_gem_object *gobj;
+	struct radeon_bo *robj;
+	uint64_t *fence_addr;
+	void *cpu_addr;
+	int r, ring;
+
+	ring = radeon_cs_get_ring(rdev, args->ring, args->priority);
+	if (ring < 0)
+		return ring;
+
+	down_read(&rdev->exclusive_lock);
+
+	gobj = drm_gem_object_lookup(dev, filp, args->handle);
+	if (gobj == NULL) {
+		r = -ENOENT;
+		goto error_unref;
+	}
+	robj = gem_to_radeon_bo(gobj);
+
+	if (args->offset & 0x7 || args->offset + 8 > radeon_bo_size(robj)) {
+		r = -EINVAL;
+		goto error_unref;
+	}
+
+	r = radeon_bo_reserve(robj, false);
+	if (r)
+		goto error_unref;
+
+	r = radeon_bo_pin(robj, RADEON_GEM_DOMAIN_VRAM |
+			  RADEON_GEM_DOMAIN_GTT, NULL);
+	if (r)
+		goto error_unreserve;
+
+	r = radeon_bo_kmap(robj, &cpu_addr);
+	if (r)
+		goto error_unpin;
+
+	radeon_bo_unreserve(robj);
+
+	fence_addr = (uint64_t *)(((uint8_t*)cpu_addr) + args->offset);
+
+	radeon_irq_kms_sw_irq_get(rdev, ring);
+	r = wait_event_interruptible(rdev->fence_queue, (
+		*fence_addr >= args->fence || rdev->needs_reset
+	));
+	radeon_irq_kms_sw_irq_put(rdev, ring);
+
+	if (rdev->needs_reset)
+		r = -EDEADLK;
+
+	radeon_bo_reserve(robj, false);
+	radeon_bo_kunmap(robj);
+
+error_unpin:
+	radeon_bo_unpin(robj);
+
+error_unreserve:
+	radeon_bo_unreserve(robj);
+
+error_unref:
+	drm_gem_object_unreference_unlocked(gobj);
+	up_read(&rdev->exclusive_lock);
+	r = radeon_gem_handle_lockup(rdev, r);
+	return r;
+}
+
 int radeon_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 				struct drm_file *filp)
 {
diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
index 686411e..4757f25 100644
--- a/drivers/gpu/drm/radeon/radeon_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_kms.c
@@ -893,5 +893,6 @@ const struct drm_ioctl_desc radeon_ioctls_kms[] = {
 	DRM_IOCTL_DEF_DRV(RADEON_GEM_VA, radeon_gem_va_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(RADEON_GEM_OP, radeon_gem_op_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(RADEON_GEM_USERPTR, radeon_gem_userptr_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(RADEON_GEM_WAIT_USERFENCE, radeon_gem_wait_userfence_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
 };
 int radeon_max_kms_ioctl = ARRAY_SIZE(radeon_ioctls_kms);
diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h
index 50d0fb4..43fe660 100644
--- a/include/uapi/drm/radeon_drm.h
+++ b/include/uapi/drm/radeon_drm.h
@@ -512,6 +512,7 @@ typedef struct {
 #define DRM_RADEON_GEM_VA		0x2b
 #define DRM_RADEON_GEM_OP		0x2c
 #define DRM_RADEON_GEM_USERPTR		0x2d
+#define DRM_RADEON_GEM_WAIT_USERFENCE	0x2e
 
 #define DRM_IOCTL_RADEON_CP_INIT    DRM_IOW( DRM_COMMAND_BASE + DRM_RADEON_CP_INIT, drm_radeon_init_t)
 #define DRM_IOCTL_RADEON_CP_START   DRM_IO(  DRM_COMMAND_BASE + DRM_RADEON_CP_START)
@@ -541,21 +542,22 @@ typedef struct {
 #define DRM_IOCTL_RADEON_SURF_ALLOC DRM_IOW( DRM_COMMAND_BASE + DRM_RADEON_SURF_ALLOC, drm_radeon_surface_alloc_t)
 #define DRM_IOCTL_RADEON_SURF_FREE  DRM_IOW( DRM_COMMAND_BASE + DRM_RADEON_SURF_FREE, drm_radeon_surface_free_t)
 /* KMS */
-#define DRM_IOCTL_RADEON_GEM_INFO	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_INFO, struct drm_radeon_gem_info)
-#define DRM_IOCTL_RADEON_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_CREATE, struct drm_radeon_gem_create)
-#define DRM_IOCTL_RADEON_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_MMAP, struct drm_radeon_gem_mmap)
-#define DRM_IOCTL_RADEON_GEM_PREAD	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_PREAD, struct drm_radeon_gem_pread)
-#define DRM_IOCTL_RADEON_GEM_PWRITE	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_PWRITE, struct drm_radeon_gem_pwrite)
-#define DRM_IOCTL_RADEON_GEM_SET_DOMAIN	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_SET_DOMAIN, struct drm_radeon_gem_set_domain)
-#define DRM_IOCTL_RADEON_GEM_WAIT_IDLE	DRM_IOW(DRM_COMMAND_BASE + DRM_RADEON_GEM_WAIT_IDLE, struct drm_radeon_gem_wait_idle)
-#define DRM_IOCTL_RADEON_CS		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_CS, struct drm_radeon_cs)
-#define DRM_IOCTL_RADEON_INFO		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_INFO, struct drm_radeon_info)
-#define DRM_IOCTL_RADEON_GEM_SET_TILING	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_SET_TILING, struct drm_radeon_gem_set_tiling)
-#define DRM_IOCTL_RADEON_GEM_GET_TILING	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_GET_TILING, struct drm_radeon_gem_get_tiling)
-#define DRM_IOCTL_RADEON_GEM_BUSY	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_BUSY, struct drm_radeon_gem_busy)
-#define DRM_IOCTL_RADEON_GEM_VA		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_VA, struct drm_radeon_gem_va)
-#define DRM_IOCTL_RADEON_GEM_OP		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_OP, struct drm_radeon_gem_op)
-#define DRM_IOCTL_RADEON_GEM_USERPTR	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_USERPTR, struct drm_radeon_gem_userptr)
+#define DRM_IOCTL_RADEON_GEM_INFO		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_INFO, struct drm_radeon_gem_info)
+#define DRM_IOCTL_RADEON_GEM_CREATE		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_CREATE, struct drm_radeon_gem_create)
+#define DRM_IOCTL_RADEON_GEM_MMAP		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_MMAP, struct drm_radeon_gem_mmap)
+#define DRM_IOCTL_RADEON_GEM_PREAD		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_PREAD, struct drm_radeon_gem_pread)
+#define DRM_IOCTL_RADEON_GEM_PWRITE		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_PWRITE, struct drm_radeon_gem_pwrite)
+#define DRM_IOCTL_RADEON_GEM_SET_DOMAIN		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_SET_DOMAIN, struct drm_radeon_gem_set_domain)
+#define DRM_IOCTL_RADEON_GEM_WAIT_IDLE		DRM_IOW(DRM_COMMAND_BASE + DRM_RADEON_GEM_WAIT_IDLE, struct drm_radeon_gem_wait_idle)
+#define DRM_IOCTL_RADEON_CS			DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_CS, struct drm_radeon_cs)
+#define DRM_IOCTL_RADEON_INFO			DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_INFO, struct drm_radeon_info)
+#define DRM_IOCTL_RADEON_GEM_SET_TILING		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_SET_TILING, struct drm_radeon_gem_set_tiling)
+#define DRM_IOCTL_RADEON_GEM_GET_TILING		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_GET_TILING, struct drm_radeon_gem_get_tiling)
+#define DRM_IOCTL_RADEON_GEM_BUSY		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_BUSY, struct drm_radeon_gem_busy)
+#define DRM_IOCTL_RADEON_GEM_VA			DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_VA, struct drm_radeon_gem_va)
+#define DRM_IOCTL_RADEON_GEM_OP			DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_OP, struct drm_radeon_gem_op)
+#define DRM_IOCTL_RADEON_GEM_USERPTR		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_USERPTR, struct drm_radeon_gem_userptr)
+#define DRM_IOCTL_RADEON_GEM_WAIT_USERFENCE	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_WAIT_USERFENCE, struct drm_radeon_gem_wait_userfence)
 
 typedef struct drm_radeon_init {
 	enum {
@@ -831,6 +833,15 @@ struct drm_radeon_gem_userptr {
 	uint32_t		handle;
 };
 
+struct drm_radeon_gem_wait_userfence {
+	uint32_t	handle;
+	uint32_t	offset;
+	uint64_t	fence;
+	uint32_t	ring;
+	int32_t		priority;
+	uint32_t	flags;
+};
+
 #define RADEON_TILING_MACRO				0x1
 #define RADEON_TILING_MICRO				0x2
 #define RADEON_TILING_SWAP_16BIT			0x4
-- 
1.9.1
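A side note on the proposed uapi struct: the `uint64_t fence` member forces 8-byte alignment, so on the common 64-bit ABIs the 28 bytes of members are rounded up to 32, i.e. the struct carries 4 bytes of implicit tail padding. A local copy of the struct, for layout inspection only (exact sizes are ABI-dependent):

```c
#include <stdint.h>
#include <stddef.h>

/* Local sketch of the proposed uapi struct. */
struct drm_radeon_gem_wait_userfence {
	uint32_t handle;	/* GEM handle of the fence BO */
	uint32_t offset;	/* byte offset of the fence value in the BO */
	uint64_t fence;		/* sequence value to wait for */
	uint32_t ring;
	int32_t  priority;
	uint32_t flags;
};
```

Uapi structs usually avoid implicit padding (e.g. by adding an explicit `uint32_t pad` member), since the hidden bytes are otherwise neither checked nor zeroed consistently across userspace and kernel.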



* Re: [RFC] drm/radeon: userfence IOCTL
  2015-04-13 14:52 [RFC] drm/radeon: userfence IOCTL Christian König
                   ` (2 preceding siblings ...)
  2015-04-13 14:52 ` [PATCH 3/3] drm/radeon: add userfence IOCTL Christian König
@ 2015-04-13 15:25 ` Serguei Sagalovitch
  2015-04-13 15:35   ` Christian König
  2015-04-13 15:39   ` Jerome Glisse
  2015-04-13 15:31 ` Jerome Glisse
  4 siblings, 2 replies; 23+ messages in thread
From: Serguei Sagalovitch @ 2015-04-13 15:25 UTC (permalink / raw)
  To: Christian König, dri-devel



> the BO to be kept in the same place while it is mapped inside the kernel page table
> ...
> So this requires that we pin down the BO for the duration of the wait IOCTL.

But my understanding is that it should not be the duration of the "wait" IOCTL but the duration of command buffer execution.

BTW: I would assume that this is not a new scenario.

  This is the scenario:
     - User allocates a BO
     - User gets a CPU address for the BO
     - User submits a command buffer that writes to the BO
     - User can "poll" / "read" or "write" the BO data with the CPU

So when TTM needs to move the BO to another location, it should also update
the CPU "mapping" correctly so the user will always read / write the correct data.

Did I miss anything?


Sincerely yours,
Serguei Sagalovitch




_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: [RFC] drm/radeon: userfence IOCTL
  2015-04-13 14:52 [RFC] drm/radeon: userfence IOCTL Christian König
                   ` (3 preceding siblings ...)
  2015-04-13 15:25 ` [RFC] drm/radeon: " Serguei Sagalovitch
@ 2015-04-13 15:31 ` Jerome Glisse
  2015-04-13 15:45   ` Christian König
  4 siblings, 1 reply; 23+ messages in thread
From: Jerome Glisse @ 2015-04-13 15:31 UTC (permalink / raw)
  To: Christian König; +Cc: Serguei.Sagalovitch, dri-devel

On Mon, Apr 13, 2015 at 04:52:14PM +0200, Christian König wrote:
> Hello everyone,
> 
> we have a requirement for a bit different kind of fence handling.
> Currently we handle fences completely inside the kernel, but in
> the future we would like to emit multiple fences inside the same
> IB as well.
> 
> This works by adding multiple fence commands into an IB which
> just write their value to a specific location inside a BO and
> trigger the appropriate hardware interrupt.
> 
> The user part of the driver stack should then be able to call an
> IOCTL to wait for the interrupt and block for the value (or
> something larger) to be written to the specific location.
> 
> This has the advantage that you can have multiple synchronization
> points in the same IB and don't need to split up your draw commands
> over several IBs so that the kernel can insert kernel fences in
> between.
> 
> The following set of patches tries to implement exactly this IOCTL.
> The big problem with that IOCTL is that TTM needs the BO to be
> kept in the same place while it is mapped inside the kernel page
> table. So this requires that we pin down the BO for the duration
> of the wait IOCTL.
> 
> This practically gives userspace a way of pinning down BOs for as
> long as it wants, without the ability for the kernel for intervention.
> 
> Any ideas how to avoid those problems? Or better ideas how to handle
> the new requirements?

So I think the simplest solution is to only allow such "fence" BOs to
be inside system memory (no VRAM for them). My assumption here is that
such a BO will barely see more than a couple of dword writes, so it is
not a bandwidth-intensive BO. Or do you have a requirement for such a
BO to be in VRAM?

Now to do that I would just add a property to the buffer object that
effectively forbids such a BO from being placed anywhere else than GTT.
Doing that would make the ioctl code simpler: just check that the BO
has the GTT-only property set and if not return -EINVAL. Then it's a
simple matter of kmapping the proper page.

Note that the only thing that would be left to forbid is the swapping
of the buffer due to memory pressure (from the various TTM/core shrinkers).
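The suggested validation would be a one-liner at the top of the ioctl; roughly this (with a hypothetical flag name, not an existing radeon creation flag):

```c
#include <errno.h>
#include <stdint.h>

#define TOY_GEM_GTT_ONLY (1u << 0)	/* hypothetical BO creation flag */

/* Reject user-fence waits on BOs that could live in (or move to) VRAM. */
static int toy_check_userfence_bo(uint32_t bo_flags)
{
	if (!(bo_flags & TOY_GEM_GTT_ONLY))
		return -EINVAL;
	return 0;
}
```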

Cheers,
Jérôme


* Re: [RFC] drm/radeon: userfence IOCTL
  2015-04-13 15:25 ` [RFC] drm/radeon: " Serguei Sagalovitch
@ 2015-04-13 15:35   ` Christian König
  2015-04-13 15:37     ` Serguei Sagalovitch
  2015-04-13 15:46     ` Jerome Glisse
  2015-04-13 15:39   ` Jerome Glisse
  1 sibling, 2 replies; 23+ messages in thread
From: Christian König @ 2015-04-13 15:35 UTC (permalink / raw)
  To: Serguei Sagalovitch, dri-devel



On 13.04.2015 17:25, Serguei Sagalovitch wrote:
> > the BO to be kept in the same place while it is mapped inside the 
> kernel page table
> ...
> > So this requires that we pin down the BO for the duration of the 
> wait IOCTL.
>
> But my understanding is that it should be not duration of "wait" IOCTL 
> but "duration" of command buffer execution.
>
> BTW: I would assume that this is not the new scenario.
>
>  This is scenario:
>     - User allocate BO
>     - User get CPU address for BO
>     - User submit command buffer to write to BO
>     - User could "poll" / "read" or "write" BO data by CPU
>
> So when  TTM needs  to move BO to another location it should also 
> update CPU "mapping" correctly so user will always read / write the 
> correct data.
>
> Did I miss anything?

The problem is that kernel mappings are not updated when TTM moves the
buffer around. In the case of a swapped-out buffer that wouldn't even be
possible, because kernel mappings aren't pageable.

You just can't map the BO into kernel space unless you have it pinned
down, so you can't check the current value written to the BO in your IOCTL.

One alternative is to send all interrupts in question unfiltered to user
space and let userspace check whether the right value was written or
not. But I assume that this would be rather bad for performance.

Another alternative would be to use the userspace mapping to check the
BO value, but this approach isn't compatible with a GPU scheduler: you
can't really do cross-process memory access in device drivers.

Regards,
Christian.






* Re: [RFC] drm/radeon: userfence IOCTL
  2015-04-13 15:35   ` Christian König
@ 2015-04-13 15:37     ` Serguei Sagalovitch
  2015-04-13 15:46     ` Jerome Glisse
  1 sibling, 0 replies; 23+ messages in thread
From: Serguei Sagalovitch @ 2015-04-13 15:37 UTC (permalink / raw)
  To: Christian König, dri-devel



> Another alternative would be to use the userspace mapping to check the BO value

This is what I was thinking.

Sincerely yours,
Serguei Sagalovitch

On 15-04-13 11:35 AM, Christian König wrote:
> On 13.04.2015 17:25, Serguei Sagalovitch wrote:
>> > the BO to be kept in the same place while it is mapped inside the 
>> kernel page table
>> ...
>> > So this requires that we pin down the BO for the duration of the 
>> wait IOCTL.
>>
>> But my understanding is that it should be not duration of "wait" 
>> IOCTL but "duration" of command buffer execution.
>>
>> BTW: I would assume that this is not the new scenario.
>>
>>  This is scenario:
>>     - User allocate BO
>>     - User get CPU address for BO
>>     - User submit command buffer to write to BO
>>     - User could "poll" / "read" or "write" BO data by CPU
>>
>> So when  TTM needs  to move BO to another location it should also 
>> update CPU "mapping" correctly so user will always read / write the 
>> correct data.
>>
>> Did I miss anything?
>
> The problem is that kernel mappings are not updated when TTM moves the 
> buffer around. In the case of a swapped out buffer that wouldn't even 
> be possible cause kernel mappings aren't pageable.
>
> You just can't map the BO into kernel space unless you have it pinned 
> down, so you can't check the current value written in the BO in your 
> IOCTL.
>
> One alternative is to send all interrupts in question unfiltered to 
> user space and let userspace do the check if the right value was 
> written or not. But I assume that this would be rather bad for 
> performance.
>
> Another alternative would be to use the userspace mapping to check the 
> BO value, but this approach isn't compatible with a GPU scheduler. 
> E.g. you can't really do cross process space memory access in device 
> drivers.
>
> Regards,
> Christian.
>
>>
>>
>> Sincerely yours,
>> Serguei Sagalovitch
>>
>> On 15-04-13 10:52 AM, Christian König wrote:
>>> Hello everyone,
>>>
>>> we have a requirement for a bit different kind of fence handling. Currently we handle fences completely inside the kernel, but in the future we would like to emit multiple fences inside the same IB as well.
>>>
>>> This works by adding multiple fence commands into an IB which just write their value to a specific location inside a BO and trigger the appropriate hardware interrupt.
>>>
>>> The user part of the driver stack should then be able to call an IOCTL to wait for the interrupt and block for the value (or something larger) to be written to the specific location.
>>>
>>> This has the advantage that you can have multiple synchronization points in the same IB and don't need to split up your draw commands over several IBs so that the kernel can insert kernel fences in between.
>>>
>>> The following set of patches tries to implement exactly this IOCTL. The big problem with that IOCTL is that TTM needs the BO to be kept in the same place while it is mapped inside the kernel page table. So this requires that we pin down the BO for the duration of the wait IOCTL.
>>>
>>> This practically gives userspace a way of pinning down BOs for as long as it wants, without the ability for the kernel for intervention.
>>>
>>> Any ideas how to avoid those problems? Or better ideas how to handle the new requirements?
>>>
>>> Please note that the patches are only hacked together quick&dirty to demonstrate the problem and not very well tested.
>>>
>>> Best regards,
>>> Christian.
>>
>



_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] drm/radeon: userfence IOCTL
  2015-04-13 15:25 ` [RFC] drm/radeon: " Serguei Sagalovitch
  2015-04-13 15:35   ` Christian König
@ 2015-04-13 15:39   ` Jerome Glisse
  2015-04-13 15:47     ` Christian König
  2015-04-13 15:52     ` Serguei Sagalovitch
  1 sibling, 2 replies; 23+ messages in thread
From: Jerome Glisse @ 2015-04-13 15:39 UTC (permalink / raw)
  To: Serguei Sagalovitch; +Cc: dri-devel

On Mon, Apr 13, 2015 at 11:25:30AM -0400, Serguei Sagalovitch wrote:
> > the BO to be kept in the same place while it is mapped inside the kernel
> page table
> ...
> > So this requires that we pin down the BO for the duration of the wait
> IOCTL.
> 
> But my understanding is that it should be not duration of "wait" IOCTL but
> "duration" of command buffer execution.
> 
> BTW: I would assume that this is not the new scenario.
> 
>  This is scenario:
>     - User allocate BO
>     - User get CPU address for BO
>     - User submit command buffer to write to BO
>     - User could "poll" / "read" or "write" BO data by CPU
> 
> So when  TTM needs  to move BO to another location it should also update CPU
> "mapping" correctly so user will always read / write the correct data.
> 
> Did I miss anything?

No, this is how things work. But we want to avoid pinning the buffer.
One use case for this userspace fence is, I assume, the same BO and
same user fence used across several command buffers. Given that the
userspace wait fence ioctl has no way to know which command buffer
it is really waiting on, the kernel has no knowledge of whether this
user fence will signal at all. So a malicious userspace (we always
have to assume such a thing exists) could create a big VRAM BO and
effectively pin it in VRAM, leading to a GPU DOS (denial of service).

By the way Christian, I would add a timeout to this ioctl and return
-EAGAIN to userspace on timeout so that userspace can resubmit. That
way a malicious userspace will just keep exhausting its own CPU
timeslice.
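Roughly what I have in mind for the userspace side, as an untested
sketch (the wait ioctl is stubbed out with a callback so the retry
logic is self-contained; all names here are made up for illustration):

```c
/* Untested userspace-side sketch of the resubmit loop; the wait ioctl
 * is stubbed out with a callback so the retry logic is self-contained.
 * All names here are made up for illustration. */
#include <errno.h>
#include <stdint.h>

/* Stand-in for the real wait ioctl: returns 0 once the fence value is
 * reached, -EAGAIN if the per-call timeout expired first. */
typedef int (*wait_ioctl_fn)(uint64_t target, void *ctx);

/* Resubmit the wait while the kernel reports a timeout, up to
 * max_tries calls in total; any other error is returned as-is. */
static int wait_user_fence(wait_ioctl_fn wait_ioctl, uint64_t target,
			   void *ctx, int max_tries)
{
	int r;

	do {
		r = wait_ioctl(target, ctx);
	} while (r == -EAGAIN && --max_tries > 0);

	return r;
}
```

In a real driver stack the callback would of course be a drmIoctl()
call on the device fd rather than a function pointer.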

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] drm/radeon: userfence IOCTL
  2015-04-13 15:31 ` Jerome Glisse
@ 2015-04-13 15:45   ` Christian König
  2015-04-13 16:08     ` Jerome Glisse
  0 siblings, 1 reply; 23+ messages in thread
From: Christian König @ 2015-04-13 15:45 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: Serguei.Sagalovitch, dri-devel

On 13.04.2015 17:31, Jerome Glisse wrote:
> On Mon, Apr 13, 2015 at 04:52:14PM +0200, Christian König wrote:
>> Hello everyone,
>>
>> we have a requirement for a bit different kind of fence handling.
>> Currently we handle fences completely inside the kernel, but in
>> the future we would like to emit multiple fences inside the same
>> IB as well.
>>
>> This works by adding multiple fence commands into an IB which
>> just write their value to a specific location inside a BO and
>> trigger the appropriate hardware interrupt.
>>
>> The user part of the driver stack should then be able to call an
>> IOCTL to wait for the interrupt and block for the value (or
>> something larger) to be written to the specific location.
>>
>> This has the advantage that you can have multiple synchronization
>> points in the same IB and don't need to split up your draw commands
>> over several IBs so that the kernel can insert kernel fences in
>> between.
>>
>> The following set of patches tries to implement exactly this IOCTL.
>> The big problem with that IOCTL is that TTM needs the BO to be
>> kept in the same place while it is mapped inside the kernel page
>> table. So this requires that we pin down the BO for the duration
>> of the wait IOCTL.
>>
>> This practically gives userspace a way of pinning down BOs for as
>> long as it wants, without the ability for the kernel for intervention.
>>
>> Any ideas how to avoid those problems? Or better ideas how to handle
>> the new requirements?
> So I think the simplest solution is to only allow such "fence" BOs
> to be in system memory (no VRAM for them). My assumption here is
> that such a BO will barely see more than a couple of dword writes,
> so it is not a bandwidth intensive BO. Or do you have a requirement
> for such BOs to be in VRAM?

Not that I know of.

> Now to do that I would just add a property to the buffer object
> that effectively forbids such a BO from being placed anywhere else
> than GTT. Doing that would make the ioctl code simpler: just check
> that the BO has the GTT-only property set and if not return
> -EINVAL. Then it's a simple matter of kmapping the proper page.

I've also considered adding an internal flag so that when a buffer is 
kmapped we avoid moving it to VRAM / swapping it out, but see below.

> Note that the only thing that would be left to forbid is the
> swapping of the buffer due to memory pressure (from the various
> ttm/core shrinkers).

Yeah, how the heck would I do that? Those are TTM internals that I 
never got into.

Thanks for the ideas,
Christian.

>
> Cheers,
> Jérôme


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] drm/radeon: userfence IOCTL
  2015-04-13 15:35   ` Christian König
  2015-04-13 15:37     ` Serguei Sagalovitch
@ 2015-04-13 15:46     ` Jerome Glisse
  1 sibling, 0 replies; 23+ messages in thread
From: Jerome Glisse @ 2015-04-13 15:46 UTC (permalink / raw)
  To: Christian König; +Cc: Serguei Sagalovitch, dri-devel

On Mon, Apr 13, 2015 at 05:35:04PM +0200, Christian König wrote:
> On 13.04.2015 17:25, Serguei Sagalovitch wrote:
> >> the BO to be kept in the same place while it is mapped inside the kernel
> >page table
> >...
> >> So this requires that we pin down the BO for the duration of the wait
> >IOCTL.
> >
> >But my understanding is that it should be not duration of "wait" IOCTL but
> >"duration" of command buffer execution.
> >
> >BTW: I would assume that this is not the new scenario.
> >
> > This is scenario:
> >    - User allocate BO
> >    - User get CPU address for BO
> >    - User submit command buffer to write to BO
> >    - User could "poll" / "read" or "write" BO data by CPU
> >
> >So when  TTM needs  to move BO to another location it should also update
> >CPU "mapping" correctly so user will always read / write the correct data.
> >
> >Did I miss anything?
> 
> The problem is that kernel mappings are not updated when TTM moves the
> buffer around. In the case of a swapped out buffer that wouldn't even be
> possible cause kernel mappings aren't pageable.
> 
> You just can't map the BO into kernel space unless you have it pinned down,
> so you can't check the current value written in the BO in your IOCTL.
> 
> One alternative is to send all interrupts in question unfiltered to user
> space and let userspace do the check if the right value was written or not.
> But I assume that this would be rather bad for performance.

Yeah, this most likely would be seriously bad. It might even allow 
malicious userspace to force an IRQ storm.

> 
> Another alternative would be to use the userspace mapping to check the BO
> value, but this approach isn't compatible with a GPU scheduler. E.g. you
> can't really do cross process space memory access in device drivers.

Not to mention that you would need an mmu_notifier to protect you from 
munmap. I think the solution I proposed in the other mail is the 
simplest and safest.

Cheers,
Jérôme

> 
> Regards,
> Christian.
> 
> >
> >
> >Sincerely yours,
> >Serguei Sagalovitch
> >
> >On 15-04-13 10:52 AM, Christian König wrote:
> >>Hello everyone,
> >>
> >>we have a requirement for a bit different kind of fence handling. Currently we handle fences completely inside the kernel, but in the future we would like to emit multiple fences inside the same IB as well.
> >>
> >>This works by adding multiple fence commands into an IB which just write their value to a specific location inside a BO and trigger the appropriate hardware interrupt.
> >>
> >>The user part of the driver stack should then be able to call an IOCTL to wait for the interrupt and block for the value (or something larger) to be written to the specific location.
> >>
> >>This has the advantage that you can have multiple synchronization points in the same IB and don't need to split up your draw commands over several IBs so that the kernel can insert kernel fences in between.
> >>
> >>The following set of patches tries to implement exactly this IOCTL. The big problem with that IOCTL is that TTM needs the BO to be kept in the same place while it is mapped inside the kernel page table. So this requires that we pin down the BO for the duration of the wait IOCTL.
> >>
> >>This practically gives userspace a way of pinning down BOs for as long as it wants, without the ability for the kernel for intervention.
> >>
> >>Any ideas how to avoid those problems? Or better ideas how to handle the new requirements?
> >>
> >>Please note that the patches are only hacked together quick&dirty to demonstrate the problem and not very well tested.
> >>
> >>Best regards,
> >>Christian.
> >
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] drm/radeon: userfence IOCTL
  2015-04-13 15:39   ` Jerome Glisse
@ 2015-04-13 15:47     ` Christian König
  2015-04-13 15:52     ` Serguei Sagalovitch
  1 sibling, 0 replies; 23+ messages in thread
From: Christian König @ 2015-04-13 15:47 UTC (permalink / raw)
  To: Jerome Glisse, Serguei Sagalovitch; +Cc: dri-devel

On 13.04.2015 17:39, Jerome Glisse wrote:
> On Mon, Apr 13, 2015 at 11:25:30AM -0400, Serguei Sagalovitch wrote:
>>> the BO to be kept in the same place while it is mapped inside the kernel
>> page table
>> ...
>>> So this requires that we pin down the BO for the duration of the wait
>> IOCTL.
>>
>> But my understanding is that it should be not duration of "wait" IOCTL but
>> "duration" of command buffer execution.
>>
>> BTW: I would assume that this is not the new scenario.
>>
>>   This is scenario:
>>      - User allocate BO
>>      - User get CPU address for BO
>>      - User submit command buffer to write to BO
>>      - User could "poll" / "read" or "write" BO data by CPU
>>
>> So when  TTM needs  to move BO to another location it should also update CPU
>> "mapping" correctly so user will always read / write the correct data.
>>
>> Did I miss anything?
> No, this is how things work. But we want to avoid pinning the
> buffer. One use case for this userspace fence is, I assume, the
> same BO and same user fence used across several command buffers.
> Given that the userspace wait fence ioctl has no way to know which
> command buffer it is really waiting on, the kernel has no knowledge
> of whether this user fence will signal at all. So a malicious
> userspace (we always have to assume such a thing exists) could
> create a big VRAM BO and effectively pin it in VRAM, leading to a
> GPU DOS (denial of service).
>
> By the way Christian, I would add a timeout to this ioctl and
> return -EAGAIN to userspace on timeout so that userspace can
> resubmit. That way a malicious userspace will just keep exhausting
> its own CPU timeslice.

Yeah, I know. I honestly haven't even tested the implementation; I 
first wanted to check how far off the whole idea is.

On the one hand it is rather appealing, but on the other it's a 
completely different approach from what we have done so far. E.g. we 
can pretty much forget the whole kernel fence framework with that.

Regards,
Christian.

>
> Cheers,
> Jérôme


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] drm/radeon: userfence IOCTL
  2015-04-13 15:39   ` Jerome Glisse
  2015-04-13 15:47     ` Christian König
@ 2015-04-13 15:52     ` Serguei Sagalovitch
  2015-04-13 16:13       ` Jerome Glisse
  1 sibling, 1 reply; 23+ messages in thread
From: Serguei Sagalovitch @ 2015-04-13 15:52 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: dri-devel



> Given that the userspace wait fence ioctl has no way to know which
> command buffer it is really waiting on, the kernel has no knowledge
> of whether this user fence will signal at all.

We could pass the BO handle as a parameter in the "fence ioctl" so 
the kernel will know which BO it is waiting on.
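Something like the following as the argument block, say (a 
hypothetical layout only, reusing the field names Christian's WIP 
patch dereferences; the real uapi struct is of course up for 
discussion):

```c
/* Hypothetical argument block for such a handle-based wait ioctl,
 * mirroring the fields Christian's WIP patch dereferences (handle,
 * ring, priority, offset, fence).  Fixed-width types plus explicit
 * padding keep the layout identical for 32- and 64-bit userspace. */
#include <stdint.h>

struct drm_radeon_gem_wait_userfence {
	uint32_t handle;	/* GEM handle of the BO holding the fence */
	uint32_t ring;		/* ring the fence command was emitted on */
	int32_t  priority;	/* ring priority, as for command submission */
	uint32_t pad;		/* explicit padding, must be zero */
	uint64_t offset;	/* byte offset of the 64-bit fence value */
	uint64_t fence;		/* wait until *(BO + offset) >= fence */
};
```

The kernel side would then reject offset & 0x7 or offset + 8 beyond 
the BO size with -EINVAL, as the patch already does.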


> So a malicious userspace (we always have to assume such a thing
> exists) could create a big VRAM BO and effectively pin it in VRAM,
> leading to a GPU DOS (denial of service).

This problem always exists. Malicious userspace could allocate a big 
BO and then submit a shader running in a loop which reads/writes from 
this BO. It could also spawn processes which do the same thing. IMHO 
the only way to improve the situation is to have a GPU scheduler that 
allows "unloading" such an application.

Sincerely yours,
Serguei Sagalovitch

On 15-04-13 11:39 AM, Jerome Glisse wrote:
> On Mon, Apr 13, 2015 at 11:25:30AM -0400, Serguei Sagalovitch wrote:
>>> the BO to be kept in the same place while it is mapped inside the kernel
>> page table
>> ...
>>> So this requires that we pin down the BO for the duration of the wait
>> IOCTL.
>>
>> But my understanding is that it should be not duration of "wait" IOCTL but
>> "duration" of command buffer execution.
>>
>> BTW: I would assume that this is not the new scenario.
>>
>>   This is scenario:
>>      - User allocate BO
>>      - User get CPU address for BO
>>      - User submit command buffer to write to BO
>>      - User could "poll" / "read" or "write" BO data by CPU
>>
>> So when  TTM needs  to move BO to another location it should also update CPU
>> "mapping" correctly so user will always read / write the correct data.
>>
>> Did I miss anything?
> No, this is how things work. But we want to avoid pinning the
> buffer. One use case for this userspace fence is, I assume, the
> same BO and same user fence used across several command buffers.
> Given that the userspace wait fence ioctl has no way to know which
> command buffer it is really waiting on, the kernel has no knowledge
> of whether this user fence will signal at all. So a malicious
> userspace (we always have to assume such a thing exists) could
> create a big VRAM BO and effectively pin it in VRAM, leading to a
> GPU DOS (denial of service).
>
> By the way Christian, I would add a timeout to this ioctl and
> return -EAGAIN to userspace on timeout so that userspace can
> resubmit. That way a malicious userspace will just keep exhausting
> its own CPU timeslice.
>
> Cheers,
> Jérôme




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] drm/radeon: userfence IOCTL
  2015-04-13 15:45   ` Christian König
@ 2015-04-13 16:08     ` Jerome Glisse
  2015-04-13 16:55       ` Christian König
  0 siblings, 1 reply; 23+ messages in thread
From: Jerome Glisse @ 2015-04-13 16:08 UTC (permalink / raw)
  To: Christian König; +Cc: Serguei.Sagalovitch, dri-devel

On Mon, Apr 13, 2015 at 05:45:21PM +0200, Christian König wrote:
> On 13.04.2015 17:31, Jerome Glisse wrote:
> >On Mon, Apr 13, 2015 at 04:52:14PM +0200, Christian König wrote:
> >>Hello everyone,
> >>
> >>we have a requirement for a bit different kind of fence handling.
> >>Currently we handle fences completely inside the kernel, but in
> >>the future we would like to emit multiple fences inside the same
> >>IB as well.
> >>
> >>This works by adding multiple fence commands into an IB which
> >>just write their value to a specific location inside a BO and
> >>trigger the appropriate hardware interrupt.
> >>
> >>The user part of the driver stack should then be able to call an
> >>IOCTL to wait for the interrupt and block for the value (or
> >>something larger) to be written to the specific location.
> >>
> >>This has the advantage that you can have multiple synchronization
> >>points in the same IB and don't need to split up your draw commands
> >>over several IBs so that the kernel can insert kernel fences in
> >>between.
> >>
> >>The following set of patches tries to implement exactly this IOCTL.
> >>The big problem with that IOCTL is that TTM needs the BO to be
> >>kept in the same place while it is mapped inside the kernel page
> >>table. So this requires that we pin down the BO for the duration
> >>of the wait IOCTL.
> >>
> >>This practically gives userspace a way of pinning down BOs for as
> >>long as it wants, without the ability for the kernel for intervention.
> >>
> >>Any ideas how to avoid those problems? Or better ideas how to handle
> >>the new requirements?
> >So I think the simplest solution is to only allow such "fence"
> >BOs to be in system memory (no VRAM for them). My assumption here
> >is that such a BO will barely see more than a couple of dword
> >writes, so it is not a bandwidth intensive BO. Or do you have a
> >requirement for such BOs to be in VRAM?
> 
> Not that I know of.
> 
> >Now to do that I would just add a property to the buffer object
> >that effectively forbids such a BO from being placed anywhere else
> >than GTT. Doing that would make the ioctl code simpler: just check
> >that the BO has the GTT-only property set and if not return
> >-EINVAL. Then it's a simple matter of kmapping the proper page.
> 
> I've also considered adding an internal flag so that when a buffer
> is kmapped we avoid moving it to VRAM / swapping it out, but see
> below.
> 
> >Note that the only thing that would be left to forbid is the
> >swapping of the buffer due to memory pressure (from the various
> >ttm/core shrinkers).
> 
> Yeah, how the heck would I do that? Those are TTM internals that I
> never got into.

Actually I think it is easier than I first thought: in the wait ioctl,
check if the buffer has a pending fence, i.e. the GPU is still using
it; if not, return -EAGAIN because it means that it is pointless to
wait for the next GPU interrupt.

For as long as the BO has an active fence it will not be swapped out
(see ttm_bo_swapout()). So in the wait event test both the value and
the pending fence. If the pending fence signals but not the value,
then return -EAGAIN. In all cases just keep a kmap of the page (do not
kmap using the existing kmap helper; we would need something new so as
not to interfere with the shrinker). Note that after testing the value
you would need to check that the BO was not moved, and thus that the
page you were looking into is still the one the BO is using.

That way userspace cannot abuse this ioctl to block the shrinker from
making progress.

I need to look at the TTM kmap code to see if it is actually usable
without disrupting the shrinker. Will do that after lunch.
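In untested pseudo-C, the per-wakeup check I have in mind would be 
something like this (a plain simulation, not actual TTM code):

```c
/* Untested pseudo-C of the per-wakeup check: the wait may only keep
 * sleeping while the BO still has an active kernel fence, so this
 * ioctl can never block the shrinker indefinitely.  Plain simulation,
 * not actual TTM code. */
#include <stdbool.h>
#include <stdint.h>

enum wait_action {
	WAIT_DONE,		/* user fence value reached */
	WAIT_RETRY_IOCTL,	/* return -EAGAIN, userspace re-enters */
	WAIT_KEEP_WAITING	/* sleep until the next interrupt */
};

static enum wait_action
userfence_check(uint64_t value, uint64_t target,
		bool has_active_fence, bool bo_moved)
{
	/* BO moved: the kmapped page may be stale, remap on re-entry. */
	if (bo_moved)
		return WAIT_RETRY_IOCTL;

	/* Value written by the fence command in the IB. */
	if (value >= target)
		return WAIT_DONE;

	/* All kernel fences signaled but the value never arrived:
	 * waiting for the next GPU interrupt is pointless. */
	if (!has_active_fence)
		return WAIT_RETRY_IOCTL;

	return WAIT_KEEP_WAITING;
}
```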

Cheers,
Jérôme

> 
> Thanks for the ideas,
> Christian.
> 
> >
> >Cheers,
> >Jérôme
> 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] drm/radeon: userfence IOCTL
  2015-04-13 15:52     ` Serguei Sagalovitch
@ 2015-04-13 16:13       ` Jerome Glisse
  0 siblings, 0 replies; 23+ messages in thread
From: Jerome Glisse @ 2015-04-13 16:13 UTC (permalink / raw)
  To: Serguei Sagalovitch; +Cc: dri-devel

On Mon, Apr 13, 2015 at 11:52:19AM -0400, Serguei Sagalovitch wrote:
> > Given that the userspace wait fence ioctl has not way to know which
> > command buffer it is really waiting after then kernel has no knowledge
> > if this user fence will signal at all.
>
> We could pass BO handle as  parameter in the "fence ioctl" to rely on it so
> kernel will know which BO  it is waiting.

Christian's code already does that, but it is not enough to know which 
CS the kernel is waiting for. See my other emails with Christian.

> 
> 
> > So a malicious user space (we always have to assume this thing exist)
> > could create a big VRAM BO and effectively pin it in VRAM leading to a GPU
> > DOS (denial of service).
>
> This problem  always exists. Malicious user space could allocate big BO and
> then submit shader running in loop which read/write from this BO.  It could
> also spawn processes which will do the same thing. IMHO the only way to
> improve situation is  to have GPU Scheduler to allow "unloading" such
> application.
>

Yes, but the GPU lockup watchdog would kick in (thinking it is a GPU 
lockup) and reset the GPU, which is harsh, but that is what we have now 
(well, I think the compute guys messed with that so it might no longer 
be the case).

So it is just about avoiding giving userspace more means to mess with 
things.

Cheers,
Jérôme

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] drm/radeon: userfence IOCTL
  2015-04-13 16:08     ` Jerome Glisse
@ 2015-04-13 16:55       ` Christian König
  2015-04-13 17:46         ` Jerome Glisse
  0 siblings, 1 reply; 23+ messages in thread
From: Christian König @ 2015-04-13 16:55 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: Serguei.Sagalovitch, dri-devel

On 13.04.2015 18:08, Jerome Glisse wrote:
> On Mon, Apr 13, 2015 at 05:45:21PM +0200, Christian König wrote:
>> On 13.04.2015 17:31, Jerome Glisse wrote:
>>> On Mon, Apr 13, 2015 at 04:52:14PM +0200, Christian König wrote:
>>>> Hello everyone,
>>>>
>>>> we have a requirement for a bit different kind of fence handling.
>>>> Currently we handle fences completely inside the kernel, but in
>>>> the future we would like to emit multiple fences inside the same
>>>> IB as well.
>>>>
>>>> This works by adding multiple fence commands into an IB which
>>>> just write their value to a specific location inside a BO and
>>>> trigger the appropriate hardware interrupt.
>>>>
>>>> The user part of the driver stack should then be able to call an
>>>> IOCTL to wait for the interrupt and block for the value (or
>>>> something larger) to be written to the specific location.
>>>>
>>>> This has the advantage that you can have multiple synchronization
>>>> points in the same IB and don't need to split up your draw commands
>>>> over several IBs so that the kernel can insert kernel fences in
>>>> between.
>>>>
>>>> The following set of patches tries to implement exactly this IOCTL.
>>>> The big problem with that IOCTL is that TTM needs the BO to be
>>>> kept in the same place while it is mapped inside the kernel page
>>>> table. So this requires that we pin down the BO for the duration
>>>> of the wait IOCTL.
>>>>
>>>> This practically gives userspace a way of pinning down BOs for as
>>>> long as it wants, without the ability for the kernel for intervention.
>>>>
>>>> Any ideas how to avoid those problems? Or better ideas how to handle
>>>> the new requirements?
>>> So I think the simplest solution is to only allow such "fence"
>>> BOs to be in system memory (no VRAM for them). My assumption here
>>> is that such a BO will barely see more than a couple of dword
>>> writes, so it is not a bandwidth intensive BO. Or do you have a
>>> requirement for such BOs to be in VRAM?
>> Not that I know of.
>>
>>> Now to do that I would just add a property to the buffer object
>>> that effectively forbids such a BO from being placed anywhere
>>> else than GTT. Doing that would make the ioctl code simpler: just
>>> check that the BO has the GTT-only property set and if not return
>>> -EINVAL. Then it's a simple matter of kmapping the proper page.
>> I've also considered adding an internal flag so that when a buffer
>> is kmapped we avoid moving it to VRAM / swapping it out, but see
>> below.
>>
>>> Note that the only thing that would be left to forbid is the
>>> swapping of the buffer due to memory pressure (from the various
>>> ttm/core shrinkers).
>> Yeah, how the heck would I do that? Those are TTM internals that I
>> never got into.
> Actually I think it is easier than I first thought: in the wait
> ioctl, check if the buffer has a pending fence, i.e. the GPU is
> still using it; if not, return -EAGAIN because it means that it is
> pointless to wait for the next GPU interrupt.
>
> For as long as the BO has an active fence it will not be swapped
> out (see ttm_bo_swapout()). So in the wait event test both the
> value and the pending fence. If the pending fence signals but not
> the value, then return -EAGAIN. In all cases just keep a kmap of
> the page (do not kmap using the existing kmap helper; we would need
> something new so as not to interfere with the shrinker). Note that
> after testing the value you would need to check that the BO was not
> moved, and thus that the page you were looking into is still the
> one the BO is using.
>
> That way userspace cannot abuse this ioctl to block the shrinker
> from making progress.

So at the start of the IOCTL we check the BO's fences to see if it is 
actually still in use, and note its current placement.

Then we map it so the kernel can access it, and in the waiting loop we 
check whether it still has a fence and is still in the same place.

If there are no fences left or the placement has changed, we simply 
assume that the fence is signaled.

Yeah, that actually should work.

Thanks for the tip,
Christian.

>
> I need to look at the TTM kmap code to see if it is actually usable
> without disrupting the shrinker. Will do that after lunch.
>
> Cheers,
> Jérôme
>
>> Thanks for the ideas,
>> Christian.
>>
>>> Cheers,
>>> Jérôme


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/3] drm/radeon: add userfence IOCTL
  2015-04-13 14:52 ` [PATCH 3/3] drm/radeon: add userfence IOCTL Christian König
@ 2015-04-13 17:23   ` Daniel Vetter
  2015-04-13 17:51     ` Jerome Glisse
  2015-04-13 17:27   ` Daniel Vetter
  1 sibling, 1 reply; 23+ messages in thread
From: Daniel Vetter @ 2015-04-13 17:23 UTC (permalink / raw)
  To: Christian König; +Cc: Serguei.Sagalovitch, dri-devel

On Mon, Apr 13, 2015 at 04:52:17PM +0200, Christian König wrote:
> From: Christian König <christian.koenig@amd.com>
> 
> WIP patch which adds a user fence IOCTL.
>
> Signed-off-by: Christian König <christian.koenig@amd.com>

I've discussed userspace fences a lot with Jerome last XDC, so here's my
comments:

My primary concern with mid-batch fences is that if we create real kernel
fences (which might even escape to other places using android syncpts or
dma-buf) then we end up relying upon correct userspace to not hang the
kernel, which isn't good.

So imo any kind of mid-batch fence must be done completely in userspace
and never show up as a fence object on the kernel side. I thought that
just busy-spinning in userspace would be all that's needed, but adding an
ioctl to wait on such user fences seems like a nice idea too. On i915 we
even have 2 interrupt sources per ring, so we could split the irq
processing between kernel fences and userspace fences.
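The busy-spin I mean is essentially just the following (reduced to an 
untested, single-threaded simulation so it is self-contained; a real 
implementation would spin on the mmap'ed fence location instead):

```c
/* What busy-spinning amounts to, reduced to a self-contained,
 * single-threaded simulation: poll_once() stands in for reading the
 * CPU mapping of the fence dword, and also lets a simulated "GPU"
 * write the value after a few polls so the loop terminates. */
#include <stdint.h>

static uint64_t fence_location;	/* stands in for the mapped BO word */

static uint64_t poll_once(int *gpu_countdown, uint64_t gpu_value)
{
	/* Simulated GPU: the fence command writes its value once the
	 * countdown expires. */
	if (*gpu_countdown > 0 && --(*gpu_countdown) == 0)
		fence_location = gpu_value;
	return fence_location;
}

/* Spin until the fence value reaches target; returns how many polls
 * came back too small (purely for the simulation). */
static int spin_wait(uint64_t target, int *gpu_countdown,
		     uint64_t gpu_value)
{
	int spins = 0;

	while (poll_once(gpu_countdown, gpu_value) < target)
		spins++;	/* a real spin loop would cpu_relax() here */
	return spins;
}
```

In the real case the poll would be a volatile/acquire read of the 
userspace mapping, with the GPU-side write coming from the fence 
command in the IB.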

One thing to keep in mind (I don't know radeon/ttm internals well 
enough to say) is to make sure that while you're blocked on a userspace 
fence in the ioctl you're not starving anyone else. But it doesn't look 
like you're holding any reservation objects or anything similar which 
might prevent concurrent cs.

Anyway, if there's nothing more to this, then I think this is a very 
sane idea.

Cheers, Daniel
> ---
>  drivers/gpu/drm/radeon/radeon.h     |  2 ++
>  drivers/gpu/drm/radeon/radeon_gem.c | 71 +++++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/radeon/radeon_kms.c |  1 +
>  include/uapi/drm/radeon_drm.h       | 41 +++++++++++++--------
>  4 files changed, 100 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index 57d63c4..110baae 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -2215,6 +2215,8 @@ int radeon_gem_create_ioctl(struct drm_device *dev, void *data,
>  			    struct drm_file *filp);
>  int radeon_gem_userptr_ioctl(struct drm_device *dev, void *data,
>  			     struct drm_file *filp);
> +int radeon_gem_wait_userfence_ioctl(struct drm_device *dev, void *data,
> +				    struct drm_file *filp);
>  int radeon_gem_pin_ioctl(struct drm_device *dev, void *data,
>  			 struct drm_file *file_priv);
>  int radeon_gem_unpin_ioctl(struct drm_device *dev, void *data,
> diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
> index ac3c131..094b3d5 100644
> --- a/drivers/gpu/drm/radeon/radeon_gem.c
> +++ b/drivers/gpu/drm/radeon/radeon_gem.c
> @@ -365,6 +365,77 @@ handle_lockup:
>  	return r;
>  }
>  
> +int radeon_gem_wait_userfence_ioctl(struct drm_device *dev, void *data,
> +				    struct drm_file *filp)
> +{
> +	struct radeon_device *rdev = dev->dev_private;
> +	struct drm_radeon_gem_wait_userfence *args = data;
> +	struct drm_gem_object *gobj;
> +	struct radeon_bo *robj;
> +	uint64_t *fence_addr;
> +	void *cpu_addr;
> +	int r, ring;
> +
> +	ring = radeon_cs_get_ring(rdev, args->ring, args->priority);
> +	if (ring < 0)
> +		return ring;
> +
> +	down_read(&rdev->exclusive_lock);
> +
> +	gobj = drm_gem_object_lookup(dev, filp, args->handle);
> +	if (gobj == NULL) {
> +		r = -ENOENT;
> +		goto error_unref;
> +	}
> +	robj = gem_to_radeon_bo(gobj);
> +
> +	if (args->offset & 0x7 || args->offset + 8 > radeon_bo_size(robj)) {
> +		r = -EINVAL;
> +		goto error_unref;
> +	}
> +
> +	r = radeon_bo_reserve(robj, false);
> +	if (r)
> +		goto error_unref;
> +
> +	r = radeon_bo_pin(robj, RADEON_GEM_DOMAIN_VRAM |
> +			  RADEON_GEM_DOMAIN_GTT, NULL);
> +	if (r)
> +		goto error_unreserve;
> +
> +	r = radeon_bo_kmap(robj, &cpu_addr);
> +	if (r)
> +		goto error_unpin;
> +
> +	radeon_bo_unreserve(robj);
> +
> +	fence_addr = (uint64_t *)(((uint8_t*)cpu_addr) + args->offset);
> +
> +	radeon_irq_kms_sw_irq_get(rdev, ring);
> +	r = wait_event_interruptible(rdev->fence_queue, (
> +		*fence_addr >= args->fence || rdev->needs_reset
> +	));
> +	radeon_irq_kms_sw_irq_put(rdev, ring);
> +
> +	if (rdev->needs_reset)
> +		r = -EDEADLK;
> +
> +	radeon_bo_reserve(robj, false);
> +	radeon_bo_kunmap(robj);
> +
> +error_unpin:
> +	radeon_bo_unpin(robj);
> +
> +error_unreserve:
> +	radeon_bo_unreserve(robj);
> +
> +error_unref:
> +	drm_gem_object_unreference_unlocked(gobj);
> +	up_read(&rdev->exclusive_lock);
> +	r = radeon_gem_handle_lockup(rdev, r);
> +	return r;
> +}
> +
>  int radeon_gem_set_domain_ioctl(struct drm_device *dev, void *data,
>  				struct drm_file *filp)
>  {
> diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
> index 686411e..4757f25 100644
> --- a/drivers/gpu/drm/radeon/radeon_kms.c
> +++ b/drivers/gpu/drm/radeon/radeon_kms.c
> @@ -893,5 +893,6 @@ const struct drm_ioctl_desc radeon_ioctls_kms[] = {
>  	DRM_IOCTL_DEF_DRV(RADEON_GEM_VA, radeon_gem_va_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(RADEON_GEM_OP, radeon_gem_op_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(RADEON_GEM_USERPTR, radeon_gem_userptr_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(RADEON_GEM_WAIT_USERFENCE, radeon_gem_wait_userfence_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
>  };
>  int radeon_max_kms_ioctl = ARRAY_SIZE(radeon_ioctls_kms);
> diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h
> index 50d0fb4..43fe660 100644
> --- a/include/uapi/drm/radeon_drm.h
> +++ b/include/uapi/drm/radeon_drm.h
> @@ -512,6 +512,7 @@ typedef struct {
>  #define DRM_RADEON_GEM_VA		0x2b
>  #define DRM_RADEON_GEM_OP		0x2c
>  #define DRM_RADEON_GEM_USERPTR		0x2d
> +#define DRM_RADEON_GEM_WAIT_USERFENCE	0x2e
>  
>  #define DRM_IOCTL_RADEON_CP_INIT    DRM_IOW( DRM_COMMAND_BASE + DRM_RADEON_CP_INIT, drm_radeon_init_t)
>  #define DRM_IOCTL_RADEON_CP_START   DRM_IO(  DRM_COMMAND_BASE + DRM_RADEON_CP_START)
> @@ -541,21 +542,22 @@ typedef struct {
>  #define DRM_IOCTL_RADEON_SURF_ALLOC DRM_IOW( DRM_COMMAND_BASE + DRM_RADEON_SURF_ALLOC, drm_radeon_surface_alloc_t)
>  #define DRM_IOCTL_RADEON_SURF_FREE  DRM_IOW( DRM_COMMAND_BASE + DRM_RADEON_SURF_FREE, drm_radeon_surface_free_t)
>  /* KMS */
> -#define DRM_IOCTL_RADEON_GEM_INFO	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_INFO, struct drm_radeon_gem_info)
> -#define DRM_IOCTL_RADEON_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_CREATE, struct drm_radeon_gem_create)
> -#define DRM_IOCTL_RADEON_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_MMAP, struct drm_radeon_gem_mmap)
> -#define DRM_IOCTL_RADEON_GEM_PREAD	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_PREAD, struct drm_radeon_gem_pread)
> -#define DRM_IOCTL_RADEON_GEM_PWRITE	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_PWRITE, struct drm_radeon_gem_pwrite)
> -#define DRM_IOCTL_RADEON_GEM_SET_DOMAIN	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_SET_DOMAIN, struct drm_radeon_gem_set_domain)
> -#define DRM_IOCTL_RADEON_GEM_WAIT_IDLE	DRM_IOW(DRM_COMMAND_BASE + DRM_RADEON_GEM_WAIT_IDLE, struct drm_radeon_gem_wait_idle)
> -#define DRM_IOCTL_RADEON_CS		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_CS, struct drm_radeon_cs)
> -#define DRM_IOCTL_RADEON_INFO		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_INFO, struct drm_radeon_info)
> -#define DRM_IOCTL_RADEON_GEM_SET_TILING	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_SET_TILING, struct drm_radeon_gem_set_tiling)
> -#define DRM_IOCTL_RADEON_GEM_GET_TILING	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_GET_TILING, struct drm_radeon_gem_get_tiling)
> -#define DRM_IOCTL_RADEON_GEM_BUSY	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_BUSY, struct drm_radeon_gem_busy)
> -#define DRM_IOCTL_RADEON_GEM_VA		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_VA, struct drm_radeon_gem_va)
> -#define DRM_IOCTL_RADEON_GEM_OP		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_OP, struct drm_radeon_gem_op)
> -#define DRM_IOCTL_RADEON_GEM_USERPTR	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_USERPTR, struct drm_radeon_gem_userptr)
> +#define DRM_IOCTL_RADEON_GEM_INFO		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_INFO, struct drm_radeon_gem_info)
> +#define DRM_IOCTL_RADEON_GEM_CREATE		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_CREATE, struct drm_radeon_gem_create)
> +#define DRM_IOCTL_RADEON_GEM_MMAP		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_MMAP, struct drm_radeon_gem_mmap)
> +#define DRM_IOCTL_RADEON_GEM_PREAD		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_PREAD, struct drm_radeon_gem_pread)
> +#define DRM_IOCTL_RADEON_GEM_PWRITE		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_PWRITE, struct drm_radeon_gem_pwrite)
> +#define DRM_IOCTL_RADEON_GEM_SET_DOMAIN		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_SET_DOMAIN, struct drm_radeon_gem_set_domain)
> +#define DRM_IOCTL_RADEON_GEM_WAIT_IDLE		DRM_IOW(DRM_COMMAND_BASE + DRM_RADEON_GEM_WAIT_IDLE, struct drm_radeon_gem_wait_idle)
> +#define DRM_IOCTL_RADEON_CS			DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_CS, struct drm_radeon_cs)
> +#define DRM_IOCTL_RADEON_INFO			DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_INFO, struct drm_radeon_info)
> +#define DRM_IOCTL_RADEON_GEM_SET_TILING		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_SET_TILING, struct drm_radeon_gem_set_tiling)
> +#define DRM_IOCTL_RADEON_GEM_GET_TILING		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_GET_TILING, struct drm_radeon_gem_get_tiling)
> +#define DRM_IOCTL_RADEON_GEM_BUSY		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_BUSY, struct drm_radeon_gem_busy)
> +#define DRM_IOCTL_RADEON_GEM_VA			DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_VA, struct drm_radeon_gem_va)
> +#define DRM_IOCTL_RADEON_GEM_OP			DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_OP, struct drm_radeon_gem_op)
> +#define DRM_IOCTL_RADEON_GEM_USERPTR		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_USERPTR, struct drm_radeon_gem_userptr)
> +#define DRM_IOCTL_RADEON_GEM_WAIT_USERFENCE	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_WAIT_USERFENCE, struct drm_radeon_gem_wait_userfence)
>  
>  typedef struct drm_radeon_init {
>  	enum {
> @@ -831,6 +833,15 @@ struct drm_radeon_gem_userptr {
>  	uint32_t		handle;
>  };
>  
> +struct drm_radeon_gem_wait_userfence {
> +	uint32_t	handle;
> +	uint32_t	offset;
> +	uint64_t	fence;
> +	uint32_t	ring;
> +	int32_t		priority;
> +	uint32_t	flags;
> +};
> +
>  #define RADEON_TILING_MACRO				0x1
>  #define RADEON_TILING_MICRO				0x2
>  #define RADEON_TILING_SWAP_16BIT			0x4
> -- 
> 1.9.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/3] drm/radeon: add userfence IOCTL
  2015-04-13 14:52 ` [PATCH 3/3] drm/radeon: add userfence IOCTL Christian König
  2015-04-13 17:23   ` Daniel Vetter
@ 2015-04-13 17:27   ` Daniel Vetter
  1 sibling, 0 replies; 23+ messages in thread
From: Daniel Vetter @ 2015-04-13 17:27 UTC (permalink / raw)
  To: Christian König; +Cc: Serguei.Sagalovitch, dri-devel

On Mon, Apr 13, 2015 at 04:52:17PM +0200, Christian König wrote:
> From: Christian König <christian.koenig@amd.com>
> 
> WIP patch which adds an user fence IOCTL.
> 
> Signed-off-by: Christian König <christian.koenig@amd.com>
> ---
>  drivers/gpu/drm/radeon/radeon.h     |  2 ++
>  drivers/gpu/drm/radeon/radeon_gem.c | 71 +++++++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/radeon/radeon_kms.c |  1 +
>  include/uapi/drm/radeon_drm.h       | 41 +++++++++++++--------
>  4 files changed, 100 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index 57d63c4..110baae 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -2215,6 +2215,8 @@ int radeon_gem_create_ioctl(struct drm_device *dev, void *data,
>  			    struct drm_file *filp);
>  int radeon_gem_userptr_ioctl(struct drm_device *dev, void *data,
>  			     struct drm_file *filp);
> +int radeon_gem_wait_userfence_ioctl(struct drm_device *dev, void *data,
> +				    struct drm_file *filp);
>  int radeon_gem_pin_ioctl(struct drm_device *dev, void *data,
>  			 struct drm_file *file_priv);
>  int radeon_gem_unpin_ioctl(struct drm_device *dev, void *data,
> diff --git a/drivers/gpu/drm/radeon/radeon_gem.c b/drivers/gpu/drm/radeon/radeon_gem.c
> index ac3c131..094b3d5 100644
> --- a/drivers/gpu/drm/radeon/radeon_gem.c
> +++ b/drivers/gpu/drm/radeon/radeon_gem.c
> @@ -365,6 +365,77 @@ handle_lockup:
>  	return r;
>  }
>  
> +int radeon_gem_wait_userfence_ioctl(struct drm_device *dev, void *data,
> +				    struct drm_file *filp)
> +{
> +	struct radeon_device *rdev = dev->dev_private;
> +	struct drm_radeon_gem_wait_userfence *args = data;
> +	struct drm_gem_object *gobj;
> +	struct radeon_bo *robj;
> +	uint64_t *fence_addr;
> +	void *cpu_addr;
> +	int r, ring;
> +
> +	down_read(&rdev->exclusive_lock);
> +
> +	ring = radeon_cs_get_ring(rdev, args->ring, args->priority);
> +	if (ring < 0) {
> +		up_read(&rdev->exclusive_lock);
> +		return ring;
> +	}
> +
> +	gobj = drm_gem_object_lookup(dev, filp, args->handle);
> +	if (gobj == NULL) {
> +		r = -ENOENT;
> +		goto error_unref;
> +	}
> +	robj = gem_to_radeon_bo(gobj);
> +
> +	if (args->offset & 0x7 || args->offset + 8 > radeon_bo_size(robj)) {
> +		r = -EINVAL;
> +		goto error_unref;
> +	}
> +
> +	r = radeon_bo_reserve(robj, false);
> +	if (r)
> +		goto error_unref;
> +
> +	r = radeon_bo_pin(robj, RADEON_GEM_DOMAIN_VRAM |
> +			  RADEON_GEM_DOMAIN_GTT, NULL);

One thing on top of making sure that you don't hold any locks while
waiting to ensure you can't abuse this: I think you need to add a check
here that the bo is reasonably sized, otherwise someone will abuse this to
lock in massive amounts of RAM. As long as the limit is reasonably small
we can subsume the overhead under the per-thread limits imo.
-Daniel
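
The validation Daniel asks for could look roughly like the following, written here as plain C so the arithmetic can be checked in isolation. The 64 KiB cap and the helper name are made up for illustration; they are not taken from the radeon code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cap on how much memory a single wait may pin down;
 * the actual limit would be a policy decision, not from the patch. */
#define USERFENCE_MAX_BO_SIZE (64 * 1024)

/* Mirrors the patch's offset check plus the size limit: the fence slot
 * must be 8-byte aligned, lie fully inside the BO, and the BO itself
 * must be small enough that pinning it on demand is harmless. */
static int userfence_args_valid(uint64_t bo_size, uint32_t offset)
{
	if (bo_size > USERFENCE_MAX_BO_SIZE)
		return 0;	/* too big to pin on userspace's behalf */
	if (offset & 0x7)
		return 0;	/* unaligned 64-bit fence slot */
	if ((uint64_t)offset + 8 > bo_size)
		return 0;	/* slot runs past the end of the BO */
	return 1;
}
```

Note the widening cast before `offset + 8`, which avoids a 32-bit wraparound for offsets near `UINT32_MAX`.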

> +	if (r)
> +		goto error_unreserve;
> +
> +	r = radeon_bo_kmap(robj, &cpu_addr);
> +	if (r)
> +		goto error_unpin;
> +
> +	radeon_bo_unreserve(robj);
> +
> +	fence_addr = (uint64_t *)(((uint8_t*)cpu_addr) + args->offset);
> +
> +	radeon_irq_kms_sw_irq_get(rdev, ring);
> +	r = wait_event_interruptible(rdev->fence_queue, (
> +		*fence_addr >= args->fence || rdev->needs_reset
> +	));
> +	radeon_irq_kms_sw_irq_put(rdev, ring);
> +
> +	if (rdev->needs_reset)
> +		r = -EDEADLK;
> +
> +	radeon_bo_reserve(robj, false);
> +	radeon_bo_kunmap(robj);
> +
> +error_unpin:
> +	radeon_bo_unpin(robj);
> +
> +error_unreserve:
> +	radeon_bo_unreserve(robj);
> +
> +error_unref:
> +	drm_gem_object_unreference_unlocked(gobj);
> +	up_read(&rdev->exclusive_lock);
> +	r = radeon_gem_handle_lockup(rdev, r);
> +	return r;
> +}
> +
>  int radeon_gem_set_domain_ioctl(struct drm_device *dev, void *data,
>  				struct drm_file *filp)
>  {
> diff --git a/drivers/gpu/drm/radeon/radeon_kms.c b/drivers/gpu/drm/radeon/radeon_kms.c
> index 686411e..4757f25 100644
> --- a/drivers/gpu/drm/radeon/radeon_kms.c
> +++ b/drivers/gpu/drm/radeon/radeon_kms.c
> @@ -893,5 +893,6 @@ const struct drm_ioctl_desc radeon_ioctls_kms[] = {
>  	DRM_IOCTL_DEF_DRV(RADEON_GEM_VA, radeon_gem_va_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(RADEON_GEM_OP, radeon_gem_op_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(RADEON_GEM_USERPTR, radeon_gem_userptr_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(RADEON_GEM_WAIT_USERFENCE, radeon_gem_wait_userfence_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
>  };
>  int radeon_max_kms_ioctl = ARRAY_SIZE(radeon_ioctls_kms);
> diff --git a/include/uapi/drm/radeon_drm.h b/include/uapi/drm/radeon_drm.h
> index 50d0fb4..43fe660 100644
> --- a/include/uapi/drm/radeon_drm.h
> +++ b/include/uapi/drm/radeon_drm.h
> @@ -512,6 +512,7 @@ typedef struct {
>  #define DRM_RADEON_GEM_VA		0x2b
>  #define DRM_RADEON_GEM_OP		0x2c
>  #define DRM_RADEON_GEM_USERPTR		0x2d
> +#define DRM_RADEON_GEM_WAIT_USERFENCE	0x2e
>  
>  #define DRM_IOCTL_RADEON_CP_INIT    DRM_IOW( DRM_COMMAND_BASE + DRM_RADEON_CP_INIT, drm_radeon_init_t)
>  #define DRM_IOCTL_RADEON_CP_START   DRM_IO(  DRM_COMMAND_BASE + DRM_RADEON_CP_START)
> @@ -541,21 +542,22 @@ typedef struct {
>  #define DRM_IOCTL_RADEON_SURF_ALLOC DRM_IOW( DRM_COMMAND_BASE + DRM_RADEON_SURF_ALLOC, drm_radeon_surface_alloc_t)
>  #define DRM_IOCTL_RADEON_SURF_FREE  DRM_IOW( DRM_COMMAND_BASE + DRM_RADEON_SURF_FREE, drm_radeon_surface_free_t)
>  /* KMS */
> -#define DRM_IOCTL_RADEON_GEM_INFO	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_INFO, struct drm_radeon_gem_info)
> -#define DRM_IOCTL_RADEON_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_CREATE, struct drm_radeon_gem_create)
> -#define DRM_IOCTL_RADEON_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_MMAP, struct drm_radeon_gem_mmap)
> -#define DRM_IOCTL_RADEON_GEM_PREAD	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_PREAD, struct drm_radeon_gem_pread)
> -#define DRM_IOCTL_RADEON_GEM_PWRITE	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_PWRITE, struct drm_radeon_gem_pwrite)
> -#define DRM_IOCTL_RADEON_GEM_SET_DOMAIN	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_SET_DOMAIN, struct drm_radeon_gem_set_domain)
> -#define DRM_IOCTL_RADEON_GEM_WAIT_IDLE	DRM_IOW(DRM_COMMAND_BASE + DRM_RADEON_GEM_WAIT_IDLE, struct drm_radeon_gem_wait_idle)
> -#define DRM_IOCTL_RADEON_CS		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_CS, struct drm_radeon_cs)
> -#define DRM_IOCTL_RADEON_INFO		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_INFO, struct drm_radeon_info)
> -#define DRM_IOCTL_RADEON_GEM_SET_TILING	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_SET_TILING, struct drm_radeon_gem_set_tiling)
> -#define DRM_IOCTL_RADEON_GEM_GET_TILING	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_GET_TILING, struct drm_radeon_gem_get_tiling)
> -#define DRM_IOCTL_RADEON_GEM_BUSY	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_BUSY, struct drm_radeon_gem_busy)
> -#define DRM_IOCTL_RADEON_GEM_VA		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_VA, struct drm_radeon_gem_va)
> -#define DRM_IOCTL_RADEON_GEM_OP		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_OP, struct drm_radeon_gem_op)
> -#define DRM_IOCTL_RADEON_GEM_USERPTR	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_USERPTR, struct drm_radeon_gem_userptr)
> +#define DRM_IOCTL_RADEON_GEM_INFO		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_INFO, struct drm_radeon_gem_info)
> +#define DRM_IOCTL_RADEON_GEM_CREATE		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_CREATE, struct drm_radeon_gem_create)
> +#define DRM_IOCTL_RADEON_GEM_MMAP		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_MMAP, struct drm_radeon_gem_mmap)
> +#define DRM_IOCTL_RADEON_GEM_PREAD		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_PREAD, struct drm_radeon_gem_pread)
> +#define DRM_IOCTL_RADEON_GEM_PWRITE		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_PWRITE, struct drm_radeon_gem_pwrite)
> +#define DRM_IOCTL_RADEON_GEM_SET_DOMAIN		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_SET_DOMAIN, struct drm_radeon_gem_set_domain)
> +#define DRM_IOCTL_RADEON_GEM_WAIT_IDLE		DRM_IOW(DRM_COMMAND_BASE + DRM_RADEON_GEM_WAIT_IDLE, struct drm_radeon_gem_wait_idle)
> +#define DRM_IOCTL_RADEON_CS			DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_CS, struct drm_radeon_cs)
> +#define DRM_IOCTL_RADEON_INFO			DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_INFO, struct drm_radeon_info)
> +#define DRM_IOCTL_RADEON_GEM_SET_TILING		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_SET_TILING, struct drm_radeon_gem_set_tiling)
> +#define DRM_IOCTL_RADEON_GEM_GET_TILING		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_GET_TILING, struct drm_radeon_gem_get_tiling)
> +#define DRM_IOCTL_RADEON_GEM_BUSY		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_BUSY, struct drm_radeon_gem_busy)
> +#define DRM_IOCTL_RADEON_GEM_VA			DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_VA, struct drm_radeon_gem_va)
> +#define DRM_IOCTL_RADEON_GEM_OP			DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_OP, struct drm_radeon_gem_op)
> +#define DRM_IOCTL_RADEON_GEM_USERPTR		DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_USERPTR, struct drm_radeon_gem_userptr)
> +#define DRM_IOCTL_RADEON_GEM_WAIT_USERFENCE	DRM_IOWR(DRM_COMMAND_BASE + DRM_RADEON_GEM_WAIT_USERFENCE, struct drm_radeon_gem_wait_userfence)
>  
>  typedef struct drm_radeon_init {
>  	enum {
> @@ -831,6 +833,15 @@ struct drm_radeon_gem_userptr {
>  	uint32_t		handle;
>  };
>  
> +struct drm_radeon_gem_wait_userfence {
> +	uint32_t	handle;
> +	uint32_t	offset;
> +	uint64_t	fence;
> +	uint32_t	ring;
> +	int32_t		priority;
> +	uint32_t	flags;
> +};
> +
>  #define RADEON_TILING_MACRO				0x1
>  #define RADEON_TILING_MICRO				0x2
>  #define RADEON_TILING_SWAP_16BIT			0x4
> -- 
> 1.9.1
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [RFC] drm/radeon: userfence IOCTL
  2015-04-13 16:55       ` Christian König
@ 2015-04-13 17:46         ` Jerome Glisse
  0 siblings, 0 replies; 23+ messages in thread
From: Jerome Glisse @ 2015-04-13 17:46 UTC (permalink / raw)
  To: Christian König; +Cc: Serguei.Sagalovitch, dri-devel

On Mon, Apr 13, 2015 at 06:55:19PM +0200, Christian König wrote:
> On 13.04.2015 18:08, Jerome Glisse wrote:
> >On Mon, Apr 13, 2015 at 05:45:21PM +0200, Christian König wrote:
> >>On 13.04.2015 17:31, Jerome Glisse wrote:
> >>>On Mon, Apr 13, 2015 at 04:52:14PM +0200, Christian König wrote:
> >>>>Hello everyone,
> >>>>
> >>>>we have a requirement for a bit different kind of fence handling.
> >>>>Currently we handle fences completely inside the kernel, but in
> >>>>the future we would like to emit multiple fences inside the same
> >>>>IB as well.
> >>>>
> >>>>This works by adding multiple fence commands into an IB which
> >>>>just write their value to a specific location inside a BO and
> >>>>trigger the appropriate hardware interrupt.
> >>>>
> >>>>The user part of the driver stack should then be able to call an
> >>>>IOCTL to wait for the interrupt and block for the value (or
> >>>>something larger) to be written to the specific location.
> >>>>
> >>>>This has the advantage that you can have multiple synchronization
> >>>>points in the same IB and don't need to split up your draw commands
> >>>>over several IBs so that the kernel can insert kernel fences in
> >>>>between.
> >>>>
> >>>>The following set of patches tries to implement exactly this IOCTL.
> >>>>The big problem with that IOCTL is that TTM needs the BO to be
> >>>>kept in the same place while it is mapped inside the kernel page
> >>>>table. So this requires that we pin down the BO for the duration
> >>>>of the wait IOCTL.
> >>>>
> >>>>This practically gives userspace a way of pinning down BOs for as
> >>>>long as it wants, without the ability for the kernel for intervention.
> >>>>
> >>>>Any ideas how to avoid those problems? Or better ideas how to handle
> >>>>the new requirements?
> >>>So i think the simplest solution is to only allow such "fence" bo to
> >>>be inside system memory (no vram for them). My assumption here is that
> >>>such BO will barely see more than couple dword write so it is not a
> >>>bandwidth intensive BO. Or do you have a requirement for such BO to
> >>>be in VRAM ?
> >>Not that I know off.
> >>
> >>>Now to do that i would just add a property to buffer object that
> >>>effectively forbid such BO to be place anywhere else than GTT. Doing
> >>>that would make the ioctl code simpler, just check the BO as the
> >>>GTT only property set and if not return -EINVAL. Then its a simple
> >>>matter of kmapping the proper page.
> >>I've also considered adding an internal flag that when a buffer is kmapped
> >>we avoid moving it to VRAM / swapping it out, but see below.
> >>
> >>>Note that the only thing that would be left to forbid is the swaping
> >>>of the buffer due to memory pressure (from various ttm/core shrinker).
> >>Yeah, how the heck would I do that? That's internals of TTM that I never got
> >>into.
> >Actualy i think it is easier then i first thought, in the wait ioctl
> >check if the buffer has a pending fence ie gpu is still using it, if
> >not return -EAGAIN because it means that it is pointless to wait for
> >next GPU interrupt.
> >
> >For as long as the BO has an active fence it will not be swapped out
> >(see ttm_bo_swapout()). So in the wait event test both the value and
> >the pending fence. If the pending fence signal but not the value then
> >return -EAGAIN. In all case just keep a kmap of the page (do not kmap
> >the using existing kmap helper we would need something new to not
> >interfer with the shrinker). Not that after testing the value you would
> >need to check that the BO was not move and thus the page you were
> >looking into is still the one the BO is using.
> >
> >That way userspace can not abuse this ioctl to block the shrinker from
> >making progress.
> 
> So what we do at the start of the IOCTL is check the BO's fences to see
> if it is actually still used, and note its current placement.
> 
> Then map it so the kernel can access it, and in the waiting loop check whether
> it still has a fence and is still in the same place.
> 
> If there aren't any fences left or the placement has changed, we simply assume
> that the fence is signaled.

Well, code is easier than words, so pseudocode below:

radeon_user_fence_wait_ioctl()
{
	struct radeon_bo *rbo;
	struct page *page;
	uint64_t *fenceptr;

	radeon_bo_reserve(rbo, false);
	if (!(radeon_bo_property(rbo) & GTT_ONLY)) {
		radeon_bo_unreserve(rbo);
		return -EINVAL;
	}

	if (reservation_object_test_signaled_rcu(rbo->tbo.resv, true)) {
		radeon_bo_unreserve(rbo);
		/* There is no pending work on this BO so just have
		 * userspace check the fence itself. Proper user
		 * space will check that the fence has not signaled
		 * when getting -EAGAIN.
		 */
		return -EAGAIN;
	}

	page = ttm_get_page_at(&rbo->tbo, args->offset);
	/* Maybe we just want to return -EINVAL, but if !page then
	 * something is seriously wrong here.
	 */
	BUG_ON(!page);

	fenceptr = kmap(page) + (args->offset & ~PAGE_MASK);
	radeon_bo_unreserve(rbo);

	/* Do not hold the object while waiting. */

	radeon_irq_kms_sw_irq_get(rdev, ring);
	/* FIXME: timeout */
	r = wait_event_interruptible(rdev->fence_queue,
				     (!rbo->tbo.resv ||
				      *fenceptr >= args->fence ||
				      rdev->needs_reset));
	radeon_irq_kms_sw_irq_put(rdev, ring);

	radeon_bo_reserve(rbo, false);
	if (*fenceptr >= args->fence &&
	    page == ttm_get_page_at(&rbo->tbo, args->offset)) {
		radeon_bo_unreserve(rbo);
		kunmap(page);
		/* Success, the user fence signaled. */
		return 0;
	}
	radeon_bo_unreserve(rbo);
	kunmap(page);

	/* The fence is not signaled, and either the BO no longer has
	 * pending work, or we timed out, or the GPU needs a reset.
	 */
	return -EAGAIN;
}


The point here is that it is OK to map the page in the first place, as the
object is still in use. And even if the object is swapped out afterwards, it
is OK to keep that page mapped; worst case we are just accessing a page
used by someone else, but we are only reading.

That is why we have to check again after the wait that we are still
reading from the right page, to know for sure whether this is actually the
fence value we are looking at or just some random page.

That way, because we do not hold anything on the object while waiting,
we are not blocking anything from happening. The object can be swapped out,
and thus there is no way for userspace to abuse this ioctl.

The new GTT-only property only ensures that ttm_get_page_at() will
actually be successful if the object is active. If the object is
swapped out then ttm_get_page_at() returns NULL. But since in the first
part of the ioctl we checked that the object is still active, we know
it cannot be swapped out, so ttm_get_page_at() must return a proper
page.

Cheers,
Jérôme

> 
> Yeah, that actually should work.
> 
> Thanks for the tip,
> Christian.
> 
> >
> >I need to look at ttm kmap code to see if it is actually useable without
> >disrupting the shrinker. Will do that after lunch.
> >
> >Cheers,
> >Jérôme
> >
> >>Thanks for the ideas,
> >>Christian.
> >>
> >>>Cheers,
> >>>Jérôme
> 
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/3] drm/radeon: add userfence IOCTL
  2015-04-13 17:23   ` Daniel Vetter
@ 2015-04-13 17:51     ` Jerome Glisse
  2015-04-13 18:26       ` Christian König
  0 siblings, 1 reply; 23+ messages in thread
From: Jerome Glisse @ 2015-04-13 17:51 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Serguei.Sagalovitch, dri-devel

On Mon, Apr 13, 2015 at 07:23:34PM +0200, Daniel Vetter wrote:
> On Mon, Apr 13, 2015 at 04:52:17PM +0200, Christian König wrote:
> > From: Christian König <christian.koenig@amd.com>
> > 
> > WIP patch which adds an user fence IOCTL.
> >
> > Signed-off-by: Christian König <christian.koenig@amd.com>
> 
> I've discussed userspace fences a lot with Jerome last XDC, so here's my
> comments:
> 
> My primary concern with mid-batch fences is that if we create real kernel
> fences (which might even escape to other places using android syncpts or
> dma-buf) then we end up relying upon correct userspace to not hang the
> kernel, which isn't good.

Yes, I agree on that; the solution I propose makes sure that this cannot happen.

> 
> So imo any kind of mid-batch fence must be done completely in userspace
> and never show up as a fence object on the kernel side. I thought that
> just busy-spinning in userspace would be all that's needed, but adding an
> ioctl to wait on such user fences seems like a nice idea too. On i915 we
> even have 2 interrupt sources per ring, so we could split the irq
> processing between kernel fences and userspace fences.

Technically the kernel does not allocate any object here; it is just that
the kernel can enable the GPU interrupt and thus wait "intelligently" until
the GPU fires an interrupt telling us that it might be a good time to look
at the fence value.

So technically this ioctl is nothing more than a wait for an IRQ plus a
check of a memory value.
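
That wait-and-check pattern can be modelled in ordinary userspace C: one thread plays the GPU, writing the fence value and raising the "interrupt", while the waiter sleeps on a condition variable and re-tests the memory value on each wakeup, much like wait_event_interruptible() does on fence_queue. Everything below is illustrative; none of it is driver API:

```c
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  irq  = PTHREAD_COND_INITIALIZER;	/* the "interrupt" */
static uint64_t fence_value;				/* the fence slot in the BO */

/* "GPU": write the fence value, then raise the interrupt. */
static void *gpu_thread(void *arg)
{
	uint64_t v = *(uint64_t *)arg;

	pthread_mutex_lock(&lock);
	fence_value = v;
	pthread_cond_broadcast(&irq);
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* "ioctl": sleep until an interrupt arrives, then re-check the value. */
static uint64_t wait_userfence(uint64_t target)
{
	uint64_t seen;

	pthread_mutex_lock(&lock);
	while (fence_value < target)
		pthread_cond_wait(&irq, &lock);
	seen = fence_value;
	pthread_mutex_unlock(&lock);
	return seen;
}
```

The interrupt is only a hint to wake up and look; the memory value alone decides whether the fence has signaled, which is why a spurious wakeup is harmless.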

> 
> One thing to keep in mind (I dunno radeon/ttm internals enough to know) is
> to make sure that while being blocked for a userspace fence in the ioctl
> you're not starving anyone else. But it doesn't look like you're holding
> any reservation objects or something similar which might prevent
> concurrent cs.

Yes, this is the discussion we are having: how to make sure that such an ioctl
would not block any regular processing, so that it could not be abused in
any way (well, at least in any way my devious imagination can think of right
now :)).

Cheers,
Jérôme
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/3] drm/radeon: add userfence IOCTL
  2015-04-13 17:51     ` Jerome Glisse
@ 2015-04-13 18:26       ` Christian König
  2015-04-13 19:48         ` Jerome Glisse
  0 siblings, 1 reply; 23+ messages in thread
From: Christian König @ 2015-04-13 18:26 UTC (permalink / raw)
  To: Jerome Glisse, Daniel Vetter; +Cc: Serguei.Sagalovitch, dri-devel

On 13.04.2015 19:51, Jerome Glisse wrote:
> On Mon, Apr 13, 2015 at 07:23:34PM +0200, Daniel Vetter wrote:
>> On Mon, Apr 13, 2015 at 04:52:17PM +0200, Christian König wrote:
>>> From: Christian König <christian.koenig@amd.com>
>>>
>>> WIP patch which adds an user fence IOCTL.
>>>
>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>> I've discussed userspace fences a lot with Jerome last XDC, so here's my
>> comments:
>>
>> My primary concern with mid-batch fences is that if we create real kernel
>> fences (which might even escape to other places using android syncpts or
>> dma-buf) then we end up relying upon correct userspace to not hang the
>> kernel, which isn't good.
> Yes i agree on that, solution i propose make sure that this can not happen.

What if we want to base a GPU scheduler and Android sync points on that 
functionality? E.g. it might be necessary to create "struct fence" 
objects which are based on the information from userspace.

Would that be possible or would we run into issues?

Regards,
Christian.

>
>> So imo any kind of mid-batch fence must be done completely in userspace
>> and never show up as a fence object on the kernel side. I thought that
>> just busy-spinning in userspace would be all that's needed, but adding an
>> ioctl to wait on such user fences seems like a nice idea too. On i915 we
>> even have 2 interrupt sources per ring, so we could split the irq
>> processing between kernel fences and userspace fences.
> Technicaly here the kernel does not allocate any object it just that kernel
> can enable GPU interrupt and thus wait "inteligently" until the GPU fire
> an interrupt telling us that it might be a good time to look at the fence
> value.
>
> So technicaly this ioctl is nothing more than a wait for irq and check
> memory value.
>
>> One thing to keep in mind (I dunno radeon/ttm internals enough to know) is
>> to make sure that while being blocked for a userspace fence in the ioctl
>> you're not starving anyone else. But it doesn't look like you're holding
>> any reservation objects or something similar which might prevent
>> concurrent cs.
> Yes this is the discussion we are having, how to make sure that such ioctl
> would not block any regular processing so that it could not be abuse in
> anyway (well at least in anyway my devious imagination can think of right
> now :)).
>
> Cheers,
> Jérôme

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 3/3] drm/radeon: add userfence IOCTL
  2015-04-13 18:26       ` Christian König
@ 2015-04-13 19:48         ` Jerome Glisse
  2015-04-14  8:17           ` Daniel Vetter
  0 siblings, 1 reply; 23+ messages in thread
From: Jerome Glisse @ 2015-04-13 19:48 UTC (permalink / raw)
  To: Christian König; +Cc: Serguei.Sagalovitch, dri-devel

On Mon, Apr 13, 2015 at 08:26:05PM +0200, Christian König wrote:
> On 13.04.2015 19:51, Jerome Glisse wrote:
> >On Mon, Apr 13, 2015 at 07:23:34PM +0200, Daniel Vetter wrote:
> >>On Mon, Apr 13, 2015 at 04:52:17PM +0200, Christian König wrote:
> >>>From: Christian König <christian.koenig@amd.com>
> >>>
> >>>WIP patch which adds an user fence IOCTL.
> >>>
> >>>Signed-off-by: Christian König <christian.koenig@amd.com>
> >>I've discussed userspace fences a lot with Jerome last XDC, so here's my
> >>comments:
> >>
> >>My primary concern with mid-batch fences is that if we create real kernel
> >>fences (which might even escape to other places using android syncpts or
> >>dma-buf) then we end up relying upon correct userspace to not hang the
> >>kernel, which isn't good.
> >Yes, I agree on that; the solution I propose makes sure that this cannot happen.
> 
> What if we want to base a GPU scheduler and Android sync points on that
> functionality? E.g. it might be necessary to create "struct fence" objects
> which are based on the information from userspace.
> 
> Would that be possible or would we run into issues?

Well, I would not like that; I do not like the Android code much. But you
could use the same code as I showed in my other email and just have the
android fence struct take a reference on the BO where the fence is, then
use the same code to check the status.

Obviously there should be a timeout, as there is no way for the kernel to
know if such a fence will ever signal.

Cheers,
Jérôme


* Re: [PATCH 3/3] drm/radeon: add userfence IOCTL
  2015-04-13 19:48         ` Jerome Glisse
@ 2015-04-14  8:17           ` Daniel Vetter
  0 siblings, 0 replies; 23+ messages in thread
From: Daniel Vetter @ 2015-04-14  8:17 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: Serguei.Sagalovitch, dri-devel

On Mon, Apr 13, 2015 at 03:48:51PM -0400, Jerome Glisse wrote:
> On Mon, Apr 13, 2015 at 08:26:05PM +0200, Christian König wrote:
> > On 13.04.2015 19:51, Jerome Glisse wrote:
> > >On Mon, Apr 13, 2015 at 07:23:34PM +0200, Daniel Vetter wrote:
> > >>On Mon, Apr 13, 2015 at 04:52:17PM +0200, Christian König wrote:
> > >>>From: Christian König <christian.koenig@amd.com>
> > >>>
> > >>>WIP patch which adds a user fence IOCTL.
> > >>>
> > >>>Signed-off-by: Christian König <christian.koenig@amd.com>
> > >>I've discussed userspace fences a lot with Jerome at last XDC, so here
> > >>are my comments:
> > >>
> > >>My primary concern with mid-batch fences is that if we create real kernel
> > >>fences (which might even escape to other places using android syncpts or
> > >>dma-buf) then we end up relying upon correct userspace to not hang the
> > >>kernel, which isn't good.
> > >Yes, I agree on that; the solution I propose makes sure that this cannot happen.
> > 
> > What if we want to base a GPU scheduler and Android sync points on that
> > functionality? E.g. it might be necessary to create "struct fence" objects
> > which are based on the information from userspace.
> > 
> > Would that be possible or would we run into issues?
> 
> Well, I would not like that; I do not like the Android code much. But you
> could use the same code as I showed in my other email and just have the
> android fence struct take a reference on the BO where the fence is, then
> use the same code to check the status.
> 
> Obviously there should be a timeout, as there is no way for the kernel to
> know if such a fence will ever signal.

Yeah, I think if your umd is using this for all fencing needs, including
cross-process and everything, then I don't think it's such a good idea any
more ;-) But if it's just fine-grained sync for heterogeneous compute
within the same process (ocl, hsa or whatever), then this seems reasonable.

I guess this boils down to what the userspace side will look like, and
what kind of standard you're trying to implement here.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Thread overview: 23+ messages
2015-04-13 14:52 [RFC] drm/radeon: userfence IOCTL Christian König
2015-04-13 14:52 ` [PATCH 1/3] drm/radeon: wait for BO move to finish on kmap Christian König
2015-04-13 14:52 ` [PATCH 2/3] drm/radeon: cleanup radeon_cs_get_ring Christian König
2015-04-13 14:52 ` [PATCH 3/3] drm/radeon: add userfence IOCTL Christian König
2015-04-13 17:23   ` Daniel Vetter
2015-04-13 17:51     ` Jerome Glisse
2015-04-13 18:26       ` Christian König
2015-04-13 19:48         ` Jerome Glisse
2015-04-14  8:17           ` Daniel Vetter
2015-04-13 17:27   ` Daniel Vetter
2015-04-13 15:25 ` [RFC] drm/radeon: " Serguei Sagalovitch
2015-04-13 15:35   ` Christian König
2015-04-13 15:37     ` Serguei Sagalovitch
2015-04-13 15:46     ` Jerome Glisse
2015-04-13 15:39   ` Jerome Glisse
2015-04-13 15:47     ` Christian König
2015-04-13 15:52     ` Serguei Sagalovitch
2015-04-13 16:13       ` Jerome Glisse
2015-04-13 15:31 ` Jerome Glisse
2015-04-13 15:45   ` Christian König
2015-04-13 16:08     ` Jerome Glisse
2015-04-13 16:55       ` Christian König
2015-04-13 17:46         ` Jerome Glisse
