* SA and other Patches.
From: Christian König @ 2012-05-07 11:42 UTC (permalink / raw)
  To: j.glisse, dri-devel

Hi Jerome & everybody on the list,

This gathers together every patch we developed over the last week or so that
is not already in drm-next.

I've run quite a few tests with them yesterday and today and, as far as I can
see, hammered out every known bug. For the SA allocator I reverted to tracking
the hole pointer instead of just the last allocation, because otherwise we would
never release the first allocation on the list. Glxgears now even keeps happily
running if I deliberately deadlock the non-GFX rings.

Please take a second look at them, and if nobody objects any more we should
commit them to drm-next.

Cheers,
Christian.

* [PATCH 01/20] drm/radeon: fix possible lack of synchronization btw ttm and other ring
From: Christian König @ 2012-05-07 11:42 UTC (permalink / raw)
  To: j.glisse, dri-devel; +Cc: Jerome Glisse

From: Jerome Glisse <jglisse@redhat.com>

We need to sync with the GFX ring, as TTM might have scheduled a BO move
on it, and new commands scheduled on other rings need to wait for the BO
data to be in place.
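
The synchronization decision this patch makes unconditional is simple enough
to model outside the driver. Below is a minimal userspace sketch of it (the
names and types are illustrative, not the driver's): any BO whose last fence
comes from a different ring and is not yet signaled forces a sync with that
ring, which is exactly the case the removed RADEON_RELOC_DONT_SYNC flag used
to let userspace skip.

#include <stdbool.h>
#include <stdio.h>

#define NUM_RINGS 3

struct fence { int ring; bool signaled; };
struct reloc { struct fence *fence; };  /* fence == NULL: BO is idle */

/* Returns true and fills sync_to[] if a submission on dst_ring has to wait
 * for work previously scheduled on other rings. */
static bool need_sync(struct reloc *relocs, int nrelocs, int dst_ring,
                      bool sync_to[NUM_RINGS])
{
    bool need = false;
    int i;

    for (i = 0; i < nrelocs; i++) {
        struct fence *f = relocs[i].fence;

        if (!f)
            continue;
        /* note: no DONT_SYNC escape hatch any more */
        if (f->ring != dst_ring && !f->signaled) {
            sync_to[f->ring] = true;
            need = true;
        }
    }
    return need;
}

int main(void)
{
    struct fence move = { .ring = 0, .signaled = false }; /* ttm bo move on GFX */
    struct reloc relocs[2] = { { &move }, { NULL } };
    bool sync_to[NUM_RINGS] = { false };

    if (need_sync(relocs, 2, 1, sync_to))
        printf("ring 1 must wait for the GFX ring: %d\n", sync_to[0]);
    return 0;
}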

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/radeon/radeon_cs.c |   12 ++++++------
 include/drm/radeon_drm.h           |    1 -
 2 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index c66beb1..289b0d7 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -122,15 +122,15 @@ static int radeon_cs_sync_rings(struct radeon_cs_parser *p)
 	int i, r;
 
 	for (i = 0; i < p->nrelocs; i++) {
+		struct radeon_fence *fence;
+
 		if (!p->relocs[i].robj || !p->relocs[i].robj->tbo.sync_obj)
 			continue;
 
-		if (!(p->relocs[i].flags & RADEON_RELOC_DONT_SYNC)) {
-			struct radeon_fence *fence = p->relocs[i].robj->tbo.sync_obj;
-			if (fence->ring != p->ring && !radeon_fence_signaled(fence)) {
-				sync_to_ring[fence->ring] = true;
-				need_sync = true;
-			}
+		fence = p->relocs[i].robj->tbo.sync_obj;
+		if (fence->ring != p->ring && !radeon_fence_signaled(fence)) {
+			sync_to_ring[fence->ring] = true;
+			need_sync = true;
 		}
 	}
 
diff --git a/include/drm/radeon_drm.h b/include/drm/radeon_drm.h
index 7c491b4..5805686 100644
--- a/include/drm/radeon_drm.h
+++ b/include/drm/radeon_drm.h
@@ -926,7 +926,6 @@ struct drm_radeon_cs_chunk {
 };
 
 /* drm_radeon_cs_reloc.flags */
-#define RADEON_RELOC_DONT_SYNC		0x01
 
 struct drm_radeon_cs_reloc {
 	uint32_t		handle;
-- 
1.7.5.4

* [PATCH 02/20] drm/radeon: clarify and extend wb setup on APUs and NI+ asics
From: Christian König @ 2012-05-07 11:42 UTC (permalink / raw)
  To: j.glisse, dri-devel; +Cc: Alex Deucher

From: Alex Deucher <alexander.deucher@amd.com>

Use the family rather than a DCE check for clarity; also always use
writeback on APUs, since there will never be AGP variants.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
---
 drivers/gpu/drm/radeon/radeon_device.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index d18f0c4..ff28210 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -241,8 +241,8 @@ int radeon_wb_init(struct radeon_device *rdev)
 				rdev->wb.use_event = true;
 		}
 	}
-	/* always use writeback/events on NI */
-	if (ASIC_IS_DCE5(rdev)) {
+	/* always use writeback/events on NI, APUs */
+	if (rdev->family >= CHIP_PALM) {
 		rdev->wb.enabled = true;
 		rdev->wb.use_event = true;
 	}
-- 
1.7.5.4

* [PATCH 03/20] drm/radeon: replace the per ring mutex with a global one
From: Christian König @ 2012-05-07 11:42 UTC (permalink / raw)
  To: j.glisse, dri-devel; +Cc: Christian König

A single global mutex for ring submissions seems sufficient.
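
To illustrate the resulting pattern, here is a userspace sketch with made-up
names (not the driver code): allocation, commit and undo become helpers that
assume the caller holds the single ring lock, while the lock/unlock wrappers
keep the old one-call interface. This split is what lets
radeon_semaphore_sync_rings() in the hunk below take the lock once, touch
several rings and still back out cleanly on error.

#include <pthread.h>

static pthread_mutex_t ring_lock = PTHREAD_MUTEX_INITIALIZER;

struct ring { unsigned int wptr, wptr_old; };

/* caller must hold ring_lock */
static int ring_alloc(struct ring *ring, unsigned int ndw)
{
    ring->wptr_old = ring->wptr;
    ring->wptr += ndw;          /* reserve space for ndw dwords */
    return 0;
}

static void ring_commit(struct ring *ring)
{
    /* the new write pointer would be handed to the hardware here */
    (void)ring;
}

static void ring_undo(struct ring *ring)
{
    ring->wptr = ring->wptr_old;    /* drop everything since the alloc */
}

/* wrappers that keep the old "lock, alloc ... commit, unlock" interface */
static int ring_lock_alloc(struct ring *ring, unsigned int ndw)
{
    int r;

    pthread_mutex_lock(&ring_lock);
    r = ring_alloc(ring, ndw);
    if (r)
        pthread_mutex_unlock(&ring_lock);
    return r;
}

static void ring_unlock_commit(struct ring *ring)
{
    ring_commit(ring);
    pthread_mutex_unlock(&ring_lock);
}

int main(void)
{
    struct ring gfx = { 0, 0 };

    /* multi-ring path: one lock held across several rings, undo on error */
    pthread_mutex_lock(&ring_lock);
    ring_alloc(&gfx, 8);
    ring_undo(&gfx);            /* pretend the submission failed */
    pthread_mutex_unlock(&ring_lock);

    /* simple single-ring path */
    if (ring_lock_alloc(&gfx, 64) == 0)
        ring_unlock_commit(&gfx);
    return 0;
}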

Signed-off-by: Christian König <deathsimple@vodafone.de>
---
 drivers/gpu/drm/radeon/radeon.h           |    3 +-
 drivers/gpu/drm/radeon/radeon_device.c    |    3 +-
 drivers/gpu/drm/radeon/radeon_pm.c        |   10 +-----
 drivers/gpu/drm/radeon/radeon_ring.c      |   28 +++++++++++-------
 drivers/gpu/drm/radeon/radeon_semaphore.c |   42 +++++++++++++----------------
 5 files changed, 41 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 82ffa6a..e99ea81 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -676,7 +676,6 @@ struct radeon_ring {
 	uint64_t		gpu_addr;
 	uint32_t		align_mask;
 	uint32_t		ptr_mask;
-	struct mutex		mutex;
 	bool			ready;
 	u32			ptr_reg_shift;
 	u32			ptr_reg_mask;
@@ -815,6 +814,7 @@ int radeon_ring_alloc(struct radeon_device *rdev, struct radeon_ring *cp, unsign
 int radeon_ring_lock(struct radeon_device *rdev, struct radeon_ring *cp, unsigned ndw);
 void radeon_ring_commit(struct radeon_device *rdev, struct radeon_ring *cp);
 void radeon_ring_unlock_commit(struct radeon_device *rdev, struct radeon_ring *cp);
+void radeon_ring_undo(struct radeon_ring *ring);
 void radeon_ring_unlock_undo(struct radeon_device *rdev, struct radeon_ring *cp);
 int radeon_ring_test(struct radeon_device *rdev, struct radeon_ring *cp);
 void radeon_ring_force_activity(struct radeon_device *rdev, struct radeon_ring *ring);
@@ -1534,6 +1534,7 @@ struct radeon_device {
 	rwlock_t			fence_lock;
 	struct radeon_fence_driver	fence_drv[RADEON_NUM_RINGS];
 	struct radeon_semaphore_driver	semaphore_drv;
+	struct mutex			ring_lock;
 	struct radeon_ring		ring[RADEON_NUM_RINGS];
 	struct radeon_ib_pool		ib_pool;
 	struct radeon_irq		irq;
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index ff28210..3f6ff2a 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -724,8 +724,7 @@ int radeon_device_init(struct radeon_device *rdev,
 	 * can recall function without having locking issues */
 	radeon_mutex_init(&rdev->cs_mutex);
 	radeon_mutex_init(&rdev->ib_pool.mutex);
-	for (i = 0; i < RADEON_NUM_RINGS; ++i)
-		mutex_init(&rdev->ring[i].mutex);
+	mutex_init(&rdev->ring_lock);
 	mutex_init(&rdev->dc_hw_i2c_mutex);
 	if (rdev->family >= CHIP_R600)
 		spin_lock_init(&rdev->ih.lock);
diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c
index caa55d6..7c38745 100644
--- a/drivers/gpu/drm/radeon/radeon_pm.c
+++ b/drivers/gpu/drm/radeon/radeon_pm.c
@@ -252,10 +252,7 @@ static void radeon_pm_set_clocks(struct radeon_device *rdev)
 
 	mutex_lock(&rdev->ddev->struct_mutex);
 	mutex_lock(&rdev->vram_mutex);
-	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
-		if (rdev->ring[i].ring_obj)
-			mutex_lock(&rdev->ring[i].mutex);
-	}
+	mutex_lock(&rdev->ring_lock);
 
 	/* gui idle int has issues on older chips it seems */
 	if (rdev->family >= CHIP_R600) {
@@ -311,10 +308,7 @@ static void radeon_pm_set_clocks(struct radeon_device *rdev)
 
 	rdev->pm.dynpm_planned_action = DYNPM_ACTION_NONE;
 
-	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
-		if (rdev->ring[i].ring_obj)
-			mutex_unlock(&rdev->ring[i].mutex);
-	}
+	mutex_unlock(&rdev->ring_lock);
 	mutex_unlock(&rdev->vram_mutex);
 	mutex_unlock(&rdev->ddev->struct_mutex);
 }
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index 2eb4c6e..a4d60ae 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -346,9 +346,9 @@ int radeon_ring_alloc(struct radeon_device *rdev, struct radeon_ring *ring, unsi
 		if (ndw < ring->ring_free_dw) {
 			break;
 		}
-		mutex_unlock(&ring->mutex);
+		mutex_unlock(&rdev->ring_lock);
 		r = radeon_fence_wait_next(rdev, radeon_ring_index(rdev, ring));
-		mutex_lock(&ring->mutex);
+		mutex_lock(&rdev->ring_lock);
 		if (r)
 			return r;
 	}
@@ -361,10 +361,10 @@ int radeon_ring_lock(struct radeon_device *rdev, struct radeon_ring *ring, unsig
 {
 	int r;
 
-	mutex_lock(&ring->mutex);
+	mutex_lock(&rdev->ring_lock);
 	r = radeon_ring_alloc(rdev, ring, ndw);
 	if (r) {
-		mutex_unlock(&ring->mutex);
+		mutex_unlock(&rdev->ring_lock);
 		return r;
 	}
 	return 0;
@@ -389,20 +389,25 @@ void radeon_ring_commit(struct radeon_device *rdev, struct radeon_ring *ring)
 void radeon_ring_unlock_commit(struct radeon_device *rdev, struct radeon_ring *ring)
 {
 	radeon_ring_commit(rdev, ring);
-	mutex_unlock(&ring->mutex);
+	mutex_unlock(&rdev->ring_lock);
 }
 
-void radeon_ring_unlock_undo(struct radeon_device *rdev, struct radeon_ring *ring)
+void radeon_ring_undo(struct radeon_ring *ring)
 {
 	ring->wptr = ring->wptr_old;
-	mutex_unlock(&ring->mutex);
+}
+
+void radeon_ring_unlock_undo(struct radeon_device *rdev, struct radeon_ring *ring)
+{
+	radeon_ring_undo(ring);
+	mutex_unlock(&rdev->ring_lock);
 }
 
 void radeon_ring_force_activity(struct radeon_device *rdev, struct radeon_ring *ring)
 {
 	int r;
 
-	mutex_lock(&ring->mutex);
+	mutex_lock(&rdev->ring_lock);
 	radeon_ring_free_size(rdev, ring);
 	if (ring->rptr == ring->wptr) {
 		r = radeon_ring_alloc(rdev, ring, 1);
@@ -411,7 +416,7 @@ void radeon_ring_force_activity(struct radeon_device *rdev, struct radeon_ring *
 			radeon_ring_commit(rdev, ring);
 		}
 	}
-	mutex_unlock(&ring->mutex);
+	mutex_unlock(&rdev->ring_lock);
 }
 
 void radeon_ring_lockup_update(struct radeon_ring *ring)
@@ -520,11 +525,12 @@ void radeon_ring_fini(struct radeon_device *rdev, struct radeon_ring *ring)
 	int r;
 	struct radeon_bo *ring_obj;
 
-	mutex_lock(&ring->mutex);
+	mutex_lock(&rdev->ring_lock);
 	ring_obj = ring->ring_obj;
+	ring->ready = false;
 	ring->ring = NULL;
 	ring->ring_obj = NULL;
-	mutex_unlock(&ring->mutex);
+	mutex_unlock(&rdev->ring_lock);
 
 	if (ring_obj) {
 		r = radeon_bo_reserve(ring_obj, false);
diff --git a/drivers/gpu/drm/radeon/radeon_semaphore.c b/drivers/gpu/drm/radeon/radeon_semaphore.c
index 930a08a..c5b3d8e 100644
--- a/drivers/gpu/drm/radeon/radeon_semaphore.c
+++ b/drivers/gpu/drm/radeon/radeon_semaphore.c
@@ -39,7 +39,6 @@ static int radeon_semaphore_add_bo(struct radeon_device *rdev)
 	uint32_t *cpu_ptr;
 	int r, i;
 
-
 	bo = kmalloc(sizeof(struct radeon_semaphore_bo), GFP_KERNEL);
 	if (bo == NULL) {
 		return -ENOMEM;
@@ -154,13 +153,17 @@ int radeon_semaphore_sync_rings(struct radeon_device *rdev,
 				bool sync_to[RADEON_NUM_RINGS],
 				int dst_ring)
 {
-	int i, r;
+	int i = 0, r;
 
-	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
-		unsigned num_ops = i == dst_ring ? RADEON_NUM_RINGS : 1;
+	mutex_lock(&rdev->ring_lock);
+	r = radeon_ring_alloc(rdev, &rdev->ring[dst_ring], RADEON_NUM_RINGS * 8);
+	if (r) {
+		goto error;
+	}
 
-		/* don't lock unused rings */
-		if (!sync_to[i] && i != dst_ring)
+	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+		/* no need to sync to our own or unused rings */
+		if (!sync_to[i] || i == dst_ring)
 			continue;
 
 		/* prevent GPU deadlocks */
@@ -170,38 +173,31 @@ int radeon_semaphore_sync_rings(struct radeon_device *rdev,
 			goto error;
 		}
 
-                r = radeon_ring_lock(rdev, &rdev->ring[i], num_ops * 8);
-                if (r)
+		r = radeon_ring_alloc(rdev, &rdev->ring[i], 8);
+		if (r) {
 			goto error;
-	}
-
-	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
-		/* no need to sync to our own or unused rings */
-		if (!sync_to[i] || i == dst_ring)
-                        continue;
+		}
 
 		radeon_semaphore_emit_signal(rdev, i, semaphore);
 		radeon_semaphore_emit_wait(rdev, dst_ring, semaphore);
-	}
 
-	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
-
-		/* don't unlock unused rings */
-		if (!sync_to[i] && i != dst_ring)
-			continue;
-
-		radeon_ring_unlock_commit(rdev, &rdev->ring[i]);
+		radeon_ring_commit(rdev, &rdev->ring[i]);
 	}
 
+	radeon_ring_commit(rdev, &rdev->ring[dst_ring]);
+	mutex_unlock(&rdev->ring_lock);
+
 	return 0;
 
 error:
 	/* unlock all locks taken so far */
 	for (--i; i >= 0; --i) {
 		if (sync_to[i] || i == dst_ring) {
-			radeon_ring_unlock_undo(rdev, &rdev->ring[i]);
+			radeon_ring_undo(&rdev->ring[i]);
 		}
 	}
+	radeon_ring_undo(&rdev->ring[dst_ring]);
+	mutex_unlock(&rdev->ring_lock);
 	return r;
 }
 
-- 
1.7.5.4

* [PATCH 04/20] drm/radeon: convert fence to uint64_t v4
From: Christian König @ 2012-05-07 11:42 UTC (permalink / raw)
  To: j.glisse, dri-devel; +Cc: Jerome Glisse, Christian König

From: Jerome Glisse <jglisse@redhat.com>

This converts fences to use a uint64_t sequence number; the intention
is to use the fact that uint64_t is big enough that we don't need to
care about wrap around.

Tested with and without writeback, using 0xFFFFF000 as the initial
fence sequence, thus allowing the wrap around from 32 bits to 64 bits
to be tested.

v2: Add comment about possible race between CPU & GPU, add comment
    stressing that we need 2 dword alignment for R600_WB_EVENT_OFFSET.
    Read the fence sequence in the reverse order of how the GPU writes
    it, so we mitigate the race between CPU and GPU.

v3: Drop the need for ring to emit the 64bits fence, and just have
    each ring emit the lower 32bits of the fence sequence. We
    handle the wrap over 32bits in fence_process.

v4: Just a small optimization: don't reread the last_seq value
    if the loop restarts, since we already know its value anyway.
    Also start the seq value at zero, not one, and use pre- instead
    of post-increment in emit, otherwise wait_empty will deadlock.
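
As a standalone illustration of the wrap handling described in v3 (a
userspace sketch under the same assumptions, not the driver code): the ring
only ever writes back the lower 32 bits, and the poll loop reconstructs the
full 64-bit value from the last sequence it trusted, bumping the upper half
whenever the 32-bit counter wraps.

#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Rebuild the 64-bit sequence from the 32 bits the ring writes back,
 * given the last 64-bit value we trusted (mirrors the core of the loop
 * added to radeon_fence_poll_locked). */
static uint64_t extend_seq(uint64_t last_seq, uint32_t hw_seq)
{
    uint64_t seq = hw_seq;

    seq |= last_seq & 0xffffffff00000000ULL;
    if (seq < last_seq)         /* the 32-bit counter wrapped */
        seq += 0x100000000ULL;
    return seq;
}

int main(void)
{
    /* start close to the wrap point, like the test in the commit message */
    uint64_t last = 0xfffff000ULL;

    last = extend_seq(last, 0xfffffff0);
    printf("0x%016" PRIx64 "\n", last); /* 0x00000000fffffff0 */
    last = extend_seq(last, 0x00000010);
    printf("0x%016" PRIx64 "\n", last); /* 0x0000000100000010 */
    return 0;
}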

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
---
 drivers/gpu/drm/radeon/radeon.h       |   39 ++++++-----
 drivers/gpu/drm/radeon/radeon_fence.c |  116 +++++++++++++++++++++++----------
 drivers/gpu/drm/radeon/radeon_ring.c  |    9 ++-
 3 files changed, 107 insertions(+), 57 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index e99ea81..cdf46bc 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -100,28 +100,32 @@ extern int radeon_lockup_timeout;
  * Copy from radeon_drv.h so we don't have to include both and have conflicting
  * symbol;
  */
-#define RADEON_MAX_USEC_TIMEOUT		100000	/* 100 ms */
-#define RADEON_FENCE_JIFFIES_TIMEOUT	(HZ / 2)
+#define RADEON_MAX_USEC_TIMEOUT			100000	/* 100 ms */
+#define RADEON_FENCE_JIFFIES_TIMEOUT		(HZ / 2)
 /* RADEON_IB_POOL_SIZE must be a power of 2 */
-#define RADEON_IB_POOL_SIZE		16
-#define RADEON_DEBUGFS_MAX_COMPONENTS	32
-#define RADEONFB_CONN_LIMIT		4
-#define RADEON_BIOS_NUM_SCRATCH		8
+#define RADEON_IB_POOL_SIZE			16
+#define RADEON_DEBUGFS_MAX_COMPONENTS		32
+#define RADEONFB_CONN_LIMIT			4
+#define RADEON_BIOS_NUM_SCRATCH			8
 
 /* max number of rings */
-#define RADEON_NUM_RINGS 3
+#define RADEON_NUM_RINGS			3
+
+/* fence seq are set to this number when signaled */
+#define RADEON_FENCE_SIGNALED_SEQ		0LL
+#define RADEON_FENCE_NOTEMITED_SEQ		(~0LL)
 
 /* internal ring indices */
 /* r1xx+ has gfx CP ring */
-#define RADEON_RING_TYPE_GFX_INDEX  0
+#define RADEON_RING_TYPE_GFX_INDEX		0
 
 /* cayman has 2 compute CP rings */
-#define CAYMAN_RING_TYPE_CP1_INDEX 1
-#define CAYMAN_RING_TYPE_CP2_INDEX 2
+#define CAYMAN_RING_TYPE_CP1_INDEX		1
+#define CAYMAN_RING_TYPE_CP2_INDEX		2
 
 /* hardcode those limit for now */
-#define RADEON_VA_RESERVED_SIZE		(8 << 20)
-#define RADEON_IB_VM_MAX_SIZE		(64 << 10)
+#define RADEON_VA_RESERVED_SIZE			(8 << 20)
+#define RADEON_IB_VM_MAX_SIZE			(64 << 10)
 
 /*
  * Errata workarounds.
@@ -254,8 +258,9 @@ struct radeon_fence_driver {
 	uint32_t			scratch_reg;
 	uint64_t			gpu_addr;
 	volatile uint32_t		*cpu_addr;
-	atomic_t			seq;
-	uint32_t			last_seq;
+	/* seq is protected by ring emission lock */
+	uint64_t			seq;
+	atomic64_t			last_seq;
 	unsigned long			last_activity;
 	wait_queue_head_t		queue;
 	struct list_head		emitted;
@@ -268,11 +273,9 @@ struct radeon_fence {
 	struct kref			kref;
 	struct list_head		list;
 	/* protected by radeon_fence.lock */
-	uint32_t			seq;
-	bool				emitted;
-	bool				signaled;
+	uint64_t			seq;
 	/* RB, DMA, etc. */
-	int				ring;
+	unsigned			ring;
 	struct radeon_semaphore		*semaphore;
 };
 
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 5bb78bf..feb2bbc 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -66,14 +66,14 @@ int radeon_fence_emit(struct radeon_device *rdev, struct radeon_fence *fence)
 	unsigned long irq_flags;
 
 	write_lock_irqsave(&rdev->fence_lock, irq_flags);
-	if (fence->emitted) {
+	if (fence->seq && fence->seq < RADEON_FENCE_NOTEMITED_SEQ) {
 		write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
 		return 0;
 	}
-	fence->seq = atomic_add_return(1, &rdev->fence_drv[fence->ring].seq);
+	/* we are protected by the ring emission mutex */
+	fence->seq = ++rdev->fence_drv[fence->ring].seq;
 	radeon_fence_ring_emit(rdev, fence->ring, fence);
 	trace_radeon_fence_emit(rdev->ddev, fence->seq);
-	fence->emitted = true;
 	/* are we the first fence on a previusly idle ring? */
 	if (list_empty(&rdev->fence_drv[fence->ring].emitted)) {
 		rdev->fence_drv[fence->ring].last_activity = jiffies;
@@ -87,14 +87,60 @@ static bool radeon_fence_poll_locked(struct radeon_device *rdev, int ring)
 {
 	struct radeon_fence *fence;
 	struct list_head *i, *n;
-	uint32_t seq;
+	uint64_t seq, last_seq;
+	unsigned count_loop = 0;
 	bool wake = false;
 
-	seq = radeon_fence_read(rdev, ring);
-	if (seq == rdev->fence_drv[ring].last_seq)
-		return false;
+	/* Note there is a scenario here for an infinite loop but it's
+	 * very unlikely to happen. For it to happen, the current polling
+	 * process need to be interrupted by another process and another
+	 * process needs to update the last_seq btw the atomic read and
+	 * xchg of the current process.
+	 *
+	 * More over for this to go in infinite loop there need to be
+	 * continuously new fence signaled ie radeon_fence_read needs
+	 * to return a different value each time for both the currently
+	 * polling process and the other process that xchg the last_seq
+	 * btw atomic read and xchg of the current process. And the
+	 * value the other process set as last seq must be higher than
+	 * the seq value we just read. Which means that current process
+	 * need to be interrupted after radeon_fence_read and before
+	 * atomic xchg.
+	 *
+	 * To be even more safe we count the number of time we loop and
+	 * we bail after 10 loop just accepting the fact that we might
+	 * have temporarly set the last_seq not to the true real last
+	 * seq but to an older one.
+	 */
+	last_seq = atomic64_read(&rdev->fence_drv[ring].last_seq);
+	do {
+		seq = radeon_fence_read(rdev, ring);
+		seq |= last_seq & 0xffffffff00000000LL;
+		if (seq < last_seq) {
+			seq += 0x100000000LL;
+		}
 
-	rdev->fence_drv[ring].last_seq = seq;
+		if (!wake && seq == last_seq) {
+			return false;
+		}
+		/* If we loop over we don't want to return without
+		 * checking if a fence is signaled as it means that the
+		 * seq we just read is different from the previous on.
+		 */
+		wake = true;
+		if ((count_loop++) > 10) {
+			/* We looped over too many time leave with the
+			 * fact that we might have set an older fence
+			 * seq then the current real last seq as signaled
+			 * by the hw.
+			 */
+			break;
+		}
+		last_seq = seq;
+	} while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
+
+	/* reset wake to false */
+	wake = false;
 	rdev->fence_drv[ring].last_activity = jiffies;
 
 	n = NULL;
@@ -112,7 +158,7 @@ static bool radeon_fence_poll_locked(struct radeon_device *rdev, int ring)
 			n = i->prev;
 			list_move_tail(i, &rdev->fence_drv[ring].signaled);
 			fence = list_entry(i, struct radeon_fence, list);
-			fence->signaled = true;
+			fence->seq = RADEON_FENCE_SIGNALED_SEQ;
 			i = n;
 		} while (i != &rdev->fence_drv[ring].emitted);
 		wake = true;
@@ -128,7 +174,7 @@ static void radeon_fence_destroy(struct kref *kref)
 	fence = container_of(kref, struct radeon_fence, kref);
 	write_lock_irqsave(&fence->rdev->fence_lock, irq_flags);
 	list_del(&fence->list);
-	fence->emitted = false;
+	fence->seq = RADEON_FENCE_NOTEMITED_SEQ;
 	write_unlock_irqrestore(&fence->rdev->fence_lock, irq_flags);
 	if (fence->semaphore)
 		radeon_semaphore_free(fence->rdev, fence->semaphore);
@@ -145,9 +191,7 @@ int radeon_fence_create(struct radeon_device *rdev,
 	}
 	kref_init(&((*fence)->kref));
 	(*fence)->rdev = rdev;
-	(*fence)->emitted = false;
-	(*fence)->signaled = false;
-	(*fence)->seq = 0;
+	(*fence)->seq = RADEON_FENCE_NOTEMITED_SEQ;
 	(*fence)->ring = ring;
 	(*fence)->semaphore = NULL;
 	INIT_LIST_HEAD(&(*fence)->list);
@@ -163,18 +207,18 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
 		return true;
 
 	write_lock_irqsave(&fence->rdev->fence_lock, irq_flags);
-	signaled = fence->signaled;
+	signaled = (fence->seq == RADEON_FENCE_SIGNALED_SEQ);
 	/* if we are shuting down report all fence as signaled */
 	if (fence->rdev->shutdown) {
 		signaled = true;
 	}
-	if (!fence->emitted) {
+	if (fence->seq == RADEON_FENCE_NOTEMITED_SEQ) {
 		WARN(1, "Querying an unemitted fence : %p !\n", fence);
 		signaled = true;
 	}
 	if (!signaled) {
 		radeon_fence_poll_locked(fence->rdev, fence->ring);
-		signaled = fence->signaled;
+		signaled = (fence->seq == RADEON_FENCE_SIGNALED_SEQ);
 	}
 	write_unlock_irqrestore(&fence->rdev->fence_lock, irq_flags);
 	return signaled;
@@ -183,8 +227,8 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
 int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 {
 	struct radeon_device *rdev;
-	unsigned long irq_flags, timeout;
-	u32 seq;
+	unsigned long irq_flags, timeout, last_activity;
+	uint64_t seq;
 	int i, r;
 	bool signaled;
 
@@ -207,7 +251,9 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 			timeout = 1;
 		}
 		/* save current sequence value used to check for GPU lockups */
-		seq = rdev->fence_drv[fence->ring].last_seq;
+		seq = atomic64_read(&rdev->fence_drv[fence->ring].last_seq);
+		/* Save current last activity valuee, used to check for GPU lockups */
+		last_activity = rdev->fence_drv[fence->ring].last_activity;
 		read_unlock_irqrestore(&rdev->fence_lock, irq_flags);
 
 		trace_radeon_fence_wait_begin(rdev->ddev, seq);
@@ -235,24 +281,23 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 			}
 
 			write_lock_irqsave(&rdev->fence_lock, irq_flags);
-			/* check if sequence value has changed since last_activity */
-			if (seq != rdev->fence_drv[fence->ring].last_seq) {
+			/* test if somebody else has already decided that this is a lockup */
+			if (last_activity != rdev->fence_drv[fence->ring].last_activity) {
 				write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
 				continue;
 			}
 
-			/* change sequence value on all rings, so nobody else things there is a lockup */
-			for (i = 0; i < RADEON_NUM_RINGS; ++i)
-				rdev->fence_drv[i].last_seq -= 0x10000;
-
-			rdev->fence_drv[fence->ring].last_activity = jiffies;
 			write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
 
 			if (radeon_ring_is_lockup(rdev, fence->ring, &rdev->ring[fence->ring])) {
-
 				/* good news we believe it's a lockup */
-				printk(KERN_WARNING "GPU lockup (waiting for 0x%08X last fence id 0x%08X)\n",
-				     fence->seq, seq);
+				dev_warn(rdev->dev, "GPU lockup (waiting for 0x%016llx last fence id 0x%016llx)\n",
+					 fence->seq, seq);
+
+				/* change last activity so nobody else think there is a lockup */
+				for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+					rdev->fence_drv[i].last_activity = jiffies;
+				}
 
 				/* mark the ring as not ready any more */
 				rdev->ring[fence->ring].ready = false;
@@ -387,9 +432,9 @@ int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring)
 	}
 	rdev->fence_drv[ring].cpu_addr = &rdev->wb.wb[index/4];
 	rdev->fence_drv[ring].gpu_addr = rdev->wb.gpu_addr + index;
-	radeon_fence_write(rdev, atomic_read(&rdev->fence_drv[ring].seq), ring);
+	radeon_fence_write(rdev, rdev->fence_drv[ring].seq, ring);
 	rdev->fence_drv[ring].initialized = true;
-	DRM_INFO("fence driver on ring %d use gpu addr 0x%08Lx and cpu addr 0x%p\n",
+	DRM_INFO("fence driver on ring %d use gpu addr 0x%016llx and cpu addr 0x%p\n",
 		 ring, rdev->fence_drv[ring].gpu_addr, rdev->fence_drv[ring].cpu_addr);
 	write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
 	return 0;
@@ -400,7 +445,8 @@ static void radeon_fence_driver_init_ring(struct radeon_device *rdev, int ring)
 	rdev->fence_drv[ring].scratch_reg = -1;
 	rdev->fence_drv[ring].cpu_addr = NULL;
 	rdev->fence_drv[ring].gpu_addr = 0;
-	atomic_set(&rdev->fence_drv[ring].seq, 0);
+	rdev->fence_drv[ring].seq = 0;
+	atomic64_set(&rdev->fence_drv[ring].last_seq, 0);
 	INIT_LIST_HEAD(&rdev->fence_drv[ring].emitted);
 	INIT_LIST_HEAD(&rdev->fence_drv[ring].signaled);
 	init_waitqueue_head(&rdev->fence_drv[ring].queue);
@@ -458,12 +504,12 @@ static int radeon_debugfs_fence_info(struct seq_file *m, void *data)
 			continue;
 
 		seq_printf(m, "--- ring %d ---\n", i);
-		seq_printf(m, "Last signaled fence 0x%08X\n",
-			   radeon_fence_read(rdev, i));
+		seq_printf(m, "Last signaled fence 0x%016lx\n",
+			   atomic64_read(&rdev->fence_drv[i].last_seq));
 		if (!list_empty(&rdev->fence_drv[i].emitted)) {
 			fence = list_entry(rdev->fence_drv[i].emitted.prev,
 					   struct radeon_fence, list);
-			seq_printf(m, "Last emitted fence %p with 0x%08X\n",
+			seq_printf(m, "Last emitted fence %p with 0x%016llx\n",
 				   fence,  fence->seq);
 		}
 	}
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index a4d60ae..4ae222b 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -82,7 +82,7 @@ bool radeon_ib_try_free(struct radeon_device *rdev, struct radeon_ib *ib)
 	bool done = false;
 
 	/* only free ib which have been emited */
-	if (ib->fence && ib->fence->emitted) {
+	if (ib->fence && ib->fence->seq < RADEON_FENCE_NOTEMITED_SEQ) {
 		if (radeon_fence_signaled(ib->fence)) {
 			radeon_fence_unref(&ib->fence);
 			radeon_sa_bo_free(rdev, &ib->sa_bo);
@@ -149,8 +149,9 @@ retry:
 	/* this should be rare event, ie all ib scheduled none signaled yet.
 	 */
 	for (i = 0; i < RADEON_IB_POOL_SIZE; i++) {
-		if (rdev->ib_pool.ibs[idx].fence && rdev->ib_pool.ibs[idx].fence->emitted) {
-			r = radeon_fence_wait(rdev->ib_pool.ibs[idx].fence, false);
+		struct radeon_fence *fence = rdev->ib_pool.ibs[idx].fence;
+		if (fence && fence->seq < RADEON_FENCE_NOTEMITED_SEQ) {
+			r = radeon_fence_wait(fence, false);
 			if (!r) {
 				goto retry;
 			}
@@ -173,7 +174,7 @@ void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib)
 		return;
 	}
 	radeon_mutex_lock(&rdev->ib_pool.mutex);
-	if (tmp->fence && !tmp->fence->emitted) {
+	if (tmp->fence && tmp->fence->seq == RADEON_FENCE_NOTEMITED_SEQ) {
 		radeon_sa_bo_free(rdev, &tmp->sa_bo);
 		radeon_fence_unref(&tmp->fence);
 	}
-- 
1.7.5.4

* [PATCH 05/20] drm/radeon: rework fence handling, drop fence list v5
From: Christian König @ 2012-05-07 11:42 UTC (permalink / raw)
  To: j.glisse, dri-devel; +Cc: Jerome Glisse, Christian König

From: Jerome Glisse <jglisse@redhat.com>

Using a 64-bit fence sequence we can directly compare sequence
numbers to know whether a fence is signaled or not. Thus the fence
list becomes useless, as does the fence lock that mainly
protected the fence list.

Things like ring.ready are no longer behind a lock; this should
be OK as ring.ready is initialized once and will only change
when facing a lockup. Worst case is that we return -EBUSY just
after a successful GPU reset, or we go into a wait state instead
of returning -EBUSY (thus delaying reporting -EBUSY to the fence
wait caller).

v2: Remove leftover comment, force using writeback on cayman and
    newer, thus not having to suffer from possible scratch reg
    exhaustion
v3: Rebase on top of change to uint64 fence patch
v4: Change DCE5 test to force write back on cayman and newer but
    also any APU such as PALM or SUMO family
v5: Rebase on top of new uint64 fence patch
v6: Just break if seq doesn't change any more. Use the radeon_fence
    prefix for all function names. Even though it's now highly optimized,
    try to avoid polling too often.
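
What the signaled test boils down to once the list is gone can be sketched in
a few lines of userspace code (illustrative names, not the driver's): a fence
is signaled exactly when the last sequence seen from its ring has reached the
fence's own sequence number.

#include <stdbool.h>
#include <stdint.h>

#define NUM_RINGS 3

struct fence_driver { uint64_t last_seq; }; /* atomic64_t in the driver */
struct fence { uint64_t seq; unsigned int ring; };

static struct fence_driver fence_drv[NUM_RINGS];

static bool fence_seq_signaled(const struct fence *fence)
{
    /* the driver additionally polls the hardware counter once more
     * before reporting the fence as not signaled */
    return fence_drv[fence->ring].last_seq >= fence->seq;
}

int main(void)
{
    struct fence f = { .seq = 5, .ring = 0 };

    fence_drv[0].last_seq = 4;  /* not signaled yet */
    /* ... hardware writes back, the poll updates last_seq ... */
    fence_drv[0].last_seq = 7;
    return fence_seq_signaled(&f) ? 0 : 1;
}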

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
---
 drivers/gpu/drm/radeon/radeon.h        |    6 +-
 drivers/gpu/drm/radeon/radeon_device.c |    8 +-
 drivers/gpu/drm/radeon/radeon_fence.c  |  289 +++++++++++++-------------------
 3 files changed, 118 insertions(+), 185 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index cdf46bc..7c87117 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -263,15 +263,12 @@ struct radeon_fence_driver {
 	atomic64_t			last_seq;
 	unsigned long			last_activity;
 	wait_queue_head_t		queue;
-	struct list_head		emitted;
-	struct list_head		signaled;
 	bool				initialized;
 };
 
 struct radeon_fence {
 	struct radeon_device		*rdev;
 	struct kref			kref;
-	struct list_head		list;
 	/* protected by radeon_fence.lock */
 	uint64_t			seq;
 	/* RB, DMA, etc. */
@@ -291,7 +288,7 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring);
 int radeon_fence_wait_empty(struct radeon_device *rdev, int ring);
 struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence);
 void radeon_fence_unref(struct radeon_fence **fence);
-int radeon_fence_count_emitted(struct radeon_device *rdev, int ring);
+unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring);
 
 /*
  * Tiling registers
@@ -1534,7 +1531,6 @@ struct radeon_device {
 	struct radeon_mode_info		mode_info;
 	struct radeon_scratch		scratch;
 	struct radeon_mman		mman;
-	rwlock_t			fence_lock;
 	struct radeon_fence_driver	fence_drv[RADEON_NUM_RINGS];
 	struct radeon_semaphore_driver	semaphore_drv;
 	struct mutex			ring_lock;
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 3f6ff2a..0e7b72a 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -225,9 +225,9 @@ int radeon_wb_init(struct radeon_device *rdev)
 	/* disable event_write fences */
 	rdev->wb.use_event = false;
 	/* disabled via module param */
-	if (radeon_no_wb == 1)
+	if (radeon_no_wb == 1) {
 		rdev->wb.enabled = false;
-	else {
+	} else {
 		if (rdev->flags & RADEON_IS_AGP) {
 			/* often unreliable on AGP */
 			rdev->wb.enabled = false;
@@ -237,8 +237,9 @@ int radeon_wb_init(struct radeon_device *rdev)
 		} else {
 			rdev->wb.enabled = true;
 			/* event_write fences are only available on r600+ */
-			if (rdev->family >= CHIP_R600)
+			if (rdev->family >= CHIP_R600) {
 				rdev->wb.use_event = true;
+			}
 		}
 	}
 	/* always use writeback/events on NI, APUs */
@@ -731,7 +732,6 @@ int radeon_device_init(struct radeon_device *rdev,
 	mutex_init(&rdev->gem.mutex);
 	mutex_init(&rdev->pm.mutex);
 	mutex_init(&rdev->vram_mutex);
-	rwlock_init(&rdev->fence_lock);
 	rwlock_init(&rdev->semaphore_drv.lock);
 	INIT_LIST_HEAD(&rdev->gem.objects);
 	init_waitqueue_head(&rdev->irq.vblank_queue);
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index feb2bbc..f386807 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -63,30 +63,18 @@ static u32 radeon_fence_read(struct radeon_device *rdev, int ring)
 
 int radeon_fence_emit(struct radeon_device *rdev, struct radeon_fence *fence)
 {
-	unsigned long irq_flags;
-
-	write_lock_irqsave(&rdev->fence_lock, irq_flags);
+	/* we are protected by the ring emission mutex */
 	if (fence->seq && fence->seq < RADEON_FENCE_NOTEMITED_SEQ) {
-		write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
 		return 0;
 	}
-	/* we are protected by the ring emission mutex */
 	fence->seq = ++rdev->fence_drv[fence->ring].seq;
 	radeon_fence_ring_emit(rdev, fence->ring, fence);
 	trace_radeon_fence_emit(rdev->ddev, fence->seq);
-	/* are we the first fence on a previusly idle ring? */
-	if (list_empty(&rdev->fence_drv[fence->ring].emitted)) {
-		rdev->fence_drv[fence->ring].last_activity = jiffies;
-	}
-	list_move_tail(&fence->list, &rdev->fence_drv[fence->ring].emitted);
-	write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
 	return 0;
 }
 
-static bool radeon_fence_poll_locked(struct radeon_device *rdev, int ring)
+static bool radeon_fence_poll(struct radeon_device *rdev, unsigned ring)
 {
-	struct radeon_fence *fence;
-	struct list_head *i, *n;
 	uint64_t seq, last_seq;
 	unsigned count_loop = 0;
 	bool wake = false;
@@ -120,14 +108,16 @@ static bool radeon_fence_poll_locked(struct radeon_device *rdev, int ring)
 			seq += 0x100000000LL;
 		}
 
-		if (!wake && seq == last_seq) {
-			return false;
+		if (seq == last_seq) {
+			break;
 		}
 		/* If we loop over we don't want to return without
 		 * checking if a fence is signaled as it means that the
 		 * seq we just read is different from the previous on.
 		 */
 		wake = true;
+		last_seq = seq;
+		rdev->fence_drv[ring].last_activity = jiffies;
 		if ((count_loop++) > 10) {
 			/* We looped over too many time leave with the
 			 * fact that we might have set an older fence
@@ -136,46 +126,17 @@ static bool radeon_fence_poll_locked(struct radeon_device *rdev, int ring)
 			 */
 			break;
 		}
-		last_seq = seq;
 	} while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
 
-	/* reset wake to false */
-	wake = false;
-	rdev->fence_drv[ring].last_activity = jiffies;
-
-	n = NULL;
-	list_for_each(i, &rdev->fence_drv[ring].emitted) {
-		fence = list_entry(i, struct radeon_fence, list);
-		if (fence->seq == seq) {
-			n = i;
-			break;
-		}
-	}
-	/* all fence previous to this one are considered as signaled */
-	if (n) {
-		i = n;
-		do {
-			n = i->prev;
-			list_move_tail(i, &rdev->fence_drv[ring].signaled);
-			fence = list_entry(i, struct radeon_fence, list);
-			fence->seq = RADEON_FENCE_SIGNALED_SEQ;
-			i = n;
-		} while (i != &rdev->fence_drv[ring].emitted);
-		wake = true;
-	}
 	return wake;
 }
 
 static void radeon_fence_destroy(struct kref *kref)
 {
-	unsigned long irq_flags;
-        struct radeon_fence *fence;
+	struct radeon_fence *fence;
 
 	fence = container_of(kref, struct radeon_fence, kref);
-	write_lock_irqsave(&fence->rdev->fence_lock, irq_flags);
-	list_del(&fence->list);
 	fence->seq = RADEON_FENCE_NOTEMITED_SEQ;
-	write_unlock_irqrestore(&fence->rdev->fence_lock, irq_flags);
 	if (fence->semaphore)
 		radeon_semaphore_free(fence->rdev, fence->semaphore);
 	kfree(fence);
@@ -194,80 +155,82 @@ int radeon_fence_create(struct radeon_device *rdev,
 	(*fence)->seq = RADEON_FENCE_NOTEMITED_SEQ;
 	(*fence)->ring = ring;
 	(*fence)->semaphore = NULL;
-	INIT_LIST_HEAD(&(*fence)->list);
 	return 0;
 }
 
-bool radeon_fence_signaled(struct radeon_fence *fence)
+static bool radeon_fence_seq_signaled(struct radeon_device *rdev,
+				      u64 seq, unsigned ring)
 {
-	unsigned long irq_flags;
-	bool signaled = false;
-
-	if (!fence)
+	if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
 		return true;
+	}
+	/* poll new last sequence at least once */
+	radeon_fence_poll(rdev, ring);
+	if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
+		return true;
+	}
+	return false;
+}
 
-	write_lock_irqsave(&fence->rdev->fence_lock, irq_flags);
-	signaled = (fence->seq == RADEON_FENCE_SIGNALED_SEQ);
-	/* if we are shuting down report all fence as signaled */
-	if (fence->rdev->shutdown) {
-		signaled = true;
+bool radeon_fence_signaled(struct radeon_fence *fence)
+{
+	if (!fence) {
+		return true;
 	}
 	if (fence->seq == RADEON_FENCE_NOTEMITED_SEQ) {
 		WARN(1, "Querying an unemitted fence : %p !\n", fence);
-		signaled = true;
+		return true;
+	}
+	if (fence->seq == RADEON_FENCE_SIGNALED_SEQ) {
+		return true;
 	}
-	if (!signaled) {
-		radeon_fence_poll_locked(fence->rdev, fence->ring);
-		signaled = (fence->seq == RADEON_FENCE_SIGNALED_SEQ);
+	if (radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring)) {
+		fence->seq = RADEON_FENCE_SIGNALED_SEQ;
+		return true;
 	}
-	write_unlock_irqrestore(&fence->rdev->fence_lock, irq_flags);
-	return signaled;
+	return false;
 }
 
-int radeon_fence_wait(struct radeon_fence *fence, bool intr)
+static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq,
+				 unsigned ring, bool intr)
 {
-	struct radeon_device *rdev;
-	unsigned long irq_flags, timeout, last_activity;
+	unsigned long timeout, last_activity;
 	uint64_t seq;
-	int i, r;
+	unsigned i;
 	bool signaled;
+	int r;
 
-	if (fence == NULL) {
-		WARN(1, "Querying an invalid fence : %p !\n", fence);
-		return -EINVAL;
-	}
+	while (target_seq > atomic64_read(&rdev->fence_drv[ring].last_seq)) {
+		if (!rdev->ring[ring].ready) {
+			return -EBUSY;
+		}
 
-	rdev = fence->rdev;
-	signaled = radeon_fence_signaled(fence);
-	while (!signaled) {
-		read_lock_irqsave(&rdev->fence_lock, irq_flags);
 		timeout = jiffies - RADEON_FENCE_JIFFIES_TIMEOUT;
-		if (time_after(rdev->fence_drv[fence->ring].last_activity, timeout)) {
+		if (time_after(rdev->fence_drv[ring].last_activity, timeout)) {
 			/* the normal case, timeout is somewhere before last_activity */
-			timeout = rdev->fence_drv[fence->ring].last_activity - timeout;
+			timeout = rdev->fence_drv[ring].last_activity - timeout;
 		} else {
 			/* either jiffies wrapped around, or no fence was signaled in the last 500ms
-			 * anyway we will just wait for the minimum amount and then check for a lockup */
+			 * anyway we will just wait for the minimum amount and then check for a lockup
+			 */
 			timeout = 1;
 		}
-		/* save current sequence value used to check for GPU lockups */
-		seq = atomic64_read(&rdev->fence_drv[fence->ring].last_seq);
+		seq = atomic64_read(&rdev->fence_drv[ring].last_seq);
 		/* Save current last activity valuee, used to check for GPU lockups */
-		last_activity = rdev->fence_drv[fence->ring].last_activity;
-		read_unlock_irqrestore(&rdev->fence_lock, irq_flags);
+		last_activity = rdev->fence_drv[ring].last_activity;
 
 		trace_radeon_fence_wait_begin(rdev->ddev, seq);
-		radeon_irq_kms_sw_irq_get(rdev, fence->ring);
+		radeon_irq_kms_sw_irq_get(rdev, ring);
 		if (intr) {
-			r = wait_event_interruptible_timeout(
-				rdev->fence_drv[fence->ring].queue,
-				(signaled = radeon_fence_signaled(fence)), timeout);
-		} else {
-			r = wait_event_timeout(
-				rdev->fence_drv[fence->ring].queue,
-				(signaled = radeon_fence_signaled(fence)), timeout);
+			r = wait_event_interruptible_timeout(rdev->fence_drv[ring].queue,
+				(signaled = radeon_fence_seq_signaled(rdev, target_seq, ring)),
+				timeout);
+                } else {
+			r = wait_event_timeout(rdev->fence_drv[ring].queue,
+				(signaled = radeon_fence_seq_signaled(rdev, target_seq, ring)),
+				timeout);
 		}
-		radeon_irq_kms_sw_irq_put(rdev, fence->ring);
+		radeon_irq_kms_sw_irq_put(rdev, ring);
 		if (unlikely(r < 0)) {
 			return r;
 		}
@@ -280,19 +243,24 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 				continue;
 			}
 
-			write_lock_irqsave(&rdev->fence_lock, irq_flags);
+			/* check if sequence value has changed since last_activity */
+			if (seq != atomic64_read(&rdev->fence_drv[ring].last_seq)) {
+				continue;
+			}
 			/* test if somebody else has already decided that this is a lockup */
-			if (last_activity != rdev->fence_drv[fence->ring].last_activity) {
-				write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
+			if (last_activity != rdev->fence_drv[ring].last_activity) {
 				continue;
 			}
 
-			write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
-
-			if (radeon_ring_is_lockup(rdev, fence->ring, &rdev->ring[fence->ring])) {
+			if (radeon_ring_is_lockup(rdev, ring, &rdev->ring[ring])) {
 				/* good news we believe it's a lockup */
 				dev_warn(rdev->dev, "GPU lockup (waiting for 0x%016llx last fence id 0x%016llx)\n",
-					 fence->seq, seq);
+					 target_seq, seq);
+
+				/* change last activity so nobody else think there is a lockup */
+				for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+					rdev->fence_drv[i].last_activity = jiffies;
+				}
 
 				/* change last activity so nobody else think there is a lockup */
 				for (i = 0; i < RADEON_NUM_RINGS; ++i) {
@@ -300,7 +268,7 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 				}
 
 				/* mark the ring as not ready any more */
-				rdev->ring[fence->ring].ready = false;
+				rdev->ring[ring].ready = false;
 				return -EDEADLK;
 			}
 		}
@@ -308,52 +276,47 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 	return 0;
 }
 
-int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
+int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 {
-	unsigned long irq_flags;
-	struct radeon_fence *fence;
 	int r;
 
-	write_lock_irqsave(&rdev->fence_lock, irq_flags);
-	if (!rdev->ring[ring].ready) {
-		write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
-		return -EBUSY;
+	if (fence == NULL) {
+		WARN(1, "Querying an invalid fence : %p !\n", fence);
+		return -EINVAL;
 	}
-	if (list_empty(&rdev->fence_drv[ring].emitted)) {
-		write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
-		return -ENOENT;
+
+	r = radeon_fence_wait_seq(fence->rdev, fence->seq, fence->ring, intr);
+	if (r) {
+		return r;
 	}
-	fence = list_entry(rdev->fence_drv[ring].emitted.next,
-			   struct radeon_fence, list);
-	radeon_fence_ref(fence);
-	write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
-	r = radeon_fence_wait(fence, false);
-	radeon_fence_unref(&fence);
-	return r;
+	fence->seq = RADEON_FENCE_SIGNALED_SEQ;
+	return 0;
 }
 
-int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
+int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
 {
-	unsigned long irq_flags;
-	struct radeon_fence *fence;
-	int r;
+	uint64_t seq;
 
-	write_lock_irqsave(&rdev->fence_lock, irq_flags);
-	if (!rdev->ring[ring].ready) {
-		write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
-		return -EBUSY;
-	}
-	if (list_empty(&rdev->fence_drv[ring].emitted)) {
-		write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
+	/* We are not protected by ring lock when reading current seq but
+	 * it's ok as worst case is we return to early while we could have
+	 * wait.
+	 */
+	seq = atomic64_read(&rdev->fence_drv[ring].last_seq) + 1ULL;
+	if (seq >= rdev->fence_drv[ring].seq) {
+		/* nothing to wait for, last_seq is already the last emited fence */
 		return 0;
 	}
-	fence = list_entry(rdev->fence_drv[ring].emitted.prev,
-			   struct radeon_fence, list);
-	radeon_fence_ref(fence);
-	write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
-	r = radeon_fence_wait(fence, false);
-	radeon_fence_unref(&fence);
-	return r;
+	return radeon_fence_wait_seq(rdev, seq, ring, false);
+}
+
+int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
+{
+	/* We are not protected by ring lock when reading current seq
+	 * but it's ok as wait empty is call from place where no more
+	 * activity can be scheduled so there won't be concurrent access
+	 * to seq value.
+	 */
+	return radeon_fence_wait_seq(rdev, rdev->fence_drv[ring].seq, ring, false);
 }
 
 struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
@@ -374,47 +337,35 @@ void radeon_fence_unref(struct radeon_fence **fence)
 
 void radeon_fence_process(struct radeon_device *rdev, int ring)
 {
-	unsigned long irq_flags;
 	bool wake;
 
-	write_lock_irqsave(&rdev->fence_lock, irq_flags);
-	wake = radeon_fence_poll_locked(rdev, ring);
-	write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
+	wake = radeon_fence_poll(rdev, ring);
 	if (wake) {
 		wake_up_all(&rdev->fence_drv[ring].queue);
 	}
 }
 
-int radeon_fence_count_emitted(struct radeon_device *rdev, int ring)
+unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring)
 {
-	unsigned long irq_flags;
-	int not_processed = 0;
+	uint64_t emitted;
 
-	read_lock_irqsave(&rdev->fence_lock, irq_flags);
-	if (!rdev->fence_drv[ring].initialized) {
-		read_unlock_irqrestore(&rdev->fence_lock, irq_flags);
-		return 0;
-	}
-
-	if (!list_empty(&rdev->fence_drv[ring].emitted)) {
-		struct list_head *ptr;
-		list_for_each(ptr, &rdev->fence_drv[ring].emitted) {
-			/* count up to 3, that's enought info */
-			if (++not_processed >= 3)
-				break;
-		}
+	radeon_fence_poll(rdev, ring);
+	/* We are not protected by ring lock when reading the last sequence
+	 * but it's ok to report slightly wrong fence count here.
+	 */
+	emitted = rdev->fence_drv[ring].seq - atomic64_read(&rdev->fence_drv[ring].last_seq);
+	/* to avoid 32bits warp around */
+	if (emitted > 0x10000000) {
+		emitted = 0x10000000;
 	}
-	read_unlock_irqrestore(&rdev->fence_lock, irq_flags);
-	return not_processed;
+	return (unsigned)emitted;
 }
 
 int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring)
 {
-	unsigned long irq_flags;
 	uint64_t index;
 	int r;
 
-	write_lock_irqsave(&rdev->fence_lock, irq_flags);
 	radeon_scratch_free(rdev, rdev->fence_drv[ring].scratch_reg);
 	if (rdev->wb.use_event) {
 		rdev->fence_drv[ring].scratch_reg = 0;
@@ -423,7 +374,6 @@ int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring)
 		r = radeon_scratch_get(rdev, &rdev->fence_drv[ring].scratch_reg);
 		if (r) {
 			dev_err(rdev->dev, "fence failed to get scratch register\n");
-			write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
 			return r;
 		}
 		index = RADEON_WB_SCRATCH_OFFSET +
@@ -434,9 +384,8 @@ int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring)
 	rdev->fence_drv[ring].gpu_addr = rdev->wb.gpu_addr + index;
 	radeon_fence_write(rdev, rdev->fence_drv[ring].seq, ring);
 	rdev->fence_drv[ring].initialized = true;
-	DRM_INFO("fence driver on ring %d use gpu addr 0x%016llx and cpu addr 0x%p\n",
+	dev_info(rdev->dev, "fence driver on ring %d use gpu addr 0x%016llx and cpu addr 0x%p\n",
 		 ring, rdev->fence_drv[ring].gpu_addr, rdev->fence_drv[ring].cpu_addr);
-	write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
 	return 0;
 }
 
@@ -447,22 +396,18 @@ static void radeon_fence_driver_init_ring(struct radeon_device *rdev, int ring)
 	rdev->fence_drv[ring].gpu_addr = 0;
 	rdev->fence_drv[ring].seq = 0;
 	atomic64_set(&rdev->fence_drv[ring].last_seq, 0);
-	INIT_LIST_HEAD(&rdev->fence_drv[ring].emitted);
-	INIT_LIST_HEAD(&rdev->fence_drv[ring].signaled);
+	rdev->fence_drv[ring].last_activity = jiffies;
 	init_waitqueue_head(&rdev->fence_drv[ring].queue);
 	rdev->fence_drv[ring].initialized = false;
 }
 
 int radeon_fence_driver_init(struct radeon_device *rdev)
 {
-	unsigned long irq_flags;
 	int ring;
 
-	write_lock_irqsave(&rdev->fence_lock, irq_flags);
 	for (ring = 0; ring < RADEON_NUM_RINGS; ring++) {
 		radeon_fence_driver_init_ring(rdev, ring);
 	}
-	write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
 	if (radeon_debugfs_fence_init(rdev)) {
 		dev_err(rdev->dev, "fence debugfs file creation failed\n");
 	}
@@ -471,7 +416,6 @@ int radeon_fence_driver_init(struct radeon_device *rdev)
 
 void radeon_fence_driver_fini(struct radeon_device *rdev)
 {
-	unsigned long irq_flags;
 	int ring;
 
 	for (ring = 0; ring < RADEON_NUM_RINGS; ring++) {
@@ -479,9 +423,7 @@ void radeon_fence_driver_fini(struct radeon_device *rdev)
 			continue;
 		radeon_fence_wait_empty(rdev, ring);
 		wake_up_all(&rdev->fence_drv[ring].queue);
-		write_lock_irqsave(&rdev->fence_lock, irq_flags);
 		radeon_scratch_free(rdev, rdev->fence_drv[ring].scratch_reg);
-		write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
 		rdev->fence_drv[ring].initialized = false;
 	}
 }
@@ -496,7 +438,6 @@ static int radeon_debugfs_fence_info(struct seq_file *m, void *data)
 	struct drm_info_node *node = (struct drm_info_node *)m->private;
 	struct drm_device *dev = node->minor->dev;
 	struct radeon_device *rdev = dev->dev_private;
-	struct radeon_fence *fence;
 	int i;
 
 	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
@@ -506,12 +447,8 @@ static int radeon_debugfs_fence_info(struct seq_file *m, void *data)
 		seq_printf(m, "--- ring %d ---\n", i);
 		seq_printf(m, "Last signaled fence 0x%016lx\n",
 			   atomic64_read(&rdev->fence_drv[i].last_seq));
-		if (!list_empty(&rdev->fence_drv[i].emitted)) {
-			fence = list_entry(rdev->fence_drv[i].emitted.prev,
-					   struct radeon_fence, list);
-			seq_printf(m, "Last emitted fence %p with 0x%016llx\n",
-				   fence,  fence->seq);
-		}
+		seq_printf(m, "Last emitted  0x%016llx\n",
+			   rdev->fence_drv[i].seq);
 	}
 	return 0;
 }
-- 
1.7.5.4

* [PATCH 06/20] drm/radeon: rework locking ring emission mutex in fence deadlock detection
From: Christian König @ 2012-05-07 11:42 UTC (permalink / raw)
  To: j.glisse, dri-devel; +Cc: Christian König

Some callers illegally called fence_wait_next/empty
while holding the ring emission mutex. So don't
relock the mutex in those cases, and move the actual
locking into the fence code.
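
A small userspace sketch of the resulting calling convention (illustrative
names, not the driver code): the core wait takes the ring mutex only around
the lockup handling, and only when asked to, so the *_locked entry points can
be used by callers that already hold it.

#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

static pthread_mutex_t ring_lock = PTHREAD_MUTEX_INITIALIZER;

/* Core wait: lock_ring says whether we may take ring_lock ourselves for
 * the lockup handling, or whether the caller already holds it. */
static int wait_seq(uint64_t target_seq, bool intr, bool lock_ring)
{
    (void)target_seq;
    (void)intr;     /* intr selects interruptible sleep in the real code */

    /* ... wait for target_seq with a timeout; on timeout: */
    if (lock_ring)
        pthread_mutex_lock(&ring_lock);
    /* lockup detection and recovery would run here */
    if (lock_ring)
        pthread_mutex_unlock(&ring_lock);
    return 0;
}

/* normal fence wait: the caller holds no lock */
int fence_wait(uint64_t seq, bool intr)
{
    return wait_seq(seq, intr, true);
}

/* suspend/teardown path: the caller already holds ring_lock */
int fence_wait_empty_locked(uint64_t last_emitted_seq)
{
    return wait_seq(last_emitted_seq, false, false);
}

int main(void)
{
    pthread_mutex_lock(&ring_lock);
    fence_wait_empty_locked(42);
    pthread_mutex_unlock(&ring_lock);

    return fence_wait(43, true);
}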

Signed-off-by: Christian König <deathsimple@vodafone.de>
---
 drivers/gpu/drm/radeon/radeon.h        |    4 +-
 drivers/gpu/drm/radeon/radeon_device.c |    5 +++-
 drivers/gpu/drm/radeon/radeon_fence.c  |   39 ++++++++++++++++++++-----------
 drivers/gpu/drm/radeon/radeon_pm.c     |    8 +-----
 drivers/gpu/drm/radeon/radeon_ring.c   |    6 +----
 5 files changed, 33 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 7c87117..701094b 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -284,8 +284,8 @@ int radeon_fence_emit(struct radeon_device *rdev, struct radeon_fence *fence);
 void radeon_fence_process(struct radeon_device *rdev, int ring);
 bool radeon_fence_signaled(struct radeon_fence *fence);
 int radeon_fence_wait(struct radeon_fence *fence, bool interruptible);
-int radeon_fence_wait_next(struct radeon_device *rdev, int ring);
-int radeon_fence_wait_empty(struct radeon_device *rdev, int ring);
+int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring);
+int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring);
 struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence);
 void radeon_fence_unref(struct radeon_fence **fence);
 unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring);
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 0e7b72a..b827b2e 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -912,9 +912,12 @@ int radeon_suspend_kms(struct drm_device *dev, pm_message_t state)
 	}
 	/* evict vram memory */
 	radeon_bo_evict_vram(rdev);
+
+	mutex_lock(&rdev->ring_lock);
 	/* wait for gpu to finish processing current batch */
 	for (i = 0; i < RADEON_NUM_RINGS; i++)
-		radeon_fence_wait_empty(rdev, i);
+		radeon_fence_wait_empty_locked(rdev, i);
+	mutex_unlock(&rdev->ring_lock);
 
 	radeon_save_bios_scratch_regs(rdev);
 
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index f386807..8034b42 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -192,7 +192,7 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
 }
 
 static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq,
-				 unsigned ring, bool intr)
+				 unsigned ring, bool intr, bool lock_ring)
 {
 	unsigned long timeout, last_activity;
 	uint64_t seq;
@@ -247,8 +247,14 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq,
 			if (seq != atomic64_read(&rdev->fence_drv[ring].last_seq)) {
 				continue;
 			}
+
+			if (lock_ring) {
+				mutex_lock(&rdev->ring_lock);
+			}
+
 			/* test if somebody else has already decided that this is a lockup */
 			if (last_activity != rdev->fence_drv[ring].last_activity) {
+				mutex_unlock(&rdev->ring_lock);
 				continue;
 			}
 
@@ -262,15 +268,15 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq,
 					rdev->fence_drv[i].last_activity = jiffies;
 				}
 
-				/* change last activity so nobody else think there is a lockup */
-				for (i = 0; i < RADEON_NUM_RINGS; ++i) {
-					rdev->fence_drv[i].last_activity = jiffies;
-				}
-
 				/* mark the ring as not ready any more */
 				rdev->ring[ring].ready = false;
+				mutex_unlock(&rdev->ring_lock);
 				return -EDEADLK;
 			}
+
+			if (lock_ring) {
+				mutex_unlock(&rdev->ring_lock);
+			}
 		}
 	}
 	return 0;
@@ -285,7 +291,8 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 		return -EINVAL;
 	}
 
-	r = radeon_fence_wait_seq(fence->rdev, fence->seq, fence->ring, intr);
+	r = radeon_fence_wait_seq(fence->rdev, fence->seq,
+				  fence->ring, intr, true);
 	if (r) {
 		return r;
 	}
@@ -293,7 +300,7 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 	return 0;
 }
 
-int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
+int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring)
 {
 	uint64_t seq;
 
@@ -303,20 +310,22 @@ int radeon_fence_wait_next(struct radeon_device *rdev, int ring)
 	 */
 	seq = atomic64_read(&rdev->fence_drv[ring].last_seq) + 1ULL;
 	if (seq >= rdev->fence_drv[ring].seq) {
-		/* nothing to wait for, last_seq is already the last emited fence */
-		return 0;
+		/* nothing to wait for, last_seq is
+		   already the last emited fence */
+		return -ENOENT;
 	}
-	return radeon_fence_wait_seq(rdev, seq, ring, false);
+	return radeon_fence_wait_seq(rdev, seq, ring, false, false);
 }
 
-int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
+int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring)
 {
 	/* We are not protected by ring lock when reading current seq
 	 * but it's ok as wait empty is call from place where no more
 	 * activity can be scheduled so there won't be concurrent access
 	 * to seq value.
 	 */
-	return radeon_fence_wait_seq(rdev, rdev->fence_drv[ring].seq, ring, false);
+	return radeon_fence_wait_seq(rdev, rdev->fence_drv[ring].seq,
+				     ring, false, false);
 }
 
 struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
@@ -418,14 +427,16 @@ void radeon_fence_driver_fini(struct radeon_device *rdev)
 {
 	int ring;
 
+	mutex_lock(&rdev->ring_lock);
 	for (ring = 0; ring < RADEON_NUM_RINGS; ring++) {
 		if (!rdev->fence_drv[ring].initialized)
 			continue;
-		radeon_fence_wait_empty(rdev, ring);
+		radeon_fence_wait_empty_locked(rdev, ring);
 		wake_up_all(&rdev->fence_drv[ring].queue);
 		radeon_scratch_free(rdev, rdev->fence_drv[ring].scratch_reg);
 		rdev->fence_drv[ring].initialized = false;
 	}
+	mutex_unlock(&rdev->ring_lock);
 }
 
 
diff --git a/drivers/gpu/drm/radeon/radeon_pm.c b/drivers/gpu/drm/radeon/radeon_pm.c
index 7c38745..0882554 100644
--- a/drivers/gpu/drm/radeon/radeon_pm.c
+++ b/drivers/gpu/drm/radeon/radeon_pm.c
@@ -270,13 +270,7 @@ static void radeon_pm_set_clocks(struct radeon_device *rdev)
 	} else {
 		struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX];
 		if (ring->ready) {
-			struct radeon_fence *fence;
-			radeon_ring_alloc(rdev, ring, 64);
-			radeon_fence_create(rdev, &fence, radeon_ring_index(rdev, ring));
-			radeon_fence_emit(rdev, fence);
-			radeon_ring_commit(rdev, ring);
-			radeon_fence_wait(fence, false);
-			radeon_fence_unref(&fence);
+			radeon_fence_wait_empty_locked(rdev, RADEON_RING_TYPE_GFX_INDEX);
 		}
 	}
 	radeon_unmap_vram_bos(rdev);
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index 4ae222b..2fdc8c3 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -347,9 +347,7 @@ int radeon_ring_alloc(struct radeon_device *rdev, struct radeon_ring *ring, unsi
 		if (ndw < ring->ring_free_dw) {
 			break;
 		}
-		mutex_unlock(&rdev->ring_lock);
-		r = radeon_fence_wait_next(rdev, radeon_ring_index(rdev, ring));
-		mutex_lock(&rdev->ring_lock);
+		r = radeon_fence_wait_next_locked(rdev, radeon_ring_index(rdev, ring));
 		if (r)
 			return r;
 	}
@@ -408,7 +406,6 @@ void radeon_ring_force_activity(struct radeon_device *rdev, struct radeon_ring *
 {
 	int r;
 
-	mutex_lock(&rdev->ring_lock);
 	radeon_ring_free_size(rdev, ring);
 	if (ring->rptr == ring->wptr) {
 		r = radeon_ring_alloc(rdev, ring, 1);
@@ -417,7 +414,6 @@ void radeon_ring_force_activity(struct radeon_device *rdev, struct radeon_ring *
 			radeon_ring_commit(rdev, ring);
 		}
 	}
-	mutex_unlock(&rdev->ring_lock);
 }
 
 void radeon_ring_lockup_update(struct radeon_ring *ring)
-- 
1.7.5.4

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 07/20] drm/radeon: use inline functions to calc sa_bo addr
  2012-05-07 11:42 SA and other Patches Christian König
                   ` (5 preceding siblings ...)
  2012-05-07 11:42 ` [PATCH 06/20] drm/radeon: rework locking ring emission mutex in fence deadlock detection Christian König
@ 2012-05-07 11:42 ` Christian König
  2012-05-07 11:42 ` [PATCH 08/20] drm/radeon: add proper locking to the SA v3 Christian König
                   ` (13 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Christian König @ 2012-05-07 11:42 UTC (permalink / raw)
  To: j.glisse, dri-devel; +Cc: Christian König

Instead of open coding the calculation in multiple places.

Signed-off-by: Christian König <deathsimple@vodafone.de>
---
 drivers/gpu/drm/radeon/radeon_gart.c      |    6 ++----
 drivers/gpu/drm/radeon/radeon_object.h    |   11 +++++++++++
 drivers/gpu/drm/radeon/radeon_ring.c      |    6 ++----
 drivers/gpu/drm/radeon/radeon_semaphore.c |    6 ++----
 4 files changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c
index c58a036..4a5d9d4 100644
--- a/drivers/gpu/drm/radeon/radeon_gart.c
+++ b/drivers/gpu/drm/radeon/radeon_gart.c
@@ -404,10 +404,8 @@ retry:
 		radeon_vm_unbind(rdev, vm_evict);
 		goto retry;
 	}
-	vm->pt = rdev->vm_manager.sa_manager.cpu_ptr;
-	vm->pt += (vm->sa_bo.offset >> 3);
-	vm->pt_gpu_addr = rdev->vm_manager.sa_manager.gpu_addr;
-	vm->pt_gpu_addr += vm->sa_bo.offset;
+	vm->pt = radeon_sa_bo_cpu_addr(&vm->sa_bo);
+	vm->pt_gpu_addr = radeon_sa_bo_gpu_addr(&vm->sa_bo);
 	memset(vm->pt, 0, RADEON_GPU_PAGE_ALIGN(vm->last_pfn * 8));
 
 retry_id:
diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h
index f9104be..c120ab9 100644
--- a/drivers/gpu/drm/radeon/radeon_object.h
+++ b/drivers/gpu/drm/radeon/radeon_object.h
@@ -146,6 +146,17 @@ extern struct radeon_bo_va *radeon_bo_va(struct radeon_bo *rbo,
 /*
  * sub allocation
  */
+
+static inline uint64_t radeon_sa_bo_gpu_addr(struct radeon_sa_bo *sa_bo)
+{
+	return sa_bo->manager->gpu_addr + sa_bo->offset;
+}
+
+static inline void * radeon_sa_bo_cpu_addr(struct radeon_sa_bo *sa_bo)
+{
+	return sa_bo->manager->cpu_ptr + sa_bo->offset;
+}
+
 extern int radeon_sa_bo_manager_init(struct radeon_device *rdev,
 				     struct radeon_sa_manager *sa_manager,
 				     unsigned size, u32 domain);
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index 2fdc8c3..116be5e 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -127,10 +127,8 @@ retry:
 					     size, 256);
 			if (!r) {
 				*ib = &rdev->ib_pool.ibs[idx];
-				(*ib)->ptr = rdev->ib_pool.sa_manager.cpu_ptr;
-				(*ib)->ptr += ((*ib)->sa_bo.offset >> 2);
-				(*ib)->gpu_addr = rdev->ib_pool.sa_manager.gpu_addr;
-				(*ib)->gpu_addr += (*ib)->sa_bo.offset;
+				(*ib)->ptr = radeon_sa_bo_cpu_addr(&(*ib)->sa_bo);
+				(*ib)->gpu_addr = radeon_sa_bo_gpu_addr(&(*ib)->sa_bo);
 				(*ib)->fence = fence;
 				(*ib)->vm_id = 0;
 				(*ib)->is_const_ib = false;
diff --git a/drivers/gpu/drm/radeon/radeon_semaphore.c b/drivers/gpu/drm/radeon/radeon_semaphore.c
index c5b3d8e..f312ba5 100644
--- a/drivers/gpu/drm/radeon/radeon_semaphore.c
+++ b/drivers/gpu/drm/radeon/radeon_semaphore.c
@@ -53,10 +53,8 @@ static int radeon_semaphore_add_bo(struct radeon_device *rdev)
 		kfree(bo);
 		return r;
 	}
-	gpu_addr = rdev->ib_pool.sa_manager.gpu_addr;
-	gpu_addr += bo->ib->sa_bo.offset;
-	cpu_ptr = rdev->ib_pool.sa_manager.cpu_ptr;
-	cpu_ptr += (bo->ib->sa_bo.offset >> 2);
+	gpu_addr = radeon_sa_bo_gpu_addr(&bo->ib->sa_bo);
+	cpu_ptr = radeon_sa_bo_cpu_addr(&bo->ib->sa_bo);
 	for (i = 0; i < (RADEON_SEMAPHORE_BO_SIZE/8); i++) {
 		bo->semaphores[i].gpu_addr = gpu_addr;
 		bo->semaphores[i].cpu_ptr = cpu_ptr;
-- 
1.7.5.4

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 08/20] drm/radeon: add proper locking to the SA v3
  2012-05-07 11:42 SA and other Patches Christian König
                   ` (6 preceding siblings ...)
  2012-05-07 11:42 ` [PATCH 07/20] drm/radeon: use inline functions to calc sa_bo addr Christian König
@ 2012-05-07 11:42 ` Christian König
  2012-05-07 11:42 ` [PATCH 09/20] drm/radeon: add sub allocator debugfs file Christian König
                   ` (12 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Christian König @ 2012-05-07 11:42 UTC (permalink / raw)
  To: j.glisse, dri-devel; +Cc: Christian König

Make the suballocator self-contained with respect to locking.

v2: split the bugfix into a separate patch.
v3: remove some unrelated changes.

Signed-off-by: Christian König <deathsimple@vodafone.de>
---
 drivers/gpu/drm/radeon/radeon.h    |    1 +
 drivers/gpu/drm/radeon/radeon_sa.c |    6 ++++++
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 701094b..8a6b1b3 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -381,6 +381,7 @@ struct radeon_bo_list {
  * alignment).
  */
 struct radeon_sa_manager {
+	spinlock_t		lock;
 	struct radeon_bo	*bo;
 	struct list_head	sa_bo;
 	unsigned		size;
diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c
index 8fbfe69..aed0a8c 100644
--- a/drivers/gpu/drm/radeon/radeon_sa.c
+++ b/drivers/gpu/drm/radeon/radeon_sa.c
@@ -37,6 +37,7 @@ int radeon_sa_bo_manager_init(struct radeon_device *rdev,
 {
 	int r;
 
+	spin_lock_init(&sa_manager->lock);
 	sa_manager->bo = NULL;
 	sa_manager->size = size;
 	sa_manager->domain = domain;
@@ -139,6 +140,7 @@ int radeon_sa_bo_new(struct radeon_device *rdev,
 
 	BUG_ON(align > RADEON_GPU_PAGE_SIZE);
 	BUG_ON(size > sa_manager->size);
+	spin_lock(&sa_manager->lock);
 
 	/* no one ? */
 	head = sa_manager->sa_bo.prev;
@@ -172,6 +174,7 @@ int radeon_sa_bo_new(struct radeon_device *rdev,
 	offset += wasted;
 	if ((sa_manager->size - offset) < size) {
 		/* failed to find somethings big enough */
+		spin_unlock(&sa_manager->lock);
 		return -ENOMEM;
 	}
 
@@ -180,10 +183,13 @@ out:
 	sa_bo->offset = offset;
 	sa_bo->size = size;
 	list_add(&sa_bo->list, head);
+	spin_unlock(&sa_manager->lock);
 	return 0;
 }
 
 void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo *sa_bo)
 {
+	spin_lock(&sa_bo->manager->lock);
 	list_del_init(&sa_bo->list);
+	spin_unlock(&sa_bo->manager->lock);
 }
-- 
1.7.5.4

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 09/20] drm/radeon: add sub allocator debugfs file
  2012-05-07 11:42 SA and other Patches Christian König
                   ` (7 preceding siblings ...)
  2012-05-07 11:42 ` [PATCH 08/20] drm/radeon: add proper locking to the SA v3 Christian König
@ 2012-05-07 11:42 ` Christian König
  2012-05-07 11:42 ` [PATCH 10/20] drm/radeon: keep start and end offset in the SA Christian König
                   ` (11 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Christian König @ 2012-05-07 11:42 UTC (permalink / raw)
  To: j.glisse, dri-devel; +Cc: Christian König

Dump the current allocations.
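
As a purely illustrative example, a populated radeon_sa_info file,
using the seq_printf format the patch adds below, would read roughly
like this (the offsets and sizes are made up):

    offset 00000000: size 4096
    offset 00004096: size  256
    offset 00004352: size 2048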

Signed-off-by: Christian König <deathsimple@vodafone.de>
---
 drivers/gpu/drm/radeon/radeon_object.h |    5 +++++
 drivers/gpu/drm/radeon/radeon_ring.c   |   22 ++++++++++++++++++++++
 drivers/gpu/drm/radeon/radeon_sa.c     |   14 ++++++++++++++
 3 files changed, 41 insertions(+), 0 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h
index c120ab9..d9fca1e 100644
--- a/drivers/gpu/drm/radeon/radeon_object.h
+++ b/drivers/gpu/drm/radeon/radeon_object.h
@@ -172,5 +172,10 @@ extern int radeon_sa_bo_new(struct radeon_device *rdev,
 			    unsigned size, unsigned align);
 extern void radeon_sa_bo_free(struct radeon_device *rdev,
 			      struct radeon_sa_bo *sa_bo);
+#if defined(CONFIG_DEBUG_FS)
+extern void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager,
+					 struct seq_file *m);
+#endif
+
 
 #endif
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index 116be5e..f49c9c0 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -601,6 +601,23 @@ static int radeon_debugfs_ib_info(struct seq_file *m, void *data)
 static struct drm_info_list radeon_debugfs_ib_list[RADEON_IB_POOL_SIZE];
 static char radeon_debugfs_ib_names[RADEON_IB_POOL_SIZE][32];
 static unsigned radeon_debugfs_ib_idx[RADEON_IB_POOL_SIZE];
+
+static int radeon_debugfs_sa_info(struct seq_file *m, void *data)
+{
+	struct drm_info_node *node = (struct drm_info_node *) m->private;
+	struct drm_device *dev = node->minor->dev;
+	struct radeon_device *rdev = dev->dev_private;
+
+	radeon_sa_bo_dump_debug_info(&rdev->ib_pool.sa_manager, m);
+
+	return 0;
+
+}
+
+static struct drm_info_list radeon_debugfs_sa_list[] = {
+        {"radeon_sa_info", &radeon_debugfs_sa_info, 0, NULL},
+};
+
 #endif
 
 int radeon_debugfs_ring_init(struct radeon_device *rdev, struct radeon_ring *ring)
@@ -627,6 +644,11 @@ int radeon_debugfs_ib_init(struct radeon_device *rdev)
 {
 #if defined(CONFIG_DEBUG_FS)
 	unsigned i;
+	int r;
+
+	r = radeon_debugfs_add_files(rdev, radeon_debugfs_sa_list, 1);
+	if (r)
+		return r;
 
 	for (i = 0; i < RADEON_IB_POOL_SIZE; i++) {
 		sprintf(radeon_debugfs_ib_names[i], "radeon_ib_%04u", i);
diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c
index aed0a8c..1db0568 100644
--- a/drivers/gpu/drm/radeon/radeon_sa.c
+++ b/drivers/gpu/drm/radeon/radeon_sa.c
@@ -193,3 +193,17 @@ void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo *sa_bo)
 	list_del_init(&sa_bo->list);
 	spin_unlock(&sa_bo->manager->lock);
 }
+
+#if defined(CONFIG_DEBUG_FS)
+void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager,
+				  struct seq_file *m)
+{
+	struct radeon_sa_bo *i;
+
+	spin_lock(&sa_manager->lock);
+	list_for_each_entry(i, &sa_manager->sa_bo, list) {
+		seq_printf(m, "offset %08d: size %4d\n", i->offset, i->size);
+	}
+	spin_unlock(&sa_manager->lock);
+}
+#endif
-- 
1.7.5.4

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 10/20] drm/radeon: keep start and end offset in the SA
  2012-05-07 11:42 SA and other Patches Christian König
                   ` (8 preceding siblings ...)
  2012-05-07 11:42 ` [PATCH 09/20] drm/radeon: add sub allocator debugfs file Christian König
@ 2012-05-07 11:42 ` Christian König
  2012-05-07 11:42 ` [PATCH 11/20] drm/radeon: make sa bo a stand alone object Christian König
                   ` (10 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Christian König @ 2012-05-07 11:42 UTC (permalink / raw)
  To: j.glisse, dri-devel; +Cc: Christian König

Instead of offset + size, keep the start and end offsets directly.
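
As a minimal standalone sketch of why this is convenient (plain
userspace C with made-up numbers, not driver code), keeping both
offsets lets the size of an allocation and the gap between two
neighbours fall out directly:

    #include <stdio.h>

    struct sa_bo {
            unsigned soffset;       /* start offset */
            unsigned eoffset;       /* end offset */
    };

    int main(void)
    {
            struct sa_bo prev = { .soffset = 0,    .eoffset = 4096 };
            struct sa_bo next = { .soffset = 8192, .eoffset = 8448 };

            printf("size of next: %u\n", next.eoffset - next.soffset);
            printf("hole between them: %u\n", next.soffset - prev.eoffset);
            return 0;
    }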

Signed-off-by: Christian König <deathsimple@vodafone.de>
---
 drivers/gpu/drm/radeon/radeon.h        |    4 ++--
 drivers/gpu/drm/radeon/radeon_cs.c     |    4 ++--
 drivers/gpu/drm/radeon/radeon_object.h |    4 ++--
 drivers/gpu/drm/radeon/radeon_sa.c     |   13 +++++++------
 4 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 8a6b1b3..d1c2154 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -396,8 +396,8 @@ struct radeon_sa_bo;
 struct radeon_sa_bo {
 	struct list_head		list;
 	struct radeon_sa_manager	*manager;
-	unsigned			offset;
-	unsigned			size;
+	unsigned			soffset;
+	unsigned			eoffset;
 };
 
 /*
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 289b0d7..b778037 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -477,7 +477,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
 		/* ib pool is bind at 0 in virtual address space to gpu_addr is the
 		 * offset inside the pool bo
 		 */
-		parser->const_ib->gpu_addr = parser->const_ib->sa_bo.offset;
+		parser->const_ib->gpu_addr = parser->const_ib->sa_bo.soffset;
 		r = radeon_ib_schedule(rdev, parser->const_ib);
 		if (r)
 			goto out;
@@ -487,7 +487,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
 	/* ib pool is bind at 0 in virtual address space to gpu_addr is the
 	 * offset inside the pool bo
 	 */
-	parser->ib->gpu_addr = parser->ib->sa_bo.offset;
+	parser->ib->gpu_addr = parser->ib->sa_bo.soffset;
 	parser->ib->is_const_ib = false;
 	r = radeon_ib_schedule(rdev, parser->ib);
 out:
diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h
index d9fca1e..99ab46a 100644
--- a/drivers/gpu/drm/radeon/radeon_object.h
+++ b/drivers/gpu/drm/radeon/radeon_object.h
@@ -149,12 +149,12 @@ extern struct radeon_bo_va *radeon_bo_va(struct radeon_bo *rbo,
 
 static inline uint64_t radeon_sa_bo_gpu_addr(struct radeon_sa_bo *sa_bo)
 {
-	return sa_bo->manager->gpu_addr + sa_bo->offset;
+	return sa_bo->manager->gpu_addr + sa_bo->soffset;
 }
 
 static inline void * radeon_sa_bo_cpu_addr(struct radeon_sa_bo *sa_bo)
 {
-	return sa_bo->manager->cpu_ptr + sa_bo->offset;
+	return sa_bo->manager->cpu_ptr + sa_bo->soffset;
 }
 
 extern int radeon_sa_bo_manager_init(struct radeon_device *rdev,
diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c
index 1db0568..3bea7ba 100644
--- a/drivers/gpu/drm/radeon/radeon_sa.c
+++ b/drivers/gpu/drm/radeon/radeon_sa.c
@@ -152,11 +152,11 @@ int radeon_sa_bo_new(struct radeon_device *rdev,
 	offset = 0;
 	list_for_each_entry(tmp, &sa_manager->sa_bo, list) {
 		/* room before this object ? */
-		if (offset < tmp->offset && (tmp->offset - offset) >= size) {
+		if (offset < tmp->soffset && (tmp->soffset - offset) >= size) {
 			head = tmp->list.prev;
 			goto out;
 		}
-		offset = tmp->offset + tmp->size;
+		offset = tmp->eoffset;
 		wasted = offset % align;
 		if (wasted) {
 			wasted = align - wasted;
@@ -166,7 +166,7 @@ int radeon_sa_bo_new(struct radeon_device *rdev,
 	/* room at the end ? */
 	head = sa_manager->sa_bo.prev;
 	tmp = list_entry(head, struct radeon_sa_bo, list);
-	offset = tmp->offset + tmp->size;
+	offset = tmp->eoffset;
 	wasted = offset % align;
 	if (wasted) {
 		wasted = align - wasted;
@@ -180,8 +180,8 @@ int radeon_sa_bo_new(struct radeon_device *rdev,
 
 out:
 	sa_bo->manager = sa_manager;
-	sa_bo->offset = offset;
-	sa_bo->size = size;
+	sa_bo->soffset = offset;
+	sa_bo->eoffset = offset + size;
 	list_add(&sa_bo->list, head);
 	spin_unlock(&sa_manager->lock);
 	return 0;
@@ -202,7 +202,8 @@ void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager,
 
 	spin_lock(&sa_manager->lock);
 	list_for_each_entry(i, &sa_manager->sa_bo, list) {
-		seq_printf(m, "offset %08d: size %4d\n", i->offset, i->size);
+		seq_printf(m, "[%08x %08x] size %4d [%p]\n",
+			   i->soffset, i->eoffset, i->eoffset - i->soffset, i);
 	}
 	spin_unlock(&sa_manager->lock);
 }
-- 
1.7.5.4

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 11/20] drm/radeon: make sa bo a stand alone object
  2012-05-07 11:42 SA and other Patches Christian König
                   ` (9 preceding siblings ...)
  2012-05-07 11:42 ` [PATCH 10/20] drm/radeon: keep start and end offset in the SA Christian König
@ 2012-05-07 11:42 ` Christian König
  2012-05-07 11:42 ` [PATCH 12/20] drm/radeon: define new SA interface v3 Christian König
                   ` (9 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Christian König @ 2012-05-07 11:42 UTC (permalink / raw)
  To: j.glisse, dri-devel; +Cc: Christian König

Allocate and free it separately.

Signed-off-by: Christian König <deathsimple@vodafone.de>
---
 drivers/gpu/drm/radeon/radeon.h           |    4 ++--
 drivers/gpu/drm/radeon/radeon_cs.c        |    4 ++--
 drivers/gpu/drm/radeon/radeon_gart.c      |    4 ++--
 drivers/gpu/drm/radeon/radeon_object.h    |    4 ++--
 drivers/gpu/drm/radeon/radeon_ring.c      |    6 +++---
 drivers/gpu/drm/radeon/radeon_sa.c        |   28 +++++++++++++++++++---------
 drivers/gpu/drm/radeon/radeon_semaphore.c |    4 ++--
 7 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index d1c2154..9374ab1 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -638,7 +638,7 @@ void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc);
  */
 
 struct radeon_ib {
-	struct radeon_sa_bo	sa_bo;
+	struct radeon_sa_bo	*sa_bo;
 	unsigned		idx;
 	uint32_t		length_dw;
 	uint64_t		gpu_addr;
@@ -693,7 +693,7 @@ struct radeon_vm {
 	unsigned			last_pfn;
 	u64				pt_gpu_addr;
 	u64				*pt;
-	struct radeon_sa_bo		sa_bo;
+	struct radeon_sa_bo		*sa_bo;
 	struct mutex			mutex;
 	/* last fence for cs using this vm */
 	struct radeon_fence		*fence;
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index b778037..5c065bf 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -477,7 +477,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
 		/* ib pool is bind at 0 in virtual address space to gpu_addr is the
 		 * offset inside the pool bo
 		 */
-		parser->const_ib->gpu_addr = parser->const_ib->sa_bo.soffset;
+		parser->const_ib->gpu_addr = parser->const_ib->sa_bo->soffset;
 		r = radeon_ib_schedule(rdev, parser->const_ib);
 		if (r)
 			goto out;
@@ -487,7 +487,7 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
 	/* ib pool is bind at 0 in virtual address space to gpu_addr is the
 	 * offset inside the pool bo
 	 */
-	parser->ib->gpu_addr = parser->ib->sa_bo.soffset;
+	parser->ib->gpu_addr = parser->ib->sa_bo->soffset;
 	parser->ib->is_const_ib = false;
 	r = radeon_ib_schedule(rdev, parser->ib);
 out:
diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c
index 4a5d9d4..c5789ef 100644
--- a/drivers/gpu/drm/radeon/radeon_gart.c
+++ b/drivers/gpu/drm/radeon/radeon_gart.c
@@ -404,8 +404,8 @@ retry:
 		radeon_vm_unbind(rdev, vm_evict);
 		goto retry;
 	}
-	vm->pt = radeon_sa_bo_cpu_addr(&vm->sa_bo);
-	vm->pt_gpu_addr = radeon_sa_bo_gpu_addr(&vm->sa_bo);
+	vm->pt = radeon_sa_bo_cpu_addr(vm->sa_bo);
+	vm->pt_gpu_addr = radeon_sa_bo_gpu_addr(vm->sa_bo);
 	memset(vm->pt, 0, RADEON_GPU_PAGE_ALIGN(vm->last_pfn * 8));
 
 retry_id:
diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h
index 99ab46a..4fc7f07 100644
--- a/drivers/gpu/drm/radeon/radeon_object.h
+++ b/drivers/gpu/drm/radeon/radeon_object.h
@@ -168,10 +168,10 @@ extern int radeon_sa_bo_manager_suspend(struct radeon_device *rdev,
 					struct radeon_sa_manager *sa_manager);
 extern int radeon_sa_bo_new(struct radeon_device *rdev,
 			    struct radeon_sa_manager *sa_manager,
-			    struct radeon_sa_bo *sa_bo,
+			    struct radeon_sa_bo **sa_bo,
 			    unsigned size, unsigned align);
 extern void radeon_sa_bo_free(struct radeon_device *rdev,
-			      struct radeon_sa_bo *sa_bo);
+			      struct radeon_sa_bo **sa_bo);
 #if defined(CONFIG_DEBUG_FS)
 extern void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager,
 					 struct seq_file *m);
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index f49c9c0..45adb37 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -127,8 +127,8 @@ retry:
 					     size, 256);
 			if (!r) {
 				*ib = &rdev->ib_pool.ibs[idx];
-				(*ib)->ptr = radeon_sa_bo_cpu_addr(&(*ib)->sa_bo);
-				(*ib)->gpu_addr = radeon_sa_bo_gpu_addr(&(*ib)->sa_bo);
+				(*ib)->ptr = radeon_sa_bo_cpu_addr((*ib)->sa_bo);
+				(*ib)->gpu_addr = radeon_sa_bo_gpu_addr((*ib)->sa_bo);
 				(*ib)->fence = fence;
 				(*ib)->vm_id = 0;
 				(*ib)->is_const_ib = false;
@@ -227,7 +227,7 @@ int radeon_ib_pool_init(struct radeon_device *rdev)
 		rdev->ib_pool.ibs[i].fence = NULL;
 		rdev->ib_pool.ibs[i].idx = i;
 		rdev->ib_pool.ibs[i].length_dw = 0;
-		INIT_LIST_HEAD(&rdev->ib_pool.ibs[i].sa_bo.list);
+		rdev->ib_pool.ibs[i].sa_bo = NULL;
 	}
 	rdev->ib_pool.head_id = 0;
 	rdev->ib_pool.ready = true;
diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c
index 3bea7ba..625f2d4 100644
--- a/drivers/gpu/drm/radeon/radeon_sa.c
+++ b/drivers/gpu/drm/radeon/radeon_sa.c
@@ -131,7 +131,7 @@ int radeon_sa_bo_manager_suspend(struct radeon_device *rdev,
  */
 int radeon_sa_bo_new(struct radeon_device *rdev,
 		     struct radeon_sa_manager *sa_manager,
-		     struct radeon_sa_bo *sa_bo,
+		     struct radeon_sa_bo **sa_bo,
 		     unsigned size, unsigned align)
 {
 	struct radeon_sa_bo *tmp;
@@ -140,6 +140,9 @@ int radeon_sa_bo_new(struct radeon_device *rdev,
 
 	BUG_ON(align > RADEON_GPU_PAGE_SIZE);
 	BUG_ON(size > sa_manager->size);
+
+	*sa_bo = kmalloc(sizeof(struct radeon_sa_bo), GFP_KERNEL);
+
 	spin_lock(&sa_manager->lock);
 
 	/* no one ? */
@@ -175,23 +178,30 @@ int radeon_sa_bo_new(struct radeon_device *rdev,
 	if ((sa_manager->size - offset) < size) {
 		/* failed to find somethings big enough */
 		spin_unlock(&sa_manager->lock);
+		kfree(*sa_bo);
+		*sa_bo = NULL;
 		return -ENOMEM;
 	}
 
 out:
-	sa_bo->manager = sa_manager;
-	sa_bo->soffset = offset;
-	sa_bo->eoffset = offset + size;
-	list_add(&sa_bo->list, head);
+	(*sa_bo)->manager = sa_manager;
+	(*sa_bo)->soffset = offset;
+	(*sa_bo)->eoffset = offset + size;
+	list_add(&(*sa_bo)->list, head);
 	spin_unlock(&sa_manager->lock);
 	return 0;
 }
 
-void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo *sa_bo)
+void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo **sa_bo)
 {
-	spin_lock(&sa_bo->manager->lock);
-	list_del_init(&sa_bo->list);
-	spin_unlock(&sa_bo->manager->lock);
+	if (!sa_bo || !*sa_bo)
+		return;
+
+	spin_lock(&(*sa_bo)->manager->lock);
+	list_del_init(&(*sa_bo)->list);
+	spin_unlock(&(*sa_bo)->manager->lock);
+	kfree(*sa_bo);
+	*sa_bo = NULL;
 }
 
 #if defined(CONFIG_DEBUG_FS)
diff --git a/drivers/gpu/drm/radeon/radeon_semaphore.c b/drivers/gpu/drm/radeon/radeon_semaphore.c
index f312ba5..d518d32 100644
--- a/drivers/gpu/drm/radeon/radeon_semaphore.c
+++ b/drivers/gpu/drm/radeon/radeon_semaphore.c
@@ -53,8 +53,8 @@ static int radeon_semaphore_add_bo(struct radeon_device *rdev)
 		kfree(bo);
 		return r;
 	}
-	gpu_addr = radeon_sa_bo_gpu_addr(&bo->ib->sa_bo);
-	cpu_ptr = radeon_sa_bo_cpu_addr(&bo->ib->sa_bo);
+	gpu_addr = radeon_sa_bo_gpu_addr(bo->ib->sa_bo);
+	cpu_ptr = radeon_sa_bo_cpu_addr(bo->ib->sa_bo);
 	for (i = 0; i < (RADEON_SEMAPHORE_BO_SIZE/8); i++) {
 		bo->semaphores[i].gpu_addr = gpu_addr;
 		bo->semaphores[i].cpu_ptr = cpu_ptr;
-- 
1.7.5.4

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 12/20] drm/radeon: define new SA interface v3
  2012-05-07 11:42 SA and other Patches Christian König
                   ` (10 preceding siblings ...)
  2012-05-07 11:42 ` [PATCH 11/20] drm/radeon: make sa bo a stand alone object Christian König
@ 2012-05-07 11:42 ` Christian König
  2012-05-07 11:42 ` [PATCH 13/20] drm/radeon: use one wait queue for all rings add fence_wait_any v2 Christian König
                   ` (8 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Christian König @ 2012-05-07 11:42 UTC (permalink / raw)
  To: j.glisse, dri-devel; +Cc: Christian König, Jerome Glisse

Define the interface without modifying the allocation
algorithm in any way.

v2: rebase on top of the new uint64 fence patch
v3: add ring to debugfs output
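
To make the intended usage a bit more concrete, a caller of the new
interface would look roughly like the sketch below (error handling is
trimmed, and "size" and "fence" are assumed to come from the
surrounding command submission code):

    struct radeon_sa_bo *sa_bo;
    int r;

    /* with block == true the allocation may wait for fences to signal */
    r = radeon_sa_bo_new(rdev, &rdev->ib_pool.sa_manager, &sa_bo,
                         size, 256, false);
    if (r)
            return r;

    /* fill the buffer through radeon_sa_bo_cpu_addr(sa_bo) and point
     * the hardware at radeon_sa_bo_gpu_addr(sa_bo) ... */

    /* passing a fence delays the actual release until the GPU is done
     * with the buffer, passing NULL releases it immediately */
    radeon_sa_bo_free(rdev, &sa_bo, fence);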

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
---
 drivers/gpu/drm/radeon/radeon.h           |    1 +
 drivers/gpu/drm/radeon/radeon_gart.c      |    6 +-
 drivers/gpu/drm/radeon/radeon_object.h    |    5 +-
 drivers/gpu/drm/radeon/radeon_ring.c      |    8 ++--
 drivers/gpu/drm/radeon/radeon_sa.c        |   60 ++++++++++++++++++++++++----
 drivers/gpu/drm/radeon/radeon_semaphore.c |    2 +-
 6 files changed, 63 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 9374ab1..ada70d1 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -398,6 +398,7 @@ struct radeon_sa_bo {
 	struct radeon_sa_manager	*manager;
 	unsigned			soffset;
 	unsigned			eoffset;
+	struct radeon_fence		*fence;
 };
 
 /*
diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c
index c5789ef..53dba8e 100644
--- a/drivers/gpu/drm/radeon/radeon_gart.c
+++ b/drivers/gpu/drm/radeon/radeon_gart.c
@@ -326,7 +326,7 @@ static void radeon_vm_unbind_locked(struct radeon_device *rdev,
 	rdev->vm_manager.use_bitmap &= ~(1 << vm->id);
 	list_del_init(&vm->list);
 	vm->id = -1;
-	radeon_sa_bo_free(rdev, &vm->sa_bo);
+	radeon_sa_bo_free(rdev, &vm->sa_bo, NULL);
 	vm->pt = NULL;
 
 	list_for_each_entry(bo_va, &vm->va, vm_list) {
@@ -395,7 +395,7 @@ int radeon_vm_bind(struct radeon_device *rdev, struct radeon_vm *vm)
 retry:
 	r = radeon_sa_bo_new(rdev, &rdev->vm_manager.sa_manager, &vm->sa_bo,
 			     RADEON_GPU_PAGE_ALIGN(vm->last_pfn * 8),
-			     RADEON_GPU_PAGE_SIZE);
+			     RADEON_GPU_PAGE_SIZE, false);
 	if (r) {
 		if (list_empty(&rdev->vm_manager.lru_vm)) {
 			return r;
@@ -426,7 +426,7 @@ retry_id:
 	/* do hw bind */
 	r = rdev->vm_manager.funcs->bind(rdev, vm, id);
 	if (r) {
-		radeon_sa_bo_free(rdev, &vm->sa_bo);
+		radeon_sa_bo_free(rdev, &vm->sa_bo, NULL);
 		return r;
 	}
 	rdev->vm_manager.use_bitmap |= 1 << id;
diff --git a/drivers/gpu/drm/radeon/radeon_object.h b/drivers/gpu/drm/radeon/radeon_object.h
index 4fc7f07..befec7d 100644
--- a/drivers/gpu/drm/radeon/radeon_object.h
+++ b/drivers/gpu/drm/radeon/radeon_object.h
@@ -169,9 +169,10 @@ extern int radeon_sa_bo_manager_suspend(struct radeon_device *rdev,
 extern int radeon_sa_bo_new(struct radeon_device *rdev,
 			    struct radeon_sa_manager *sa_manager,
 			    struct radeon_sa_bo **sa_bo,
-			    unsigned size, unsigned align);
+			    unsigned size, unsigned align, bool block);
 extern void radeon_sa_bo_free(struct radeon_device *rdev,
-			      struct radeon_sa_bo **sa_bo);
+			      struct radeon_sa_bo **sa_bo,
+			      struct radeon_fence *fence);
 #if defined(CONFIG_DEBUG_FS)
 extern void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager,
 					 struct seq_file *m);
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index 45adb37..1748d93 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -85,7 +85,7 @@ bool radeon_ib_try_free(struct radeon_device *rdev, struct radeon_ib *ib)
 	if (ib->fence && ib->fence->seq < RADEON_FENCE_NOTEMITED_SEQ) {
 		if (radeon_fence_signaled(ib->fence)) {
 			radeon_fence_unref(&ib->fence);
-			radeon_sa_bo_free(rdev, &ib->sa_bo);
+			radeon_sa_bo_free(rdev, &ib->sa_bo, NULL);
 			done = true;
 		}
 	}
@@ -124,7 +124,7 @@ retry:
 		if (rdev->ib_pool.ibs[idx].fence == NULL) {
 			r = radeon_sa_bo_new(rdev, &rdev->ib_pool.sa_manager,
 					     &rdev->ib_pool.ibs[idx].sa_bo,
-					     size, 256);
+					     size, 256, false);
 			if (!r) {
 				*ib = &rdev->ib_pool.ibs[idx];
 				(*ib)->ptr = radeon_sa_bo_cpu_addr((*ib)->sa_bo);
@@ -173,7 +173,7 @@ void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib)
 	}
 	radeon_mutex_lock(&rdev->ib_pool.mutex);
 	if (tmp->fence && tmp->fence->seq == RADEON_FENCE_NOTEMITED_SEQ) {
-		radeon_sa_bo_free(rdev, &tmp->sa_bo);
+		radeon_sa_bo_free(rdev, &tmp->sa_bo, NULL);
 		radeon_fence_unref(&tmp->fence);
 	}
 	radeon_mutex_unlock(&rdev->ib_pool.mutex);
@@ -247,7 +247,7 @@ void radeon_ib_pool_fini(struct radeon_device *rdev)
 	radeon_mutex_lock(&rdev->ib_pool.mutex);
 	if (rdev->ib_pool.ready) {
 		for (i = 0; i < RADEON_IB_POOL_SIZE; i++) {
-			radeon_sa_bo_free(rdev, &rdev->ib_pool.ibs[i].sa_bo);
+			radeon_sa_bo_free(rdev, &rdev->ib_pool.ibs[i].sa_bo, NULL);
 			radeon_fence_unref(&rdev->ib_pool.ibs[i].fence);
 		}
 		radeon_sa_bo_manager_fini(rdev, &rdev->ib_pool.sa_manager);
diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c
index 625f2d4..90ee8ad 100644
--- a/drivers/gpu/drm/radeon/radeon_sa.c
+++ b/drivers/gpu/drm/radeon/radeon_sa.c
@@ -129,20 +129,32 @@ int radeon_sa_bo_manager_suspend(struct radeon_device *rdev,
  *
  * Alignment can't be bigger than page size
  */
+
+static void radeon_sa_bo_remove_locked(struct radeon_sa_bo *sa_bo)
+{
+	list_del(&sa_bo->list);
+	radeon_fence_unref(&sa_bo->fence);
+	kfree(sa_bo);
+}
+
 int radeon_sa_bo_new(struct radeon_device *rdev,
 		     struct radeon_sa_manager *sa_manager,
 		     struct radeon_sa_bo **sa_bo,
-		     unsigned size, unsigned align)
+		     unsigned size, unsigned align, bool block)
 {
-	struct radeon_sa_bo *tmp;
+	struct radeon_fence *fence = NULL;
+	struct radeon_sa_bo *tmp, *next;
 	struct list_head *head;
 	unsigned offset = 0, wasted = 0;
+	int r;
 
 	BUG_ON(align > RADEON_GPU_PAGE_SIZE);
 	BUG_ON(size > sa_manager->size);
 
 	*sa_bo = kmalloc(sizeof(struct radeon_sa_bo), GFP_KERNEL);
 
+retry:
+
 	spin_lock(&sa_manager->lock);
 
 	/* no one ? */
@@ -153,7 +165,17 @@ int radeon_sa_bo_new(struct radeon_device *rdev,
 
 	/* look for a hole big enough */
 	offset = 0;
-	list_for_each_entry(tmp, &sa_manager->sa_bo, list) {
+	list_for_each_entry_safe(tmp, next, &sa_manager->sa_bo, list) {
+		/* try to free this object */
+		if (tmp->fence) {
+			if (radeon_fence_signaled(tmp->fence)) {
+				radeon_sa_bo_remove_locked(tmp);
+				continue;
+			} else {
+				fence = tmp->fence;
+			}
+		}
+
 		/* room before this object ? */
 		if (offset < tmp->soffset && (tmp->soffset - offset) >= size) {
 			head = tmp->list.prev;
@@ -178,6 +200,13 @@ int radeon_sa_bo_new(struct radeon_device *rdev,
 	if ((sa_manager->size - offset) < size) {
 		/* failed to find somethings big enough */
 		spin_unlock(&sa_manager->lock);
+		if (block && fence) {
+			r = radeon_fence_wait(fence, false);
+			if (r)
+				return r;
+
+			goto retry;
+		}
 		kfree(*sa_bo);
 		*sa_bo = NULL;
 		return -ENOMEM;
@@ -192,15 +221,22 @@ out:
 	return 0;
 }
 
-void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo **sa_bo)
+void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo **sa_bo,
+		       struct radeon_fence *fence)
 {
+	struct radeon_sa_manager *sa_manager;
+
 	if (!sa_bo || !*sa_bo)
 		return;
 
-	spin_lock(&(*sa_bo)->manager->lock);
-	list_del_init(&(*sa_bo)->list);
-	spin_unlock(&(*sa_bo)->manager->lock);
-	kfree(*sa_bo);
+	sa_manager = (*sa_bo)->manager;
+	spin_lock(&sa_manager->lock);
+	if (fence && fence->seq && fence->seq < RADEON_FENCE_NOTEMITED_SEQ) {
+		(*sa_bo)->fence = radeon_fence_ref(fence);
+	} else {
+		radeon_sa_bo_remove_locked(*sa_bo);
+	}
+	spin_unlock(&sa_manager->lock);
 	*sa_bo = NULL;
 }
 
@@ -212,8 +248,14 @@ void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager,
 
 	spin_lock(&sa_manager->lock);
 	list_for_each_entry(i, &sa_manager->sa_bo, list) {
-		seq_printf(m, "[%08x %08x] size %4d [%p]\n",
+		seq_printf(m, "[%08x %08x] size %4d (%p)",
 			   i->soffset, i->eoffset, i->eoffset - i->soffset, i);
+		if (i->fence) {
+			seq_printf(m, " protected by %Ld (%p) on ring %d\n",
+				   i->fence->seq, i->fence, i->fence->ring);
+		} else {
+			seq_printf(m, "\n");
+		}
 	}
 	spin_unlock(&sa_manager->lock);
 }
diff --git a/drivers/gpu/drm/radeon/radeon_semaphore.c b/drivers/gpu/drm/radeon/radeon_semaphore.c
index d518d32..dbde874 100644
--- a/drivers/gpu/drm/radeon/radeon_semaphore.c
+++ b/drivers/gpu/drm/radeon/radeon_semaphore.c
@@ -72,7 +72,7 @@ static int radeon_semaphore_add_bo(struct radeon_device *rdev)
 static void radeon_semaphore_del_bo_locked(struct radeon_device *rdev,
 					   struct radeon_semaphore_bo *bo)
 {
-	radeon_sa_bo_free(rdev, &bo->ib->sa_bo);
+	radeon_sa_bo_free(rdev, &bo->ib->sa_bo, NULL);
 	radeon_fence_unref(&bo->ib->fence);
 	list_del(&bo->list);
 	kfree(bo);
-- 
1.7.5.4

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 13/20] drm/radeon: use one wait queue for all rings add fence_wait_any v2
  2012-05-07 11:42 SA and other Patches Christian König
                   ` (11 preceding siblings ...)
  2012-05-07 11:42 ` [PATCH 12/20] drm/radeon: define new SA interface v3 Christian König
@ 2012-05-07 11:42 ` Christian König
  2012-05-07 11:42 ` [PATCH 14/20] drm/radeon: multiple ring allocator v2 Christian König
                   ` (7 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Christian König @ 2012-05-07 11:42 UTC (permalink / raw)
  To: j.glisse, dri-devel; +Cc: Jerome Glisse, Christian König

From: Jerome Glisse <jglisse@redhat.com>

Use one wait queue for all rings. When one ring progresses, the others
likely do too, and we are not expecting to have a lot of waiters anyway.

Also add a fence_wait_any that will wait until the first fence in the
fence array (one fence per ring) is signaled. This allows waiting on
all rings.
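
A caller is expected to build an array with at most one fence per ring
(NULL for rings it does not care about) and then wait for the first of
them to signal, roughly like this sketch (oldest_fence_on_ring() is a
hypothetical helper standing in for whatever the caller tracks):

    struct radeon_fence *fences[RADEON_NUM_RINGS];
    unsigned i;
    int r;

    for (i = 0; i < RADEON_NUM_RINGS; ++i) {
            /* oldest not yet signaled fence on ring i, or NULL if idle */
            fences[i] = oldest_fence_on_ring(rdev, i);
    }

    /* returns as soon as any of the collected fences signals */
    r = radeon_fence_wait_any(rdev, fences, false);
    if (r)
            return r;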

v2: some minor cleanups and improvements.

Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
---
 drivers/gpu/drm/radeon/radeon.h       |    5 +-
 drivers/gpu/drm/radeon/radeon_fence.c |  163 ++++++++++++++++++++++++++++++++-
 2 files changed, 162 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index ada70d1..37a7459 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -262,7 +262,6 @@ struct radeon_fence_driver {
 	uint64_t			seq;
 	atomic64_t			last_seq;
 	unsigned long			last_activity;
-	wait_queue_head_t		queue;
 	bool				initialized;
 };
 
@@ -286,6 +285,9 @@ bool radeon_fence_signaled(struct radeon_fence *fence);
 int radeon_fence_wait(struct radeon_fence *fence, bool interruptible);
 int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring);
 int radeon_fence_wait_empty_locked(struct radeon_device *rdev, int ring);
+int radeon_fence_wait_any(struct radeon_device *rdev,
+			  struct radeon_fence **fences,
+			  bool intr);
 struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence);
 void radeon_fence_unref(struct radeon_fence **fence);
 unsigned radeon_fence_count_emitted(struct radeon_device *rdev, int ring);
@@ -1534,6 +1536,7 @@ struct radeon_device {
 	struct radeon_scratch		scratch;
 	struct radeon_mman		mman;
 	struct radeon_fence_driver	fence_drv[RADEON_NUM_RINGS];
+	wait_queue_head_t		fence_queue;
 	struct radeon_semaphore_driver	semaphore_drv;
 	struct mutex			ring_lock;
 	struct radeon_ring		ring[RADEON_NUM_RINGS];
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 8034b42..45d4e6e 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -222,11 +222,11 @@ static int radeon_fence_wait_seq(struct radeon_device *rdev, u64 target_seq,
 		trace_radeon_fence_wait_begin(rdev->ddev, seq);
 		radeon_irq_kms_sw_irq_get(rdev, ring);
 		if (intr) {
-			r = wait_event_interruptible_timeout(rdev->fence_drv[ring].queue,
+			r = wait_event_interruptible_timeout(rdev->fence_queue,
 				(signaled = radeon_fence_seq_signaled(rdev, target_seq, ring)),
 				timeout);
                 } else {
-			r = wait_event_timeout(rdev->fence_drv[ring].queue,
+			r = wait_event_timeout(rdev->fence_queue,
 				(signaled = radeon_fence_seq_signaled(rdev, target_seq, ring)),
 				timeout);
 		}
@@ -300,6 +300,159 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 	return 0;
 }
 
+bool radeon_fence_any_seq_signaled(struct radeon_device *rdev, u64 *seq)
+{
+	unsigned i;
+
+	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+		if (seq[i] && radeon_fence_seq_signaled(rdev, seq[i], i)) {
+			return true;
+		}
+	}
+	return false;
+}
+
+static int radeon_fence_wait_any_seq(struct radeon_device *rdev,
+				     u64 *target_seq, bool intr)
+{
+	unsigned long timeout, last_activity, tmp;
+	unsigned i, ring = RADEON_NUM_RINGS;
+	bool signaled;
+	int r;
+
+	for (i = 0, last_activity = 0; i < RADEON_NUM_RINGS; ++i) {
+		if (!target_seq[i]) {
+			continue;
+		}
+
+		/* use the most recent one as indicator */
+		if (time_after(rdev->fence_drv[i].last_activity, last_activity)) {
+			last_activity = rdev->fence_drv[i].last_activity;
+		}
+
+		/* For lockup detection just pick the lowest ring we are
+		 * actively waiting for
+		 */
+		if (i < ring) {
+			ring = i;
+		}
+	}
+
+	/* nothing to wait for ? */
+	if (ring == RADEON_NUM_RINGS) {
+		return 0;
+	}
+
+	while (!radeon_fence_any_seq_signaled(rdev, target_seq)) {
+		timeout = jiffies - RADEON_FENCE_JIFFIES_TIMEOUT;
+		if (time_after(last_activity, timeout)) {
+			/* the normal case, timeout is somewhere before last_activity */
+			timeout = last_activity - timeout;
+		} else {
+			/* either jiffies wrapped around, or no fence was signaled in the last 500ms
+			 * anyway we will just wait for the minimum amount and then check for a lockup
+			 */
+			timeout = 1;
+		}
+
+		trace_radeon_fence_wait_begin(rdev->ddev, target_seq[ring]);
+		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+			if (target_seq[i]) {
+				radeon_irq_kms_sw_irq_get(rdev, i);
+			}
+		}
+		if (intr) {
+			r = wait_event_interruptible_timeout(rdev->fence_queue,
+				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq)),
+				timeout);
+		} else {
+			r = wait_event_timeout(rdev->fence_queue,
+				(signaled = radeon_fence_any_seq_signaled(rdev, target_seq)),
+				timeout);
+		}
+		for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+			if (target_seq[i]) {
+				radeon_irq_kms_sw_irq_put(rdev, i);
+			}
+		}
+		if (unlikely(r < 0)) {
+			return r;
+		}
+		trace_radeon_fence_wait_end(rdev->ddev, target_seq[ring]);
+
+		if (unlikely(!signaled)) {
+			/* we were interrupted for some reason and fence
+			 * isn't signaled yet, resume waiting */
+			if (r) {
+				continue;
+			}
+
+			mutex_lock(&rdev->ring_lock);
+			for (i = 0, tmp = 0; i < RADEON_NUM_RINGS; ++i) {
+				if (time_after(rdev->fence_drv[i].last_activity, tmp)) {
+					tmp = rdev->fence_drv[i].last_activity;
+				}
+			}
+			/* test if somebody else has already decided that this is a lockup */
+			if (last_activity != tmp) {
+				last_activity = tmp;
+				mutex_unlock(&rdev->ring_lock);
+				continue;
+			}
+
+			if (radeon_ring_is_lockup(rdev, ring, &rdev->ring[ring])) {
+				/* good news we believe it's a lockup */
+				dev_warn(rdev->dev, "GPU lockup (waiting for 0x%016llx)\n",
+					 target_seq[ring]);
+
+				/* change last activity so nobody else think there is a lockup */
+				for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+					rdev->fence_drv[i].last_activity = jiffies;
+				}
+
+				/* mark the ring as not ready any more */
+				rdev->ring[ring].ready = false;
+				mutex_unlock(&rdev->ring_lock);
+				return -EDEADLK;
+			}
+			mutex_unlock(&rdev->ring_lock);
+		}
+	}
+	return 0;
+}
+
+int radeon_fence_wait_any(struct radeon_device *rdev,
+			  struct radeon_fence **fences,
+			  bool intr)
+{
+	uint64_t seq[RADEON_NUM_RINGS];
+	unsigned i;
+	int r;
+
+	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+		seq[i] = 0;
+
+		if (!fences[i]) {
+			continue;
+		}
+
+		if (fences[i]->seq == RADEON_FENCE_SIGNALED_SEQ) {
+			/* something was allready signaled */
+			return 0;
+		}
+
+		if (fences[i]->seq < RADEON_FENCE_NOTEMITED_SEQ) {
+			seq[i] = fences[i]->seq;
+		}
+	}
+
+	r = radeon_fence_wait_any_seq(rdev, seq, intr);
+	if (r) {
+		return r;
+	}
+	return 0;
+}
+
 int radeon_fence_wait_next_locked(struct radeon_device *rdev, int ring)
 {
 	uint64_t seq;
@@ -350,7 +503,7 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
 
 	wake = radeon_fence_poll(rdev, ring);
 	if (wake) {
-		wake_up_all(&rdev->fence_drv[ring].queue);
+		wake_up_all(&rdev->fence_queue);
 	}
 }
 
@@ -406,7 +559,6 @@ static void radeon_fence_driver_init_ring(struct radeon_device *rdev, int ring)
 	rdev->fence_drv[ring].seq = 0;
 	atomic64_set(&rdev->fence_drv[ring].last_seq, 0);
 	rdev->fence_drv[ring].last_activity = jiffies;
-	init_waitqueue_head(&rdev->fence_drv[ring].queue);
 	rdev->fence_drv[ring].initialized = false;
 }
 
@@ -414,6 +566,7 @@ int radeon_fence_driver_init(struct radeon_device *rdev)
 {
 	int ring;
 
+	init_waitqueue_head(&rdev->fence_queue);
 	for (ring = 0; ring < RADEON_NUM_RINGS; ring++) {
 		radeon_fence_driver_init_ring(rdev, ring);
 	}
@@ -432,7 +585,7 @@ void radeon_fence_driver_fini(struct radeon_device *rdev)
 		if (!rdev->fence_drv[ring].initialized)
 			continue;
 		radeon_fence_wait_empty_locked(rdev, ring);
-		wake_up_all(&rdev->fence_drv[ring].queue);
+		wake_up_all(&rdev->fence_queue);
 		radeon_scratch_free(rdev, rdev->fence_drv[ring].scratch_reg);
 		rdev->fence_drv[ring].initialized = false;
 	}
-- 
1.7.5.4

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 14/20] drm/radeon: multiple ring allocator v2
  2012-05-07 11:42 SA and other Patches Christian König
                   ` (12 preceding siblings ...)
  2012-05-07 11:42 ` [PATCH 13/20] drm/radeon: use one wait queue for all rings add fence_wait_any v2 Christian König
@ 2012-05-07 11:42 ` Christian König
  2012-05-07 15:23   ` Jerome Glisse
  2012-05-07 11:42 ` [PATCH 15/20] drm/radeon: simplify semaphore handling v2 Christian König
                   ` (6 subsequent siblings)
  20 siblings, 1 reply; 35+ messages in thread
From: Christian König @ 2012-05-07 11:42 UTC (permalink / raw)
  To: j.glisse, dri-devel; +Cc: Christian König, Jerome Glisse

A fresh start with a new idea for a multiple ring allocator. It should
perform as well as a normal ring allocator as long as only one ring does
something, but falls back to a more complex algorithm if more complex
things start to happen.

We store the last allocated bo in last and always try to allocate
after it. The principle is that in a linear GPU ring progression, what
comes after last is the oldest bo we allocated and thus the first one
that should no longer be in use by the GPU.

If that is not the case, we skip over the bo after last to the closest
done bo, if one exists. If none exists and we are not asked to block,
we report failure to allocate.

If we are asked to block, we collect the oldest fence of each ring and
just wait for any one of them to complete.
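
The wrap around handling in that "closest done bo" search can be seen
in this tiny standalone model (plain userspace C with made-up numbers;
it mirrors the distance test in radeon_sa_bo_next_hole() below):

    #include <stdio.h>

    #define MANAGER_SIZE 1024

    /* distance from the current hole to a candidate allocation,
     * treating the buffer as a ring */
    static unsigned distance_after(unsigned hole, unsigned candidate)
    {
            unsigned tmp = candidate;

            if (tmp < hole)
                    tmp += MANAGER_SIZE;    /* wrapped around, pretend it's after */
            return tmp - hole;
    }

    int main(void)
    {
            unsigned candidates[] = { 100, 650, 900 };
            unsigned hole = 700;
            unsigned i, best = MANAGER_SIZE * 2, best_off = 0;

            for (i = 0; i < 3; ++i) {
                    unsigned d = distance_after(hole, candidates[i]);
                    if (d < best) {
                            best = d;
                            best_off = candidates[i];
                    }
            }
            printf("closest allocation after the hole starts at %u (distance %u)\n",
                   best_off, best);
            return 0;
    }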

v2: We need to be able to let hole point to the list_head, otherwise
    try_free will never free the first allocation of the list. Also
    stop calling radeon_fence_signaled more than necessary.

Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
---
 drivers/gpu/drm/radeon/radeon.h      |    7 +-
 drivers/gpu/drm/radeon/radeon_ring.c |   19 +--
 drivers/gpu/drm/radeon/radeon_sa.c   |  292 +++++++++++++++++++++++-----------
 3 files changed, 210 insertions(+), 108 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 37a7459..cc7f16a 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -385,7 +385,9 @@ struct radeon_bo_list {
 struct radeon_sa_manager {
 	spinlock_t		lock;
 	struct radeon_bo	*bo;
-	struct list_head	sa_bo;
+	struct list_head	*hole;
+	struct list_head	flist[RADEON_NUM_RINGS];
+	struct list_head	olist;
 	unsigned		size;
 	uint64_t		gpu_addr;
 	void			*cpu_ptr;
@@ -396,7 +398,8 @@ struct radeon_sa_bo;
 
 /* sub-allocation buffer */
 struct radeon_sa_bo {
-	struct list_head		list;
+	struct list_head		olist;
+	struct list_head		flist;
 	struct radeon_sa_manager	*manager;
 	unsigned			soffset;
 	unsigned			eoffset;
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index 1748d93..e074ff5 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -204,25 +204,22 @@ int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib)
 
 int radeon_ib_pool_init(struct radeon_device *rdev)
 {
-	struct radeon_sa_manager tmp;
 	int i, r;
 
-	r = radeon_sa_bo_manager_init(rdev, &tmp,
-				      RADEON_IB_POOL_SIZE*64*1024,
-				      RADEON_GEM_DOMAIN_GTT);
-	if (r) {
-		return r;
-	}
-
 	radeon_mutex_lock(&rdev->ib_pool.mutex);
 	if (rdev->ib_pool.ready) {
 		radeon_mutex_unlock(&rdev->ib_pool.mutex);
-		radeon_sa_bo_manager_fini(rdev, &tmp);
 		return 0;
 	}
 
-	rdev->ib_pool.sa_manager = tmp;
-	INIT_LIST_HEAD(&rdev->ib_pool.sa_manager.sa_bo);
+	r = radeon_sa_bo_manager_init(rdev, &rdev->ib_pool.sa_manager,
+				      RADEON_IB_POOL_SIZE*64*1024,
+				      RADEON_GEM_DOMAIN_GTT);
+	if (r) {
+		radeon_mutex_unlock(&rdev->ib_pool.mutex);
+		return r;
+	}
+
 	for (i = 0; i < RADEON_IB_POOL_SIZE; i++) {
 		rdev->ib_pool.ibs[i].fence = NULL;
 		rdev->ib_pool.ibs[i].idx = i;
diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c
index 90ee8ad..757a9d4 100644
--- a/drivers/gpu/drm/radeon/radeon_sa.c
+++ b/drivers/gpu/drm/radeon/radeon_sa.c
@@ -27,21 +27,42 @@
  * Authors:
  *    Jerome Glisse <glisse@freedesktop.org>
  */
+/* Algorithm:
+ *
+ * We store the last allocated bo in "hole", we always try to allocate
+ * after the last allocated bo. Principle is that in a linear GPU ring
+ * progression was is after last is the oldest bo we allocated and thus
+ * the first one that should no longer be in use by the GPU.
+ *
+ * If it's not the case we skip over the bo after last to the closest
+ * done bo if such one exist. If none exist and we are not asked to
+ * block we report failure to allocate.
+ *
+ * If we are asked to block we wait on all the oldest fence of all
+ * rings. We just wait for any of those fence to complete.
+ */
 #include "drmP.h"
 #include "drm.h"
 #include "radeon.h"
 
+static void radeon_sa_bo_remove_locked(struct radeon_sa_bo *sa_bo);
+static void radeon_sa_bo_try_free(struct radeon_sa_manager *sa_manager);
+
 int radeon_sa_bo_manager_init(struct radeon_device *rdev,
 			      struct radeon_sa_manager *sa_manager,
 			      unsigned size, u32 domain)
 {
-	int r;
+	int i, r;
 
 	spin_lock_init(&sa_manager->lock);
 	sa_manager->bo = NULL;
 	sa_manager->size = size;
 	sa_manager->domain = domain;
-	INIT_LIST_HEAD(&sa_manager->sa_bo);
+	sa_manager->hole = &sa_manager->olist;
+	INIT_LIST_HEAD(&sa_manager->olist);
+	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+		INIT_LIST_HEAD(&sa_manager->flist[i]);
+	}
 
 	r = radeon_bo_create(rdev, size, RADEON_GPU_PAGE_SIZE, true,
 			     RADEON_GEM_DOMAIN_CPU, &sa_manager->bo);
@@ -58,11 +79,15 @@ void radeon_sa_bo_manager_fini(struct radeon_device *rdev,
 {
 	struct radeon_sa_bo *sa_bo, *tmp;
 
-	if (!list_empty(&sa_manager->sa_bo)) {
-		dev_err(rdev->dev, "sa_manager is not empty, clearing anyway\n");
+	if (!list_empty(&sa_manager->olist)) {
+		sa_manager->hole = &sa_manager->olist,
+		radeon_sa_bo_try_free(sa_manager);
+		if (!list_empty(&sa_manager->olist)) {
+			dev_err(rdev->dev, "sa_manager is not empty, clearing anyway\n");
+		}
 	}
-	list_for_each_entry_safe(sa_bo, tmp, &sa_manager->sa_bo, list) {
-		list_del_init(&sa_bo->list);
+	list_for_each_entry_safe(sa_bo, tmp, &sa_manager->olist, olist) {
+		radeon_sa_bo_remove_locked(sa_bo);
 	}
 	radeon_bo_unref(&sa_manager->bo);
 	sa_manager->size = 0;
@@ -114,111 +139,181 @@ int radeon_sa_bo_manager_suspend(struct radeon_device *rdev,
 	return r;
 }
 
-/*
- * Principe is simple, we keep a list of sub allocation in offset
- * order (first entry has offset == 0, last entry has the highest
- * offset).
- *
- * When allocating new object we first check if there is room at
- * the end total_size - (last_object_offset + last_object_size) >=
- * alloc_size. If so we allocate new object there.
- *
- * When there is not enough room at the end, we start waiting for
- * each sub object until we reach object_offset+object_size >=
- * alloc_size, this object then become the sub object we return.
- *
- * Alignment can't be bigger than page size
- */
-
 static void radeon_sa_bo_remove_locked(struct radeon_sa_bo *sa_bo)
 {
-	list_del(&sa_bo->list);
+	struct radeon_sa_manager *sa_manager = sa_bo->manager;
+	if (sa_manager->hole == &sa_bo->olist) {
+		sa_manager->hole = sa_bo->olist.prev;
+	}
+	list_del_init(&sa_bo->olist);
+	list_del_init(&sa_bo->flist);
 	radeon_fence_unref(&sa_bo->fence);
 	kfree(sa_bo);
 }
 
+static void radeon_sa_bo_try_free(struct radeon_sa_manager *sa_manager)
+{
+	struct radeon_sa_bo *sa_bo, *tmp;
+
+	if (sa_manager->hole->next == &sa_manager->olist)
+		return;
+
+	sa_bo = list_entry(sa_manager->hole->next, struct radeon_sa_bo, olist);
+	list_for_each_entry_safe_from(sa_bo, tmp, &sa_manager->olist, olist) {
+		if (sa_bo->fence == NULL || !radeon_fence_signaled(sa_bo->fence)) {
+			return;
+		}
+		radeon_sa_bo_remove_locked(sa_bo);
+	}
+}
+
+static inline unsigned radeon_sa_bo_hole_soffset(struct radeon_sa_manager *sa_manager)
+{
+	struct list_head *hole = sa_manager->hole;
+
+	if (hole != &sa_manager->olist) {
+		return list_entry(hole, struct radeon_sa_bo, olist)->eoffset;
+	}
+	return 0;
+}
+
+static inline unsigned radeon_sa_bo_hole_eoffset(struct radeon_sa_manager *sa_manager)
+{
+	struct list_head *hole = sa_manager->hole;
+
+	if (hole->next != &sa_manager->olist) {
+		return list_entry(hole->next, struct radeon_sa_bo, olist)->soffset;
+	}
+	return sa_manager->size;
+}
+
+static bool radeon_sa_bo_try_alloc(struct radeon_sa_manager *sa_manager,
+				   struct radeon_sa_bo *sa_bo,
+				   unsigned size, unsigned align)
+{
+	unsigned soffset, eoffset, wasted;
+
+	soffset = radeon_sa_bo_hole_soffset(sa_manager);
+	eoffset = radeon_sa_bo_hole_eoffset(sa_manager);
+	wasted = (align - (soffset % align)) % align;
+
+	if ((eoffset - soffset) >= (size + wasted)) {
+		soffset += wasted;
+
+		sa_bo->manager = sa_manager;
+		sa_bo->soffset = soffset;
+		sa_bo->eoffset = soffset + size;
+		list_add(&sa_bo->olist, sa_manager->hole);
+		INIT_LIST_HEAD(&sa_bo->flist);
+		sa_manager->hole = &sa_bo->olist;
+		return true;
+	}
+	return false;
+}
+
+static bool radeon_sa_bo_next_hole(struct radeon_sa_manager *sa_manager,
+				   struct radeon_fence **fences)
+{
+	unsigned i, soffset, best, tmp;
+
+	/* if hole points to the end of the buffer */
+	if (sa_manager->hole->next == &sa_manager->olist) {
+		/* try again with its beginning */
+		sa_manager->hole = &sa_manager->olist;
+		return true;
+	}
+
+	soffset = radeon_sa_bo_hole_soffset(sa_manager);
+	/* to handle wrap around we add sa_manager->size */
+	best = sa_manager->size * 2;
+	/* go over all fence list and try to find the closest sa_bo
+	 * of the current last
+	 */
+	for (i = 0; i < RADEON_NUM_RINGS; ++i) {
+		struct radeon_sa_bo *sa_bo;
+
+		if (list_empty(&sa_manager->flist[i])) {
+			fences[i] = NULL;
+			continue;
+		}
+
+		sa_bo = list_first_entry(&sa_manager->flist[i],
+					 struct radeon_sa_bo, flist);
+
+		if (!radeon_fence_signaled(sa_bo->fence)) {
+			fences[i] = sa_bo->fence;
+			continue;
+		}
+
+		tmp = sa_bo->soffset;
+		if (tmp < soffset) {
+			/* wrap around, pretend it's after */
+			tmp += sa_manager->size;
+		}
+		tmp -= soffset;
+		if (tmp < best) {
+			/* this sa bo is the closest one */
+			best = tmp;
+			sa_manager->hole = sa_bo->olist.prev;
+		}
+
+		/* we knew that this one is signaled,
+		   so it's save to remote it */
+		radeon_sa_bo_remove_locked(sa_bo);
+	}
+	return best != sa_manager->size * 2;
+}
+
 int radeon_sa_bo_new(struct radeon_device *rdev,
 		     struct radeon_sa_manager *sa_manager,
 		     struct radeon_sa_bo **sa_bo,
 		     unsigned size, unsigned align, bool block)
 {
-	struct radeon_fence *fence = NULL;
-	struct radeon_sa_bo *tmp, *next;
-	struct list_head *head;
-	unsigned offset = 0, wasted = 0;
-	int r;
+	struct radeon_fence *fences[RADEON_NUM_RINGS];
+	int r = -ENOMEM;
 
 	BUG_ON(align > RADEON_GPU_PAGE_SIZE);
 	BUG_ON(size > sa_manager->size);
 
 	*sa_bo = kmalloc(sizeof(struct radeon_sa_bo), GFP_KERNEL);
-
-retry:
+	if ((*sa_bo) == NULL) {
+		return -ENOMEM;
+	}
+	(*sa_bo)->manager = sa_manager;
+	(*sa_bo)->fence = NULL;
+	INIT_LIST_HEAD(&(*sa_bo)->olist);
+	INIT_LIST_HEAD(&(*sa_bo)->flist);
 
 	spin_lock(&sa_manager->lock);
+	do {
+		/* try to allocate a couple of times before going to wait */
+		do {
+			radeon_sa_bo_try_free(sa_manager);
 
-	/* no one ? */
-	head = sa_manager->sa_bo.prev;
-	if (list_empty(&sa_manager->sa_bo)) {
-		goto out;
-	}
-
-	/* look for a hole big enough */
-	offset = 0;
-	list_for_each_entry_safe(tmp, next, &sa_manager->sa_bo, list) {
-		/* try to free this object */
-		if (tmp->fence) {
-			if (radeon_fence_signaled(tmp->fence)) {
-				radeon_sa_bo_remove_locked(tmp);
-				continue;
-			} else {
-				fence = tmp->fence;
+			if (radeon_sa_bo_try_alloc(sa_manager, *sa_bo,
+						   size, align)) {
+				spin_unlock(&sa_manager->lock);
+				return 0;
 			}
-		}
 
-		/* room before this object ? */
-		if (offset < tmp->soffset && (tmp->soffset - offset) >= size) {
-			head = tmp->list.prev;
-			goto out;
-		}
-		offset = tmp->eoffset;
-		wasted = offset % align;
-		if (wasted) {
-			wasted = align - wasted;
-		}
-		offset += wasted;
-	}
-	/* room at the end ? */
-	head = sa_manager->sa_bo.prev;
-	tmp = list_entry(head, struct radeon_sa_bo, list);
-	offset = tmp->eoffset;
-	wasted = offset % align;
-	if (wasted) {
-		wasted = align - wasted;
-	}
-	offset += wasted;
-	if ((sa_manager->size - offset) < size) {
-		/* failed to find somethings big enough */
-		spin_unlock(&sa_manager->lock);
-		if (block && fence) {
-			r = radeon_fence_wait(fence, false);
-			if (r)
-				return r;
-
-			goto retry;
+			/* see if we can skip over some allocations */
+		} while (radeon_sa_bo_next_hole(sa_manager, fences));
+
+		if (block) {
+			spin_unlock(&sa_manager->lock);
+			r = radeon_fence_wait_any(rdev, fences, false);
+			spin_lock(&sa_manager->lock);
+			if (r) {
+				goto out_err;
+			}
 		}
-		kfree(*sa_bo);
-		*sa_bo = NULL;
-		return -ENOMEM;
-	}
+	} while (block);
 
-out:
-	(*sa_bo)->manager = sa_manager;
-	(*sa_bo)->soffset = offset;
-	(*sa_bo)->eoffset = offset + size;
-	list_add(&(*sa_bo)->list, head);
+out_err:
 	spin_unlock(&sa_manager->lock);
-	return 0;
+	kfree(*sa_bo);
+	*sa_bo = NULL;
+	return r;
 }
 
 void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo **sa_bo,
@@ -226,13 +321,16 @@ void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo **sa_bo,
 {
 	struct radeon_sa_manager *sa_manager;
 
-	if (!sa_bo || !*sa_bo)
+	if (sa_bo == NULL || *sa_bo == NULL) {
 		return;
+	}
 
 	sa_manager = (*sa_bo)->manager;
 	spin_lock(&sa_manager->lock);
 	if (fence && fence->seq && fence->seq < RADEON_FENCE_NOTEMITED_SEQ) {
 		(*sa_bo)->fence = radeon_fence_ref(fence);
+		list_add_tail(&(*sa_bo)->flist,
+			      &sa_manager->flist[fence->ring]);
 	} else {
 		radeon_sa_bo_remove_locked(*sa_bo);
 	}
@@ -247,15 +345,19 @@ void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager,
 	struct radeon_sa_bo *i;
 
 	spin_lock(&sa_manager->lock);
-	list_for_each_entry(i, &sa_manager->sa_bo, list) {
-		seq_printf(m, "[%08x %08x] size %4d (%p)",
-			   i->soffset, i->eoffset, i->eoffset - i->soffset, i);
-		if (i->fence) {
-			seq_printf(m, " protected by %Ld (%p) on ring %d\n",
-				   i->fence->seq, i->fence, i->fence->ring);
+	list_for_each_entry(i, &sa_manager->olist, olist) {
+		if (&i->olist == sa_manager->hole) {
+			seq_printf(m, ">");
 		} else {
-			seq_printf(m, "\n");
+			seq_printf(m, " ");
+		}
+		seq_printf(m, "[0x%08x 0x%08x] size %8d",
+			   i->soffset, i->eoffset, i->eoffset - i->soffset);
+		if (i->fence) {
+			seq_printf(m, " protected by 0x%016llx on ring %d",
+				   i->fence->seq, i->fence->ring);
 		}
+		seq_printf(m, "\n");
 	}
 	spin_unlock(&sa_manager->lock);
 }
-- 
1.7.5.4

* [PATCH 15/20] drm/radeon: simplify semaphore handling v2
  2012-05-07 11:42 SA and other Patches Christian König
                   ` (13 preceding siblings ...)
  2012-05-07 11:42 ` [PATCH 14/20] drm/radeon: multiple ring allocator v2 Christian König
@ 2012-05-07 11:42 ` Christian König
  2012-05-07 11:42 ` [PATCH 16/20] drm/radeon: rip out the ib pool Christian König
                   ` (5 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Christian König @ 2012-05-07 11:42 UTC (permalink / raw)
  To: j.glisse, dri-devel; +Cc: Jerome Glisse, Christian König

From: Jerome Glisse <jglisse@redhat.com>

Directly use the suballocator to get small chunks of memory.
It's equally fast and doesn't crash when we encounter a GPU reset.

v2: rebased on new SA interface.
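
For reviewers, here is a minimal sketch of how the reworked interface is
meant to be used. It is illustrative only: the helper name and ring indices
are made up and not part of the patch, and ring locking plus fence emission
are left out.

static int example_sync_two_rings(struct radeon_device *rdev,
				  int signal_ring, int wait_ring,
				  struct radeon_fence *fence)
{
	struct radeon_semaphore *sem = NULL;
	int r;

	/* 8 bytes of GTT memory come straight from the suballocator */
	r = radeon_semaphore_create(rdev, &sem);
	if (r)
		return r;

	/* one ring signals, the other blocks until the value flips */
	radeon_semaphore_emit_signal(rdev, signal_ring, sem);
	radeon_semaphore_emit_wait(rdev, wait_ring, sem);

	/* the fence keeps the SA block alive until the GPU is done with
	 * it; passing NULL instead frees the memory immediately */
	radeon_semaphore_free(rdev, sem, fence);
	return 0;
}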

Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
---
 drivers/gpu/drm/radeon/evergreen.c        |    1 -
 drivers/gpu/drm/radeon/ni.c               |    1 -
 drivers/gpu/drm/radeon/r600.c             |    1 -
 drivers/gpu/drm/radeon/radeon.h           |   29 +-----
 drivers/gpu/drm/radeon/radeon_device.c    |    2 -
 drivers/gpu/drm/radeon/radeon_fence.c     |    2 +-
 drivers/gpu/drm/radeon/radeon_semaphore.c |  137 +++++------------------------
 drivers/gpu/drm/radeon/radeon_test.c      |    4 +-
 drivers/gpu/drm/radeon/rv770.c            |    1 -
 drivers/gpu/drm/radeon/si.c               |    1 -
 10 files changed, 30 insertions(+), 149 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen.c b/drivers/gpu/drm/radeon/evergreen.c
index ecc29bc..7e7ac3d 100644
--- a/drivers/gpu/drm/radeon/evergreen.c
+++ b/drivers/gpu/drm/radeon/evergreen.c
@@ -3550,7 +3550,6 @@ void evergreen_fini(struct radeon_device *rdev)
 	evergreen_pcie_gart_fini(rdev);
 	r600_vram_scratch_fini(rdev);
 	radeon_gem_fini(rdev);
-	radeon_semaphore_driver_fini(rdev);
 	radeon_fence_driver_fini(rdev);
 	radeon_agp_fini(rdev);
 	radeon_bo_fini(rdev);
diff --git a/drivers/gpu/drm/radeon/ni.c b/drivers/gpu/drm/radeon/ni.c
index 9cd2657..107b217 100644
--- a/drivers/gpu/drm/radeon/ni.c
+++ b/drivers/gpu/drm/radeon/ni.c
@@ -1744,7 +1744,6 @@ void cayman_fini(struct radeon_device *rdev)
 	cayman_pcie_gart_fini(rdev);
 	r600_vram_scratch_fini(rdev);
 	radeon_gem_fini(rdev);
-	radeon_semaphore_driver_fini(rdev);
 	radeon_fence_driver_fini(rdev);
 	radeon_bo_fini(rdev);
 	radeon_atombios_fini(rdev);
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index 87a2333..0ae2d2d 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -2658,7 +2658,6 @@ void r600_fini(struct radeon_device *rdev)
 	r600_vram_scratch_fini(rdev);
 	radeon_agp_fini(rdev);
 	radeon_gem_fini(rdev);
-	radeon_semaphore_driver_fini(rdev);
 	radeon_fence_driver_fini(rdev);
 	radeon_bo_fini(rdev);
 	radeon_atombios_fini(rdev);
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index cc7f16a..45164e1 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -434,34 +434,13 @@ int radeon_mode_dumb_destroy(struct drm_file *file_priv,
 /*
  * Semaphores.
  */
-struct radeon_ring;
-
-#define	RADEON_SEMAPHORE_BO_SIZE	256
-
-struct radeon_semaphore_driver {
-	rwlock_t			lock;
-	struct list_head		bo;
-};
-
-struct radeon_semaphore_bo;
-
 /* everything here is constant */
 struct radeon_semaphore {
-	struct list_head		list;
+	struct radeon_sa_bo		*sa_bo;
+	signed				waiters;
 	uint64_t			gpu_addr;
-	uint32_t			*cpu_ptr;
-	struct radeon_semaphore_bo	*bo;
 };
 
-struct radeon_semaphore_bo {
-	struct list_head		list;
-	struct radeon_ib		*ib;
-	struct list_head		free;
-	struct radeon_semaphore		semaphores[RADEON_SEMAPHORE_BO_SIZE/8];
-	unsigned			nused;
-};
-
-void radeon_semaphore_driver_fini(struct radeon_device *rdev);
 int radeon_semaphore_create(struct radeon_device *rdev,
 			    struct radeon_semaphore **semaphore);
 void radeon_semaphore_emit_signal(struct radeon_device *rdev, int ring,
@@ -473,7 +452,8 @@ int radeon_semaphore_sync_rings(struct radeon_device *rdev,
 				bool sync_to[RADEON_NUM_RINGS],
 				int dst_ring);
 void radeon_semaphore_free(struct radeon_device *rdev,
-			   struct radeon_semaphore *semaphore);
+			   struct radeon_semaphore *semaphore,
+			   struct radeon_fence *fence);
 
 /*
  * GART structures, functions & helpers
@@ -1540,7 +1520,6 @@ struct radeon_device {
 	struct radeon_mman		mman;
 	struct radeon_fence_driver	fence_drv[RADEON_NUM_RINGS];
 	wait_queue_head_t		fence_queue;
-	struct radeon_semaphore_driver	semaphore_drv;
 	struct mutex			ring_lock;
 	struct radeon_ring		ring[RADEON_NUM_RINGS];
 	struct radeon_ib_pool		ib_pool;
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index b827b2e..48876c1 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -732,11 +732,9 @@ int radeon_device_init(struct radeon_device *rdev,
 	mutex_init(&rdev->gem.mutex);
 	mutex_init(&rdev->pm.mutex);
 	mutex_init(&rdev->vram_mutex);
-	rwlock_init(&rdev->semaphore_drv.lock);
 	INIT_LIST_HEAD(&rdev->gem.objects);
 	init_waitqueue_head(&rdev->irq.vblank_queue);
 	init_waitqueue_head(&rdev->irq.idle_queue);
-	INIT_LIST_HEAD(&rdev->semaphore_drv.bo);
 	/* initialize vm here */
 	rdev->vm_manager.use_bitmap = 1;
 	rdev->vm_manager.max_pfn = 1 << 20;
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 45d4e6e..6767381 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -138,7 +138,7 @@ static void radeon_fence_destroy(struct kref *kref)
 	fence = container_of(kref, struct radeon_fence, kref);
 	fence->seq = RADEON_FENCE_NOTEMITED_SEQ;
 	if (fence->semaphore)
-		radeon_semaphore_free(fence->rdev, fence->semaphore);
+		radeon_semaphore_free(fence->rdev, fence->semaphore, NULL);
 	kfree(fence);
 }
 
diff --git a/drivers/gpu/drm/radeon/radeon_semaphore.c b/drivers/gpu/drm/radeon/radeon_semaphore.c
index dbde874..1bc5513 100644
--- a/drivers/gpu/drm/radeon/radeon_semaphore.c
+++ b/drivers/gpu/drm/radeon/radeon_semaphore.c
@@ -31,118 +31,40 @@
 #include "drm.h"
 #include "radeon.h"
 
-static int radeon_semaphore_add_bo(struct radeon_device *rdev)
-{
-	struct radeon_semaphore_bo *bo;
-	unsigned long irq_flags;
-	uint64_t gpu_addr;
-	uint32_t *cpu_ptr;
-	int r, i;
-
-	bo = kmalloc(sizeof(struct radeon_semaphore_bo), GFP_KERNEL);
-	if (bo == NULL) {
-		return -ENOMEM;
-	}
-	INIT_LIST_HEAD(&bo->free);
-	INIT_LIST_HEAD(&bo->list);
-	bo->nused = 0;
-
-	r = radeon_ib_get(rdev, 0, &bo->ib, RADEON_SEMAPHORE_BO_SIZE);
-	if (r) {
-		dev_err(rdev->dev, "failed to get a bo after 5 retry\n");
-		kfree(bo);
-		return r;
-	}
-	gpu_addr = radeon_sa_bo_gpu_addr(bo->ib->sa_bo);
-	cpu_ptr = radeon_sa_bo_cpu_addr(bo->ib->sa_bo);
-	for (i = 0; i < (RADEON_SEMAPHORE_BO_SIZE/8); i++) {
-		bo->semaphores[i].gpu_addr = gpu_addr;
-		bo->semaphores[i].cpu_ptr = cpu_ptr;
-		bo->semaphores[i].bo = bo;
-		list_add_tail(&bo->semaphores[i].list, &bo->free);
-		gpu_addr += 8;
-		cpu_ptr += 2;
-	}
-	write_lock_irqsave(&rdev->semaphore_drv.lock, irq_flags);
-	list_add_tail(&bo->list, &rdev->semaphore_drv.bo);
-	write_unlock_irqrestore(&rdev->semaphore_drv.lock, irq_flags);
-	return 0;
-}
-
-static void radeon_semaphore_del_bo_locked(struct radeon_device *rdev,
-					   struct radeon_semaphore_bo *bo)
-{
-	radeon_sa_bo_free(rdev, &bo->ib->sa_bo, NULL);
-	radeon_fence_unref(&bo->ib->fence);
-	list_del(&bo->list);
-	kfree(bo);
-}
-
-void radeon_semaphore_shrink_locked(struct radeon_device *rdev)
-{
-	struct radeon_semaphore_bo *bo, *n;
-
-	if (list_empty(&rdev->semaphore_drv.bo)) {
-		return;
-	}
-	/* only shrink if first bo has free semaphore */
-	bo = list_first_entry(&rdev->semaphore_drv.bo, struct radeon_semaphore_bo, list);
-	if (list_empty(&bo->free)) {
-		return;
-	}
-	list_for_each_entry_safe_continue(bo, n, &rdev->semaphore_drv.bo, list) {
-		if (bo->nused)
-			continue;
-		radeon_semaphore_del_bo_locked(rdev, bo);
-	}
-}
 
 int radeon_semaphore_create(struct radeon_device *rdev,
 			    struct radeon_semaphore **semaphore)
 {
-	struct radeon_semaphore_bo *bo;
-	unsigned long irq_flags;
-	bool do_retry = true;
 	int r;
 
-retry:
-	*semaphore = NULL;
-	write_lock_irqsave(&rdev->semaphore_drv.lock, irq_flags);
-	list_for_each_entry(bo, &rdev->semaphore_drv.bo, list) {
-		if (list_empty(&bo->free))
-			continue;
-		*semaphore = list_first_entry(&bo->free, struct radeon_semaphore, list);
-		(*semaphore)->cpu_ptr[0] = 0;
-		(*semaphore)->cpu_ptr[1] = 0;
-		list_del(&(*semaphore)->list);
-		bo->nused++;
-		break;
-	}
-	write_unlock_irqrestore(&rdev->semaphore_drv.lock, irq_flags);
-
+	*semaphore = kmalloc(sizeof(struct radeon_semaphore), GFP_KERNEL);
 	if (*semaphore == NULL) {
-		if (do_retry) {
-			do_retry = false;
-			r = radeon_semaphore_add_bo(rdev);
-			if (r)
-				return r;
-			goto retry;
-		}
 		return -ENOMEM;
 	}
-
+	r = radeon_sa_bo_new(rdev, &rdev->ib_pool.sa_manager,
+			     &(*semaphore)->sa_bo, 8, 8, true);
+	if (r) {
+		kfree(*semaphore);
+		*semaphore = NULL;
+		return r;
+	}
+	(*semaphore)->waiters = 0;
+	(*semaphore)->gpu_addr = radeon_sa_bo_gpu_addr((*semaphore)->sa_bo);
+	*((uint64_t*)radeon_sa_bo_cpu_addr((*semaphore)->sa_bo)) = 0;
 	return 0;
 }
 
 void radeon_semaphore_emit_signal(struct radeon_device *rdev, int ring,
 			          struct radeon_semaphore *semaphore)
 {
+	--semaphore->waiters;
 	radeon_semaphore_ring_emit(rdev, ring, &rdev->ring[ring], semaphore, false);
 }
 
 void radeon_semaphore_emit_wait(struct radeon_device *rdev, int ring,
 			        struct radeon_semaphore *semaphore)
 {
+	++semaphore->waiters;
 	radeon_semaphore_ring_emit(rdev, ring, &rdev->ring[ring], semaphore, true);
 }
 
@@ -200,29 +122,16 @@ error:
 }
 
 void radeon_semaphore_free(struct radeon_device *rdev,
-			   struct radeon_semaphore *semaphore)
+			   struct radeon_semaphore *semaphore,
+			   struct radeon_fence *fence)
 {
-	unsigned long irq_flags;
-
-	write_lock_irqsave(&rdev->semaphore_drv.lock, irq_flags);
-	semaphore->bo->nused--;
-	list_add_tail(&semaphore->list, &semaphore->bo->free);
-	radeon_semaphore_shrink_locked(rdev);
-	write_unlock_irqrestore(&rdev->semaphore_drv.lock, irq_flags);
-}
-
-void radeon_semaphore_driver_fini(struct radeon_device *rdev)
-{
-	struct radeon_semaphore_bo *bo, *n;
-	unsigned long irq_flags;
-
-	write_lock_irqsave(&rdev->semaphore_drv.lock, irq_flags);
-	/* we force to free everything */
-	list_for_each_entry_safe(bo, n, &rdev->semaphore_drv.bo, list) {
-		if (!list_empty(&bo->free)) {
-			dev_err(rdev->dev, "still in use semaphore\n");
-		}
-		radeon_semaphore_del_bo_locked(rdev, bo);
+	if (semaphore == NULL) {
+		return;
+	}
+	if (semaphore->waiters > 0) {
+		dev_err(rdev->dev, "semaphore %p has more waiters than signalers,"
+			" hardware lockup imminent!\n", semaphore);
 	}
-	write_unlock_irqrestore(&rdev->semaphore_drv.lock, irq_flags);
+	radeon_sa_bo_free(rdev, &semaphore->sa_bo, fence);
+	kfree(semaphore);
 }
diff --git a/drivers/gpu/drm/radeon/radeon_test.c b/drivers/gpu/drm/radeon/radeon_test.c
index dc5dcf4..b057387 100644
--- a/drivers/gpu/drm/radeon/radeon_test.c
+++ b/drivers/gpu/drm/radeon/radeon_test.c
@@ -317,7 +317,7 @@ void radeon_test_ring_sync(struct radeon_device *rdev,
 
 out_cleanup:
 	if (semaphore)
-		radeon_semaphore_free(rdev, semaphore);
+		radeon_semaphore_free(rdev, semaphore, NULL);
 
 	if (fence1)
 		radeon_fence_unref(&fence1);
@@ -437,7 +437,7 @@ void radeon_test_ring_sync2(struct radeon_device *rdev,
 
 out_cleanup:
 	if (semaphore)
-		radeon_semaphore_free(rdev, semaphore);
+		radeon_semaphore_free(rdev, semaphore, NULL);
 
 	if (fenceA)
 		radeon_fence_unref(&fenceA);
diff --git a/drivers/gpu/drm/radeon/rv770.c b/drivers/gpu/drm/radeon/rv770.c
index cacec0e..c6ee54e 100644
--- a/drivers/gpu/drm/radeon/rv770.c
+++ b/drivers/gpu/drm/radeon/rv770.c
@@ -1278,7 +1278,6 @@ void rv770_fini(struct radeon_device *rdev)
 	rv770_pcie_gart_fini(rdev);
 	r600_vram_scratch_fini(rdev);
 	radeon_gem_fini(rdev);
-	radeon_semaphore_driver_fini(rdev);
 	radeon_fence_driver_fini(rdev);
 	radeon_agp_fini(rdev);
 	radeon_bo_fini(rdev);
diff --git a/drivers/gpu/drm/radeon/si.c b/drivers/gpu/drm/radeon/si.c
index 0bad5ff..d6b7fbc 100644
--- a/drivers/gpu/drm/radeon/si.c
+++ b/drivers/gpu/drm/radeon/si.c
@@ -4110,7 +4110,6 @@ void si_fini(struct radeon_device *rdev)
 	si_pcie_gart_fini(rdev);
 	r600_vram_scratch_fini(rdev);
 	radeon_gem_fini(rdev);
-	radeon_semaphore_driver_fini(rdev);
 	radeon_fence_driver_fini(rdev);
 	radeon_bo_fini(rdev);
 	radeon_atombios_fini(rdev);
-- 
1.7.5.4

* [PATCH 16/20] drm/radeon: rip out the ib pool
  2012-05-07 11:42 SA and other Patches Christian König
                   ` (14 preceding siblings ...)
  2012-05-07 11:42 ` [PATCH 15/20] drm/radeon: simplify semaphore handling v2 Christian König
@ 2012-05-07 11:42 ` Christian König
  2012-05-07 11:42 ` [PATCH 17/20] drm/radeon: immediately free ttm-move semaphore Christian König
                   ` (4 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Christian König @ 2012-05-07 11:42 UTC (permalink / raw)
  To: j.glisse, dri-devel; +Cc: Jerome Glisse, Christian König

From: Jerome Glisse <jglisse@redhat.com>

It isn't necessary any more and the suballocator seems to perform
even better.
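
As a quick orientation for reviewers, the per-IB flow after this patch
boils down to the sketch below. It is illustrative only (the helper name is
made up), error handling is reduced and the packet contents are left to the
caller.

static int example_submit_packets(struct radeon_device *rdev, int ring,
				  const uint32_t *packets, unsigned ndw)
{
	struct radeon_ib *ib;
	unsigned i;
	int r;

	/* getting an IB is now just an SA allocation plus a fence,
	 * there are no pool slots to recycle any more */
	r = radeon_ib_get(rdev, ring, &ib, ndw * 4);
	if (r)
		return r;

	for (i = 0; i < ndw; i++)
		ib->ptr[i] = packets[i];
	ib->length_dw = ndw;

	r = radeon_ib_schedule(rdev, ib);

	/* the SA block is handed back protected by the IB fence, so
	 * freeing right away is safe even while the GPU still reads it */
	radeon_ib_free(rdev, &ib);
	return r;
}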

Signed-off-by: Christian König <deathsimple@vodafone.de>
Signed-off-by: Jerome Glisse <jglisse@redhat.com>
---
 drivers/gpu/drm/radeon/radeon.h           |   17 +--
 drivers/gpu/drm/radeon/radeon_device.c    |    1 -
 drivers/gpu/drm/radeon/radeon_gart.c      |   12 +-
 drivers/gpu/drm/radeon/radeon_ring.c      |  241 ++++++++---------------------
 drivers/gpu/drm/radeon/radeon_semaphore.c |    2 +-
 5 files changed, 71 insertions(+), 202 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 45164e1..6170307 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -625,7 +625,6 @@ void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc);
 
 struct radeon_ib {
 	struct radeon_sa_bo	*sa_bo;
-	unsigned		idx;
 	uint32_t		length_dw;
 	uint64_t		gpu_addr;
 	uint32_t		*ptr;
@@ -634,18 +633,6 @@ struct radeon_ib {
 	bool			is_const_ib;
 };
 
-/*
- * locking -
- * mutex protects scheduled_ibs, ready, alloc_bm
- */
-struct radeon_ib_pool {
-	struct radeon_mutex		mutex;
-	struct radeon_sa_manager	sa_manager;
-	struct radeon_ib		ibs[RADEON_IB_POOL_SIZE];
-	bool				ready;
-	unsigned			head_id;
-};
-
 struct radeon_ring {
 	struct radeon_bo	*ring_obj;
 	volatile uint32_t	*ring;
@@ -787,7 +774,6 @@ struct si_rlc {
 int radeon_ib_get(struct radeon_device *rdev, int ring,
 		  struct radeon_ib **ib, unsigned size);
 void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib);
-bool radeon_ib_try_free(struct radeon_device *rdev, struct radeon_ib *ib);
 int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib);
 int radeon_ib_pool_init(struct radeon_device *rdev);
 void radeon_ib_pool_fini(struct radeon_device *rdev);
@@ -1522,7 +1508,8 @@ struct radeon_device {
 	wait_queue_head_t		fence_queue;
 	struct mutex			ring_lock;
 	struct radeon_ring		ring[RADEON_NUM_RINGS];
-	struct radeon_ib_pool		ib_pool;
+	bool				ib_pool_ready;
+	struct radeon_sa_manager	ring_tmp_bo;
 	struct radeon_irq		irq;
 	struct radeon_asic		*asic;
 	struct radeon_gem		gem;
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index 48876c1..e1bc7e9 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -724,7 +724,6 @@ int radeon_device_init(struct radeon_device *rdev,
 	/* mutex initialization are all done here so we
 	 * can recall function without having locking issues */
 	radeon_mutex_init(&rdev->cs_mutex);
-	radeon_mutex_init(&rdev->ib_pool.mutex);
 	mutex_init(&rdev->ring_lock);
 	mutex_init(&rdev->dc_hw_i2c_mutex);
 	if (rdev->family >= CHIP_R600)
diff --git a/drivers/gpu/drm/radeon/radeon_gart.c b/drivers/gpu/drm/radeon/radeon_gart.c
index 53dba8e..8e9ef34 100644
--- a/drivers/gpu/drm/radeon/radeon_gart.c
+++ b/drivers/gpu/drm/radeon/radeon_gart.c
@@ -432,8 +432,8 @@ retry_id:
 	rdev->vm_manager.use_bitmap |= 1 << id;
 	vm->id = id;
 	list_add_tail(&vm->list, &rdev->vm_manager.lru_vm);
-	return radeon_vm_bo_update_pte(rdev, vm, rdev->ib_pool.sa_manager.bo,
-				       &rdev->ib_pool.sa_manager.bo->tbo.mem);
+	return radeon_vm_bo_update_pte(rdev, vm, rdev->ring_tmp_bo.bo,
+				       &rdev->ring_tmp_bo.bo->tbo.mem);
 }
 
 /* object have to be reserved */
@@ -631,7 +631,7 @@ int radeon_vm_init(struct radeon_device *rdev, struct radeon_vm *vm)
 	/* map the ib pool buffer at 0 in virtual address space, set
 	 * read only
 	 */
-	r = radeon_vm_bo_add(rdev, vm, rdev->ib_pool.sa_manager.bo, 0,
+	r = radeon_vm_bo_add(rdev, vm, rdev->ring_tmp_bo.bo, 0,
 			     RADEON_VM_PAGE_READABLE | RADEON_VM_PAGE_SNOOPED);
 	return r;
 }
@@ -648,12 +648,12 @@ void radeon_vm_fini(struct radeon_device *rdev, struct radeon_vm *vm)
 	radeon_mutex_unlock(&rdev->cs_mutex);
 
 	/* remove all bo */
-	r = radeon_bo_reserve(rdev->ib_pool.sa_manager.bo, false);
+	r = radeon_bo_reserve(rdev->ring_tmp_bo.bo, false);
 	if (!r) {
-		bo_va = radeon_bo_va(rdev->ib_pool.sa_manager.bo, vm);
+		bo_va = radeon_bo_va(rdev->ring_tmp_bo.bo, vm);
 		list_del_init(&bo_va->bo_list);
 		list_del_init(&bo_va->vm_list);
-		radeon_bo_unreserve(rdev->ib_pool.sa_manager.bo);
+		radeon_bo_unreserve(rdev->ring_tmp_bo.bo);
 		kfree(bo_va);
 	}
 	if (!list_empty(&vm->va)) {
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index e074ff5..b3d6942 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -24,6 +24,7 @@
  * Authors: Dave Airlie
  *          Alex Deucher
  *          Jerome Glisse
+ *          Christian König
  */
 #include <linux/seq_file.h>
 #include <linux/slab.h>
@@ -33,8 +34,10 @@
 #include "radeon.h"
 #include "atom.h"
 
-int radeon_debugfs_ib_init(struct radeon_device *rdev);
-int radeon_debugfs_ring_init(struct radeon_device *rdev, struct radeon_ring *ring);
+/*
+ * IB.
+ */
+int radeon_debugfs_sa_init(struct radeon_device *rdev);
 
 u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
 {
@@ -61,106 +64,37 @@ u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
 	return idx_value;
 }
 
-void radeon_ring_write(struct radeon_ring *ring, uint32_t v)
-{
-#if DRM_DEBUG_CODE
-	if (ring->count_dw <= 0) {
-		DRM_ERROR("radeon: writting more dword to ring than expected !\n");
-	}
-#endif
-	ring->ring[ring->wptr++] = v;
-	ring->wptr &= ring->ptr_mask;
-	ring->count_dw--;
-	ring->ring_free_dw--;
-}
-
-/*
- * IB.
- */
-bool radeon_ib_try_free(struct radeon_device *rdev, struct radeon_ib *ib)
-{
-	bool done = false;
-
-	/* only free ib which have been emited */
-	if (ib->fence && ib->fence->seq < RADEON_FENCE_NOTEMITED_SEQ) {
-		if (radeon_fence_signaled(ib->fence)) {
-			radeon_fence_unref(&ib->fence);
-			radeon_sa_bo_free(rdev, &ib->sa_bo, NULL);
-			done = true;
-		}
-	}
-	return done;
-}
-
 int radeon_ib_get(struct radeon_device *rdev, int ring,
 		  struct radeon_ib **ib, unsigned size)
 {
-	struct radeon_fence *fence;
-	unsigned cretry = 0;
-	int r = 0, i, idx;
-
-	*ib = NULL;
-	/* align size on 256 bytes */
-	size = ALIGN(size, 256);
-
-	r = radeon_fence_create(rdev, &fence, ring);
-	if (r) {
-		dev_err(rdev->dev, "failed to create fence for new IB\n");
-		return r;
-	}
+	int r;
 
-	radeon_mutex_lock(&rdev->ib_pool.mutex);
-	idx = rdev->ib_pool.head_id;
-retry:
-	if (cretry > 5) {
-		dev_err(rdev->dev, "failed to get an ib after 5 retry\n");
-		radeon_mutex_unlock(&rdev->ib_pool.mutex);
-		radeon_fence_unref(&fence);
+	*ib = kmalloc(sizeof(struct radeon_ib), GFP_KERNEL);
+	if (*ib == NULL) {
 		return -ENOMEM;
 	}
-	cretry++;
-	for (i = 0; i < RADEON_IB_POOL_SIZE; i++) {
-		radeon_ib_try_free(rdev, &rdev->ib_pool.ibs[idx]);
-		if (rdev->ib_pool.ibs[idx].fence == NULL) {
-			r = radeon_sa_bo_new(rdev, &rdev->ib_pool.sa_manager,
-					     &rdev->ib_pool.ibs[idx].sa_bo,
-					     size, 256, false);
-			if (!r) {
-				*ib = &rdev->ib_pool.ibs[idx];
-				(*ib)->ptr = radeon_sa_bo_cpu_addr((*ib)->sa_bo);
-				(*ib)->gpu_addr = radeon_sa_bo_gpu_addr((*ib)->sa_bo);
-				(*ib)->fence = fence;
-				(*ib)->vm_id = 0;
-				(*ib)->is_const_ib = false;
-				/* ib are most likely to be allocated in a ring fashion
-				 * thus rdev->ib_pool.head_id should be the id of the
-				 * oldest ib
-				 */
-				rdev->ib_pool.head_id = (1 + idx);
-				rdev->ib_pool.head_id &= (RADEON_IB_POOL_SIZE - 1);
-				radeon_mutex_unlock(&rdev->ib_pool.mutex);
-				return 0;
-			}
-		}
-		idx = (idx + 1) & (RADEON_IB_POOL_SIZE - 1);
+	r = radeon_sa_bo_new(rdev, &rdev->ring_tmp_bo, &(*ib)->sa_bo, size, 256, true);
+	if (r) {
+		dev_err(rdev->dev, "failed to get a new IB (%d)\n", r);
+		kfree(*ib);
+		*ib = NULL;
+		return r;
 	}
-	/* this should be rare event, ie all ib scheduled none signaled yet.
-	 */
-	for (i = 0; i < RADEON_IB_POOL_SIZE; i++) {
-		struct radeon_fence *fence = rdev->ib_pool.ibs[idx].fence;
-		if (fence && fence->seq < RADEON_FENCE_NOTEMITED_SEQ) {
-			r = radeon_fence_wait(fence, false);
-			if (!r) {
-				goto retry;
-			}
-			/* an error happened */
-			break;
-		}
-		idx = (idx + 1) & (RADEON_IB_POOL_SIZE - 1);
+	r = radeon_fence_create(rdev, &(*ib)->fence, ring);
+	if (r) {
+		dev_err(rdev->dev, "failed to create fence for new IB (%d)\n", r);
+		radeon_sa_bo_free(rdev, &(*ib)->sa_bo, NULL);
+		kfree(*ib);
+		*ib = NULL;
+		return r;
 	}
-	radeon_mutex_unlock(&rdev->ib_pool.mutex);
-	radeon_fence_unref(&fence);
-	return r;
+
+	(*ib)->ptr = radeon_sa_bo_cpu_addr((*ib)->sa_bo);
+	(*ib)->gpu_addr = radeon_sa_bo_gpu_addr((*ib)->sa_bo);
+	(*ib)->vm_id = 0;
+	(*ib)->is_const_ib = false;
+
+	return 0;
 }
 
 void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib)
@@ -171,12 +105,9 @@ void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib)
 	if (tmp == NULL) {
 		return;
 	}
-	radeon_mutex_lock(&rdev->ib_pool.mutex);
-	if (tmp->fence && tmp->fence->seq == RADEON_FENCE_NOTEMITED_SEQ) {
-		radeon_sa_bo_free(rdev, &tmp->sa_bo, NULL);
-		radeon_fence_unref(&tmp->fence);
-	}
-	radeon_mutex_unlock(&rdev->ib_pool.mutex);
+	radeon_sa_bo_free(rdev, &tmp->sa_bo, tmp->fence);
+	radeon_fence_unref(&tmp->fence);
+	kfree(tmp);
 }
 
 int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib)
@@ -186,14 +117,14 @@ int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib)
 
 	if (!ib->length_dw || !ring->ready) {
 		/* TODO: Nothings in the ib we should report. */
-		DRM_ERROR("radeon: couldn't schedule IB(%u).\n", ib->idx);
+		dev_err(rdev->dev, "couldn't schedule ib\n");
 		return -EINVAL;
 	}
 
 	/* 64 dwords should be enough for fence too */
 	r = radeon_ring_lock(rdev, ring, 64);
 	if (r) {
-		DRM_ERROR("radeon: scheduling IB failed (%d).\n", r);
+		dev_err(rdev->dev, "scheduling IB failed (%d).\n", r);
 		return r;
 	}
 	radeon_ring_ib_execute(rdev, ib->fence->ring, ib);
@@ -204,63 +135,40 @@ int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib)
 
 int radeon_ib_pool_init(struct radeon_device *rdev)
 {
-	int i, r;
+	int r;
 
-	radeon_mutex_lock(&rdev->ib_pool.mutex);
-	if (rdev->ib_pool.ready) {
-		radeon_mutex_unlock(&rdev->ib_pool.mutex);
+	if (rdev->ib_pool_ready) {
 		return 0;
 	}
-
-	r = radeon_sa_bo_manager_init(rdev, &rdev->ib_pool.sa_manager,
+	r = radeon_sa_bo_manager_init(rdev, &rdev->ring_tmp_bo,
 				      RADEON_IB_POOL_SIZE*64*1024,
 				      RADEON_GEM_DOMAIN_GTT);
 	if (r) {
-		radeon_mutex_unlock(&rdev->ib_pool.mutex);
 		return r;
 	}
-
-	for (i = 0; i < RADEON_IB_POOL_SIZE; i++) {
-		rdev->ib_pool.ibs[i].fence = NULL;
-		rdev->ib_pool.ibs[i].idx = i;
-		rdev->ib_pool.ibs[i].length_dw = 0;
-		rdev->ib_pool.ibs[i].sa_bo = NULL;
-	}
-	rdev->ib_pool.head_id = 0;
-	rdev->ib_pool.ready = true;
-	DRM_INFO("radeon: ib pool ready.\n");
-
-	if (radeon_debugfs_ib_init(rdev)) {
-		DRM_ERROR("Failed to register debugfs file for IB !\n");
+	rdev->ib_pool_ready = true;
+	if (radeon_debugfs_sa_init(rdev)) {
+		dev_err(rdev->dev, "failed to register debugfs file for SA\n");
 	}
-	radeon_mutex_unlock(&rdev->ib_pool.mutex);
 	return 0;
 }
 
 void radeon_ib_pool_fini(struct radeon_device *rdev)
 {
-	unsigned i;
-
-	radeon_mutex_lock(&rdev->ib_pool.mutex);
-	if (rdev->ib_pool.ready) {
-		for (i = 0; i < RADEON_IB_POOL_SIZE; i++) {
-			radeon_sa_bo_free(rdev, &rdev->ib_pool.ibs[i].sa_bo, NULL);
-			radeon_fence_unref(&rdev->ib_pool.ibs[i].fence);
-		}
-		radeon_sa_bo_manager_fini(rdev, &rdev->ib_pool.sa_manager);
-		rdev->ib_pool.ready = false;
+	if (rdev->ib_pool_ready) {
+		radeon_sa_bo_manager_fini(rdev, &rdev->ring_tmp_bo);
+		rdev->ib_pool_ready = false;
 	}
-	radeon_mutex_unlock(&rdev->ib_pool.mutex);
 }
 
 int radeon_ib_pool_start(struct radeon_device *rdev)
 {
-	return radeon_sa_bo_manager_start(rdev, &rdev->ib_pool.sa_manager);
+	return radeon_sa_bo_manager_start(rdev, &rdev->ring_tmp_bo);
 }
 
 int radeon_ib_pool_suspend(struct radeon_device *rdev)
 {
-	return radeon_sa_bo_manager_suspend(rdev, &rdev->ib_pool.sa_manager);
+	return radeon_sa_bo_manager_suspend(rdev, &rdev->ring_tmp_bo);
 }
 
 int radeon_ib_ring_tests(struct radeon_device *rdev)
@@ -296,6 +204,21 @@ int radeon_ib_ring_tests(struct radeon_device *rdev)
 /*
  * Ring.
  */
+int radeon_debugfs_ring_init(struct radeon_device *rdev, struct radeon_ring *ring);
+
+void radeon_ring_write(struct radeon_ring *ring, uint32_t v)
+{
+#if DRM_DEBUG_CODE
+	if (ring->count_dw <= 0) {
+		DRM_ERROR("radeon: writting more dword to ring than expected !\n");
+	}
+#endif
+	ring->ring[ring->wptr++] = v;
+	ring->wptr &= ring->ptr_mask;
+	ring->count_dw--;
+	ring->ring_free_dw--;
+}
+
 int radeon_ring_index(struct radeon_device *rdev, struct radeon_ring *ring)
 {
 	/* r1xx-r5xx only has CP ring */
@@ -575,37 +498,13 @@ static struct drm_info_list radeon_debugfs_ring_info_list[] = {
 	{"radeon_ring_cp2", radeon_debugfs_ring_info, 0, &cayman_ring_type_cp2_index},
 };
 
-static int radeon_debugfs_ib_info(struct seq_file *m, void *data)
-{
-	struct drm_info_node *node = (struct drm_info_node *) m->private;
-	struct drm_device *dev = node->minor->dev;
-	struct radeon_device *rdev = dev->dev_private;
-	struct radeon_ib *ib = &rdev->ib_pool.ibs[*((unsigned*)node->info_ent->data)];
-	unsigned i;
-
-	if (ib == NULL) {
-		return 0;
-	}
-	seq_printf(m, "IB %04u\n", ib->idx);
-	seq_printf(m, "IB fence %p\n", ib->fence);
-	seq_printf(m, "IB size %05u dwords\n", ib->length_dw);
-	for (i = 0; i < ib->length_dw; i++) {
-		seq_printf(m, "[%05u]=0x%08X\n", i, ib->ptr[i]);
-	}
-	return 0;
-}
-
-static struct drm_info_list radeon_debugfs_ib_list[RADEON_IB_POOL_SIZE];
-static char radeon_debugfs_ib_names[RADEON_IB_POOL_SIZE][32];
-static unsigned radeon_debugfs_ib_idx[RADEON_IB_POOL_SIZE];
-
 static int radeon_debugfs_sa_info(struct seq_file *m, void *data)
 {
 	struct drm_info_node *node = (struct drm_info_node *) m->private;
 	struct drm_device *dev = node->minor->dev;
 	struct radeon_device *rdev = dev->dev_private;
 
-	radeon_sa_bo_dump_debug_info(&rdev->ib_pool.sa_manager, m);
+	radeon_sa_bo_dump_debug_info(&rdev->ring_tmp_bo, m);
 
 	return 0;
 
@@ -637,26 +536,10 @@ int radeon_debugfs_ring_init(struct radeon_device *rdev, struct radeon_ring *rin
 	return 0;
 }
 
-int radeon_debugfs_ib_init(struct radeon_device *rdev)
+int radeon_debugfs_sa_init(struct radeon_device *rdev)
 {
 #if defined(CONFIG_DEBUG_FS)
-	unsigned i;
-	int r;
-
-	r = radeon_debugfs_add_files(rdev, radeon_debugfs_sa_list, 1);
-	if (r)
-		return r;
-
-	for (i = 0; i < RADEON_IB_POOL_SIZE; i++) {
-		sprintf(radeon_debugfs_ib_names[i], "radeon_ib_%04u", i);
-		radeon_debugfs_ib_idx[i] = i;
-		radeon_debugfs_ib_list[i].name = radeon_debugfs_ib_names[i];
-		radeon_debugfs_ib_list[i].show = &radeon_debugfs_ib_info;
-		radeon_debugfs_ib_list[i].driver_features = 0;
-		radeon_debugfs_ib_list[i].data = &radeon_debugfs_ib_idx[i];
-	}
-	return radeon_debugfs_add_files(rdev, radeon_debugfs_ib_list,
-					RADEON_IB_POOL_SIZE);
+	return radeon_debugfs_add_files(rdev, radeon_debugfs_sa_list, 1);
 #else
 	return 0;
 #endif
diff --git a/drivers/gpu/drm/radeon/radeon_semaphore.c b/drivers/gpu/drm/radeon/radeon_semaphore.c
index 1bc5513..e2ace5d 100644
--- a/drivers/gpu/drm/radeon/radeon_semaphore.c
+++ b/drivers/gpu/drm/radeon/radeon_semaphore.c
@@ -41,7 +41,7 @@ int radeon_semaphore_create(struct radeon_device *rdev,
 	if (*semaphore == NULL) {
 		return -ENOMEM;
 	}
-	r = radeon_sa_bo_new(rdev, &rdev->ib_pool.sa_manager,
+	r = radeon_sa_bo_new(rdev, &rdev->ring_tmp_bo,
 			     &(*semaphore)->sa_bo, 8, 8, true);
 	if (r) {
 		kfree(*semaphore);
-- 
1.7.5.4

* [PATCH 17/20] drm/radeon: immediately free ttm-move semaphore
  2012-05-07 11:42 SA and other Patches Christian König
                   ` (15 preceding siblings ...)
  2012-05-07 11:42 ` [PATCH 16/20] drm/radeon: rip out the ib pool Christian König
@ 2012-05-07 11:42 ` Christian König
  2012-05-07 11:42 ` [PATCH 18/20] drm/radeon: move the semaphore from the fence into the ib Christian König
                   ` (3 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Christian König @ 2012-05-07 11:42 UTC (permalink / raw)
  To: j.glisse, dri-devel; +Cc: Christian König

We can now protect the semaphore memory with a
fence, so free it immediately.
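
A minimal sketch of the resulting pattern (illustrative only; the helper
name is made up, and the actual copy and fence emission are trimmed):

static int example_synced_blit(struct radeon_device *rdev,
			       bool sync_to[RADEON_NUM_RINGS],
			       struct radeon_fence *fence)
{
	struct radeon_semaphore *sem = NULL;
	int r;

	r = radeon_semaphore_create(rdev, &sem);
	if (r)
		return r;

	r = radeon_semaphore_sync_rings(rdev, sem, sync_to, fence->ring);
	if (r) {
		/* nothing emitted yet, so no fence protection needed */
		radeon_semaphore_free(rdev, sem, NULL);
		return r;
	}

	/* ... emit the copy and the fence on fence->ring here ... */

	/* the fence keeps the semaphore memory alive for the GPU,
	 * so the semaphore itself can be freed right away */
	radeon_semaphore_free(rdev, sem, fence);
	return 0;
}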

Signed-off-by: Christian König <deathsimple@vodafone.de>
---
 drivers/gpu/drm/radeon/radeon_ttm.c |    7 +++++--
 1 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon_ttm.c b/drivers/gpu/drm/radeon/radeon_ttm.c
index 5e3d54d..0f6aee8 100644
--- a/drivers/gpu/drm/radeon/radeon_ttm.c
+++ b/drivers/gpu/drm/radeon/radeon_ttm.c
@@ -223,6 +223,7 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
 	struct radeon_device *rdev;
 	uint64_t old_start, new_start;
 	struct radeon_fence *fence, *old_fence;
+	struct radeon_semaphore *sem = NULL;
 	int r;
 
 	rdev = radeon_get_rdev(bo->bdev);
@@ -272,15 +273,16 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
 		bool sync_to_ring[RADEON_NUM_RINGS] = { };
 		sync_to_ring[old_fence->ring] = true;
 
-		r = radeon_semaphore_create(rdev, &fence->semaphore);
+		r = radeon_semaphore_create(rdev, &sem);
 		if (r) {
 			radeon_fence_unref(&fence);
 			return r;
 		}
 
-		r = radeon_semaphore_sync_rings(rdev, fence->semaphore,
+		r = radeon_semaphore_sync_rings(rdev, sem,
 						sync_to_ring, fence->ring);
 		if (r) {
+			radeon_semaphore_free(rdev, sem, NULL);
 			radeon_fence_unref(&fence);
 			return r;
 		}
@@ -292,6 +294,7 @@ static int radeon_move_blit(struct ttm_buffer_object *bo,
 	/* FIXME: handle copy error */
 	r = ttm_bo_move_accel_cleanup(bo, (void *)fence, NULL,
 				      evict, no_wait_reserve, no_wait_gpu, new_mem);
+	radeon_semaphore_free(rdev, sem, fence);
 	radeon_fence_unref(&fence);
 	return r;
 }
-- 
1.7.5.4

* [PATCH 18/20] drm/radeon: move the semaphore from the fence into the ib
  2012-05-07 11:42 SA and other Patches Christian König
                   ` (16 preceding siblings ...)
  2012-05-07 11:42 ` [PATCH 17/20] drm/radeon: immediately free ttm-move semaphore Christian König
@ 2012-05-07 11:42 ` Christian König
  2012-05-07 11:42 ` [PATCH 19/20] drm/radeon: remove r600 blit mutex v2 Christian König
                   ` (2 subsequent siblings)
  20 siblings, 0 replies; 35+ messages in thread
From: Christian König @ 2012-05-07 11:42 UTC (permalink / raw)
  To: j.glisse, dri-devel; +Cc: Jerome Glisse, Christian König

From: Jerome Glisse <jglisse@redhat.com>

It never really belonged there in the first place.

Signed-off-by: Christian König <deathsimple@vodafone.de>
---
 drivers/gpu/drm/radeon/radeon.h       |   16 ++++++++--------
 drivers/gpu/drm/radeon/radeon_cs.c    |    4 ++--
 drivers/gpu/drm/radeon/radeon_fence.c |    3 ---
 drivers/gpu/drm/radeon/radeon_ring.c  |    2 ++
 4 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 6170307..9507be0 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -272,7 +272,6 @@ struct radeon_fence {
 	uint64_t			seq;
 	/* RB, DMA, etc. */
 	unsigned			ring;
-	struct radeon_semaphore		*semaphore;
 };
 
 int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
@@ -624,13 +623,14 @@ void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc);
  */
 
 struct radeon_ib {
-	struct radeon_sa_bo	*sa_bo;
-	uint32_t		length_dw;
-	uint64_t		gpu_addr;
-	uint32_t		*ptr;
-	struct radeon_fence	*fence;
-	unsigned		vm_id;
-	bool			is_const_ib;
+	struct radeon_sa_bo		*sa_bo;
+	uint32_t			length_dw;
+	uint64_t			gpu_addr;
+	uint32_t			*ptr;
+	struct radeon_fence		*fence;
+	unsigned			vm_id;
+	bool				is_const_ib;
+	struct radeon_semaphore		*semaphore;
 };
 
 struct radeon_ring {
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index 5c065bf..dcfe2a0 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -138,12 +138,12 @@ static int radeon_cs_sync_rings(struct radeon_cs_parser *p)
 		return 0;
 	}
 
-	r = radeon_semaphore_create(p->rdev, &p->ib->fence->semaphore);
+	r = radeon_semaphore_create(p->rdev, &p->ib->semaphore);
 	if (r) {
 		return r;
 	}
 
-	return radeon_semaphore_sync_rings(p->rdev, p->ib->fence->semaphore,
+	return radeon_semaphore_sync_rings(p->rdev, p->ib->semaphore,
 					   sync_to_ring, p->ring);
 }
 
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index 6767381..c1f5233 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -137,8 +137,6 @@ static void radeon_fence_destroy(struct kref *kref)
 
 	fence = container_of(kref, struct radeon_fence, kref);
 	fence->seq = RADEON_FENCE_NOTEMITED_SEQ;
-	if (fence->semaphore)
-		radeon_semaphore_free(fence->rdev, fence->semaphore, NULL);
 	kfree(fence);
 }
 
@@ -154,7 +152,6 @@ int radeon_fence_create(struct radeon_device *rdev,
 	(*fence)->rdev = rdev;
 	(*fence)->seq = RADEON_FENCE_NOTEMITED_SEQ;
 	(*fence)->ring = ring;
-	(*fence)->semaphore = NULL;
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index b3d6942..af8e1ee 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -93,6 +93,7 @@ int radeon_ib_get(struct radeon_device *rdev, int ring,
 	(*ib)->gpu_addr = radeon_sa_bo_gpu_addr((*ib)->sa_bo);
 	(*ib)->vm_id = 0;
 	(*ib)->is_const_ib = false;
+	(*ib)->semaphore = NULL;
 
 	return 0;
 }
@@ -105,6 +106,7 @@ void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib)
 	if (tmp == NULL) {
 		return;
 	}
+	radeon_semaphore_free(rdev, tmp->semaphore, tmp->fence);
 	radeon_sa_bo_free(rdev, &tmp->sa_bo, tmp->fence);
 	radeon_fence_unref(&tmp->fence);
 	kfree(tmp);
-- 
1.7.5.4

* [PATCH 19/20] drm/radeon: remove r600 blit mutex v2
  2012-05-07 11:42 SA and other Patches Christian König
                   ` (17 preceding siblings ...)
  2012-05-07 11:42 ` [PATCH 18/20] drm/radeon: move the semaphore from the fence into the ib Christian König
@ 2012-05-07 11:42 ` Christian König
  2012-05-07 11:42 ` [PATCH 20/20] drm/radeon: make the ib an inline object Christian König
  2012-05-07 14:34 ` SA and other Patches Jerome Glisse
  20 siblings, 0 replies; 35+ messages in thread
From: Christian König @ 2012-05-07 11:42 UTC (permalink / raw)
  To: j.glisse, dri-devel; +Cc: Christian König

If we don't store local data in global variables,
there is no need to lock anything.

v2: rebased on new SA interface
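
For reviewers, the reworked flow in r600_copy_blit() below reduces to the
following shape. The sketch is illustrative only (the helper name is made
up), but it shows why no mutex is needed: the vertex buffer is a per-call
SA handle instead of state stored in rdev->r600_blit.

static int example_blit_copy(struct radeon_device *rdev,
			     u64 src_gpu_addr, u64 dst_gpu_addr,
			     unsigned num_gpu_pages,
			     struct radeon_fence *fence)
{
	struct radeon_sa_bo *vb = NULL;	/* local, never global state */
	int r;

	r = r600_blit_prepare_copy(rdev, num_gpu_pages, &vb);
	if (r)
		return r;

	r600_kms_blit_copy(rdev, src_gpu_addr, dst_gpu_addr,
			   num_gpu_pages, vb);

	/* commits the ring and frees vb, protected by the fence */
	r600_blit_done_copy(rdev, fence, vb);
	return 0;
}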

Signed-off-by: Christian König <deathsimple@vodafone.de>
---
 drivers/gpu/drm/radeon/evergreen_blit_kms.c |    1 -
 drivers/gpu/drm/radeon/r600.c               |   13 +---
 drivers/gpu/drm/radeon/r600_blit_kms.c      |   99 +++++++++++----------------
 drivers/gpu/drm/radeon/radeon.h             |    3 -
 drivers/gpu/drm/radeon/radeon_asic.h        |    9 ++-
 5 files changed, 50 insertions(+), 75 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen_blit_kms.c b/drivers/gpu/drm/radeon/evergreen_blit_kms.c
index 222acd2..30f0480 100644
--- a/drivers/gpu/drm/radeon/evergreen_blit_kms.c
+++ b/drivers/gpu/drm/radeon/evergreen_blit_kms.c
@@ -637,7 +637,6 @@ int evergreen_blit_init(struct radeon_device *rdev)
 	if (rdev->r600_blit.shader_obj)
 		goto done;
 
-	mutex_init(&rdev->r600_blit.mutex);
 	rdev->r600_blit.state_offset = 0;
 
 	if (rdev->family < CHIP_CAYMAN)
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index 0ae2d2d..9d6009a 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -2363,20 +2363,15 @@ int r600_copy_blit(struct radeon_device *rdev,
 		   unsigned num_gpu_pages,
 		   struct radeon_fence *fence)
 {
+	struct radeon_sa_bo *vb = NULL;
 	int r;
 
-	mutex_lock(&rdev->r600_blit.mutex);
-	rdev->r600_blit.vb_ib = NULL;
-	r = r600_blit_prepare_copy(rdev, num_gpu_pages);
+	r = r600_blit_prepare_copy(rdev, num_gpu_pages, &vb);
 	if (r) {
-		if (rdev->r600_blit.vb_ib)
-			radeon_ib_free(rdev, &rdev->r600_blit.vb_ib);
-		mutex_unlock(&rdev->r600_blit.mutex);
 		return r;
 	}
-	r600_kms_blit_copy(rdev, src_offset, dst_offset, num_gpu_pages);
-	r600_blit_done_copy(rdev, fence);
-	mutex_unlock(&rdev->r600_blit.mutex);
+	r600_kms_blit_copy(rdev, src_offset, dst_offset, num_gpu_pages, vb);
+	r600_blit_done_copy(rdev, fence, vb);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/radeon/r600_blit_kms.c b/drivers/gpu/drm/radeon/r600_blit_kms.c
index db38f58..ef20822 100644
--- a/drivers/gpu/drm/radeon/r600_blit_kms.c
+++ b/drivers/gpu/drm/radeon/r600_blit_kms.c
@@ -513,7 +513,6 @@ int r600_blit_init(struct radeon_device *rdev)
 	rdev->r600_blit.primitives.set_default_state = set_default_state;
 
 	rdev->r600_blit.ring_size_common = 40; /* shaders + def state */
-	rdev->r600_blit.ring_size_common += 16; /* fence emit for VB IB */
 	rdev->r600_blit.ring_size_common += 5; /* done copy */
 	rdev->r600_blit.ring_size_common += 16; /* fence emit for done copy */
 
@@ -528,7 +527,6 @@ int r600_blit_init(struct radeon_device *rdev)
 	if (rdev->r600_blit.shader_obj)
 		goto done;
 
-	mutex_init(&rdev->r600_blit.mutex);
 	rdev->r600_blit.state_offset = 0;
 
 	if (rdev->family >= CHIP_RV770)
@@ -621,27 +619,6 @@ void r600_blit_fini(struct radeon_device *rdev)
 	radeon_bo_unref(&rdev->r600_blit.shader_obj);
 }
 
-static int r600_vb_ib_get(struct radeon_device *rdev, unsigned size)
-{
-	int r;
-	r = radeon_ib_get(rdev, RADEON_RING_TYPE_GFX_INDEX,
-			  &rdev->r600_blit.vb_ib, size);
-	if (r) {
-		DRM_ERROR("failed to get IB for vertex buffer\n");
-		return r;
-	}
-
-	rdev->r600_blit.vb_total = size;
-	rdev->r600_blit.vb_used = 0;
-	return 0;
-}
-
-static void r600_vb_ib_put(struct radeon_device *rdev)
-{
-	radeon_fence_emit(rdev, rdev->r600_blit.vb_ib->fence);
-	radeon_ib_free(rdev, &rdev->r600_blit.vb_ib);
-}
-
 static unsigned r600_blit_create_rect(unsigned num_gpu_pages,
 				      int *width, int *height, int max_dim)
 {
@@ -688,7 +665,8 @@ static unsigned r600_blit_create_rect(unsigned num_gpu_pages,
 }
 
 
-int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages)
+int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages,
+			   struct radeon_sa_bo **vb)
 {
 	struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX];
 	int r;
@@ -705,46 +683,54 @@ int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages)
 	}
 
 	/* 48 bytes for vertex per loop */
-	r = r600_vb_ib_get(rdev, (num_loops*48)+256);
-	if (r)
+	r = radeon_sa_bo_new(rdev, &rdev->ring_tmp_bo, vb,
+			     (num_loops*48)+256, 256, true);
+	if (r) {
 		return r;
+	}
 
 	/* calculate number of loops correctly */
 	ring_size = num_loops * dwords_per_loop;
 	ring_size += rdev->r600_blit.ring_size_common;
 	r = radeon_ring_lock(rdev, ring, ring_size);
-	if (r)
+	if (r) {
+		radeon_sa_bo_free(rdev, vb, NULL);
 		return r;
+	}
 
 	rdev->r600_blit.primitives.set_default_state(rdev);
 	rdev->r600_blit.primitives.set_shaders(rdev);
 	return 0;
 }
 
-void r600_blit_done_copy(struct radeon_device *rdev, struct radeon_fence *fence)
+void r600_blit_done_copy(struct radeon_device *rdev, struct radeon_fence *fence,
+			 struct radeon_sa_bo *vb)
 {
+	struct radeon_ring *ring = &rdev->ring[RADEON_RING_TYPE_GFX_INDEX];
 	int r;
 
-	if (rdev->r600_blit.vb_ib)
-		r600_vb_ib_put(rdev);
-
-	if (fence)
-		r = radeon_fence_emit(rdev, fence);
+	r = radeon_fence_emit(rdev, fence);
+	if (r) {
+		radeon_ring_unlock_undo(rdev, ring);
+		return;
+	}
 
-	radeon_ring_unlock_commit(rdev, &rdev->ring[RADEON_RING_TYPE_GFX_INDEX]);
+	radeon_ring_unlock_commit(rdev, ring);
+	radeon_sa_bo_free(rdev, &vb, fence);
 }
 
 void r600_kms_blit_copy(struct radeon_device *rdev,
 			u64 src_gpu_addr, u64 dst_gpu_addr,
-			unsigned num_gpu_pages)
+			unsigned num_gpu_pages,
+			struct radeon_sa_bo *vb)
 {
 	u64 vb_gpu_addr;
-	u32 *vb;
+	u32 *vb_cpu_addr;
 
-	DRM_DEBUG("emitting copy %16llx %16llx %d %d\n",
-		  src_gpu_addr, dst_gpu_addr,
-		  num_gpu_pages, rdev->r600_blit.vb_used);
-	vb = (u32 *)(rdev->r600_blit.vb_ib->ptr + rdev->r600_blit.vb_used);
+	DRM_DEBUG("emitting copy %16llx %16llx %d\n",
+		  src_gpu_addr, dst_gpu_addr, num_gpu_pages);
+	vb_cpu_addr = (u32 *)radeon_sa_bo_cpu_addr(vb);
+	vb_gpu_addr = radeon_sa_bo_gpu_addr(vb);
 
 	while (num_gpu_pages) {
 		int w, h;
@@ -756,39 +742,34 @@ void r600_kms_blit_copy(struct radeon_device *rdev,
 		size_in_bytes = pages_per_loop * RADEON_GPU_PAGE_SIZE;
 		DRM_DEBUG("rectangle w=%d h=%d\n", w, h);
 
-		if ((rdev->r600_blit.vb_used + 48) > rdev->r600_blit.vb_total) {
-			WARN_ON(1);
-		}
-
-		vb[0] = 0;
-		vb[1] = 0;
-		vb[2] = 0;
-		vb[3] = 0;
+		vb_cpu_addr[0] = 0;
+		vb_cpu_addr[1] = 0;
+		vb_cpu_addr[2] = 0;
+		vb_cpu_addr[3] = 0;
 
-		vb[4] = 0;
-		vb[5] = i2f(h);
-		vb[6] = 0;
-		vb[7] = i2f(h);
+		vb_cpu_addr[4] = 0;
+		vb_cpu_addr[5] = i2f(h);
+		vb_cpu_addr[6] = 0;
+		vb_cpu_addr[7] = i2f(h);
 
-		vb[8] = i2f(w);
-		vb[9] = i2f(h);
-		vb[10] = i2f(w);
-		vb[11] = i2f(h);
+		vb_cpu_addr[8] = i2f(w);
+		vb_cpu_addr[9] = i2f(h);
+		vb_cpu_addr[10] = i2f(w);
+		vb_cpu_addr[11] = i2f(h);
 
 		rdev->r600_blit.primitives.set_tex_resource(rdev, FMT_8_8_8_8,
 							    w, h, w, src_gpu_addr, size_in_bytes);
 		rdev->r600_blit.primitives.set_render_target(rdev, COLOR_8_8_8_8,
 							     w, h, dst_gpu_addr);
 		rdev->r600_blit.primitives.set_scissors(rdev, 0, 0, w, h);
-		vb_gpu_addr = rdev->r600_blit.vb_ib->gpu_addr + rdev->r600_blit.vb_used;
 		rdev->r600_blit.primitives.set_vtx_resource(rdev, vb_gpu_addr);
 		rdev->r600_blit.primitives.draw_auto(rdev);
 		rdev->r600_blit.primitives.cp_set_surface_sync(rdev,
 				    PACKET3_CB_ACTION_ENA | PACKET3_CB0_DEST_BASE_ENA,
 				    size_in_bytes, dst_gpu_addr);
 
-		vb += 12;
-		rdev->r600_blit.vb_used += 4*12;
+		vb_cpu_addr += 12;
+		vb_gpu_addr += 4*12;
 		src_gpu_addr += size_in_bytes;
 		dst_gpu_addr += size_in_bytes;
 		num_gpu_pages -= pages_per_loop;
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 9507be0..659855a 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -743,7 +743,6 @@ struct r600_blit_cp_primitives {
 };
 
 struct r600_blit {
-	struct mutex		mutex;
 	struct radeon_bo	*shader_obj;
 	struct r600_blit_cp_primitives primitives;
 	int max_dim;
@@ -753,8 +752,6 @@ struct r600_blit {
 	u32 vs_offset, ps_offset;
 	u32 state_offset;
 	u32 state_len;
-	u32 vb_used, vb_total;
-	struct radeon_ib *vb_ib;
 };
 
 void r600_blit_suspend(struct radeon_device *rdev);
diff --git a/drivers/gpu/drm/radeon/radeon_asic.h b/drivers/gpu/drm/radeon/radeon_asic.h
index 7830931..05a4e15 100644
--- a/drivers/gpu/drm/radeon/radeon_asic.h
+++ b/drivers/gpu/drm/radeon/radeon_asic.h
@@ -368,11 +368,14 @@ void r600_hdmi_init(struct drm_encoder *encoder);
 int r600_hdmi_buffer_status_changed(struct drm_encoder *encoder);
 void r600_hdmi_update_audio_settings(struct drm_encoder *encoder);
 /* r600 blit */
-int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages);
-void r600_blit_done_copy(struct radeon_device *rdev, struct radeon_fence *fence);
+int r600_blit_prepare_copy(struct radeon_device *rdev, unsigned num_gpu_pages,
+			   struct radeon_sa_bo **vb);
+void r600_blit_done_copy(struct radeon_device *rdev, struct radeon_fence *fence,
+			 struct radeon_sa_bo *vb);
 void r600_kms_blit_copy(struct radeon_device *rdev,
 			u64 src_gpu_addr, u64 dst_gpu_addr,
-			unsigned num_gpu_pages);
+			unsigned num_gpu_pages,
+			struct radeon_sa_bo *vb);
 int r600_mc_wait_for_idle(struct radeon_device *rdev);
 
 /*
-- 
1.7.5.4

* [PATCH 20/20] drm/radeon: make the ib an inline object
  2012-05-07 11:42 SA and other Patches Christian König
                   ` (18 preceding siblings ...)
  2012-05-07 11:42 ` [PATCH 19/20] drm/radeon: remove r600 blit mutex v2 Christian König
@ 2012-05-07 11:42 ` Christian König
  2012-05-07 14:34 ` SA and other Patches Jerome Glisse
  20 siblings, 0 replies; 35+ messages in thread
From: Christian König @ 2012-05-07 11:42 UTC (permalink / raw)
  To: j.glisse, dri-devel; +Cc: Jerome Glisse, Christian König

From: Jerome Glisse <jglisse@redhat.com>

No need to malloc it any more.

Signed-off-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Christian König <deathsimple@vodafone.de>
---
 drivers/gpu/drm/radeon/evergreen_cs.c |   10 +++---
 drivers/gpu/drm/radeon/r100.c         |   38 ++++++++++----------
 drivers/gpu/drm/radeon/r200.c         |    2 +-
 drivers/gpu/drm/radeon/r300.c         |    4 +-
 drivers/gpu/drm/radeon/r600.c         |   16 ++++----
 drivers/gpu/drm/radeon/r600_cs.c      |   22 +++++------
 drivers/gpu/drm/radeon/radeon.h       |    8 ++--
 drivers/gpu/drm/radeon/radeon_cs.c    |   63 ++++++++++++++++-----------------
 drivers/gpu/drm/radeon/radeon_ring.c  |   41 +++++++--------------
 9 files changed, 93 insertions(+), 111 deletions(-)

diff --git a/drivers/gpu/drm/radeon/evergreen_cs.c b/drivers/gpu/drm/radeon/evergreen_cs.c
index 70089d3..4e7dd2b 100644
--- a/drivers/gpu/drm/radeon/evergreen_cs.c
+++ b/drivers/gpu/drm/radeon/evergreen_cs.c
@@ -1057,7 +1057,7 @@ static int evergreen_cs_packet_parse_vline(struct radeon_cs_parser *p)
 	uint32_t header, h_idx, reg, wait_reg_mem_info;
 	volatile uint32_t *ib;
 
-	ib = p->ib->ptr;
+	ib = p->ib.ptr;
 
 	/* parse the WAIT_REG_MEM */
 	r = evergreen_cs_packet_parse(p, &wait_reg_mem, p->idx);
@@ -1215,7 +1215,7 @@ static int evergreen_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx)
 		if (!(evergreen_reg_safe_bm[i] & m))
 			return 0;
 	}
-	ib = p->ib->ptr;
+	ib = p->ib.ptr;
 	switch (reg) {
 	/* force following reg to 0 in an attempt to disable out buffer
 	 * which will need us to better understand how it works to perform
@@ -1896,7 +1896,7 @@ static int evergreen_packet3_check(struct radeon_cs_parser *p,
 	u32 idx_value;
 
 	track = (struct evergreen_cs_track *)p->track;
-	ib = p->ib->ptr;
+	ib = p->ib.ptr;
 	idx = pkt->idx + 1;
 	idx_value = radeon_get_ib_value(p, idx);
 
@@ -2610,8 +2610,8 @@ int evergreen_cs_parse(struct radeon_cs_parser *p)
 		}
 	} while (p->idx < p->chunks[p->chunk_ib_idx].length_dw);
 #if 0
-	for (r = 0; r < p->ib->length_dw; r++) {
-		printk(KERN_INFO "%05d  0x%08X\n", r, p->ib->ptr[r]);
+	for (r = 0; r < p->ib.length_dw; r++) {
+		printk(KERN_INFO "%05d  0x%08X\n", r, p->ib.ptr[r]);
 		mdelay(1);
 	}
 #endif
diff --git a/drivers/gpu/drm/radeon/r100.c b/drivers/gpu/drm/radeon/r100.c
index ad6ceb7..0874a6d 100644
--- a/drivers/gpu/drm/radeon/r100.c
+++ b/drivers/gpu/drm/radeon/r100.c
@@ -139,9 +139,9 @@ int r100_reloc_pitch_offset(struct radeon_cs_parser *p,
 		}
 
 		tmp |= tile_flags;
-		p->ib->ptr[idx] = (value & 0x3fc00000) | tmp;
+		p->ib.ptr[idx] = (value & 0x3fc00000) | tmp;
 	} else
-		p->ib->ptr[idx] = (value & 0xffc00000) | tmp;
+		p->ib.ptr[idx] = (value & 0xffc00000) | tmp;
 	return 0;
 }
 
@@ -156,7 +156,7 @@ int r100_packet3_load_vbpntr(struct radeon_cs_parser *p,
 	volatile uint32_t *ib;
 	u32 idx_value;
 
-	ib = p->ib->ptr;
+	ib = p->ib.ptr;
 	track = (struct r100_cs_track *)p->track;
 	c = radeon_get_ib_value(p, idx++) & 0x1F;
 	if (c > 16) {
@@ -1275,7 +1275,7 @@ void r100_cs_dump_packet(struct radeon_cs_parser *p,
 	unsigned i;
 	unsigned idx;
 
-	ib = p->ib->ptr;
+	ib = p->ib.ptr;
 	idx = pkt->idx;
 	for (i = 0; i <= (pkt->count + 1); i++, idx++) {
 		DRM_INFO("ib[%d]=0x%08X\n", idx, ib[idx]);
@@ -1354,7 +1354,7 @@ int r100_cs_packet_parse_vline(struct radeon_cs_parser *p)
 	uint32_t header, h_idx, reg;
 	volatile uint32_t *ib;
 
-	ib = p->ib->ptr;
+	ib = p->ib.ptr;
 
 	/* parse the wait until */
 	r = r100_cs_packet_parse(p, &waitreloc, p->idx);
@@ -1533,7 +1533,7 @@ static int r100_packet0_check(struct radeon_cs_parser *p,
 	u32 tile_flags = 0;
 	u32 idx_value;
 
-	ib = p->ib->ptr;
+	ib = p->ib.ptr;
 	track = (struct r100_cs_track *)p->track;
 
 	idx_value = radeon_get_ib_value(p, idx);
@@ -1889,7 +1889,7 @@ static int r100_packet3_check(struct radeon_cs_parser *p,
 	volatile uint32_t *ib;
 	int r;
 
-	ib = p->ib->ptr;
+	ib = p->ib.ptr;
 	idx = pkt->idx + 1;
 	track = (struct r100_cs_track *)p->track;
 	switch (pkt->opcode) {
@@ -3684,7 +3684,7 @@ void r100_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib)
 
 int r100_ib_test(struct radeon_device *rdev, struct radeon_ring *ring)
 {
-	struct radeon_ib *ib;
+	struct radeon_ib ib;
 	uint32_t scratch;
 	uint32_t tmp = 0;
 	unsigned i;
@@ -3700,22 +3700,22 @@ int r100_ib_test(struct radeon_device *rdev, struct radeon_ring *ring)
 	if (r) {
 		return r;
 	}
-	ib->ptr[0] = PACKET0(scratch, 0);
-	ib->ptr[1] = 0xDEADBEEF;
-	ib->ptr[2] = PACKET2(0);
-	ib->ptr[3] = PACKET2(0);
-	ib->ptr[4] = PACKET2(0);
-	ib->ptr[5] = PACKET2(0);
-	ib->ptr[6] = PACKET2(0);
-	ib->ptr[7] = PACKET2(0);
-	ib->length_dw = 8;
-	r = radeon_ib_schedule(rdev, ib);
+	ib.ptr[0] = PACKET0(scratch, 0);
+	ib.ptr[1] = 0xDEADBEEF;
+	ib.ptr[2] = PACKET2(0);
+	ib.ptr[3] = PACKET2(0);
+	ib.ptr[4] = PACKET2(0);
+	ib.ptr[5] = PACKET2(0);
+	ib.ptr[6] = PACKET2(0);
+	ib.ptr[7] = PACKET2(0);
+	ib.length_dw = 8;
+	r = radeon_ib_schedule(rdev, &ib);
 	if (r) {
 		radeon_scratch_free(rdev, scratch);
 		radeon_ib_free(rdev, &ib);
 		return r;
 	}
-	r = radeon_fence_wait(ib->fence, false);
+	r = radeon_fence_wait(ib.fence, false);
 	if (r) {
 		return r;
 	}
diff --git a/drivers/gpu/drm/radeon/r200.c b/drivers/gpu/drm/radeon/r200.c
index a59cc47..a26144d 100644
--- a/drivers/gpu/drm/radeon/r200.c
+++ b/drivers/gpu/drm/radeon/r200.c
@@ -154,7 +154,7 @@ int r200_packet0_check(struct radeon_cs_parser *p,
 	u32 tile_flags = 0;
 	u32 idx_value;
 
-	ib = p->ib->ptr;
+	ib = p->ib.ptr;
 	track = (struct r100_cs_track *)p->track;
 	idx_value = radeon_get_ib_value(p, idx);
 	switch (reg) {
diff --git a/drivers/gpu/drm/radeon/r300.c b/drivers/gpu/drm/radeon/r300.c
index 6419a59..97722a3 100644
--- a/drivers/gpu/drm/radeon/r300.c
+++ b/drivers/gpu/drm/radeon/r300.c
@@ -604,7 +604,7 @@ static int r300_packet0_check(struct radeon_cs_parser *p,
 	int r;
 	u32 idx_value;
 
-	ib = p->ib->ptr;
+	ib = p->ib.ptr;
 	track = (struct r100_cs_track *)p->track;
 	idx_value = radeon_get_ib_value(p, idx);
 
@@ -1146,7 +1146,7 @@ static int r300_packet3_check(struct radeon_cs_parser *p,
 	unsigned idx;
 	int r;
 
-	ib = p->ib->ptr;
+	ib = p->ib.ptr;
 	idx = pkt->idx + 1;
 	track = (struct r100_cs_track *)p->track;
 	switch(pkt->opcode) {
diff --git a/drivers/gpu/drm/radeon/r600.c b/drivers/gpu/drm/radeon/r600.c
index 9d6009a..4c7fafe 100644
--- a/drivers/gpu/drm/radeon/r600.c
+++ b/drivers/gpu/drm/radeon/r600.c
@@ -2681,7 +2681,7 @@ void r600_ring_ib_execute(struct radeon_device *rdev, struct radeon_ib *ib)
 
 int r600_ib_test(struct radeon_device *rdev, struct radeon_ring *ring)
 {
-	struct radeon_ib *ib;
+	struct radeon_ib ib;
 	uint32_t scratch;
 	uint32_t tmp = 0;
 	unsigned i;
@@ -2699,18 +2699,18 @@ int r600_ib_test(struct radeon_device *rdev, struct radeon_ring *ring)
 		DRM_ERROR("radeon: failed to get ib (%d).\n", r);
 		return r;
 	}
-	ib->ptr[0] = PACKET3(PACKET3_SET_CONFIG_REG, 1);
-	ib->ptr[1] = ((scratch - PACKET3_SET_CONFIG_REG_OFFSET) >> 2);
-	ib->ptr[2] = 0xDEADBEEF;
-	ib->length_dw = 3;
-	r = radeon_ib_schedule(rdev, ib);
+	ib.ptr[0] = PACKET3(PACKET3_SET_CONFIG_REG, 1);
+	ib.ptr[1] = ((scratch - PACKET3_SET_CONFIG_REG_OFFSET) >> 2);
+	ib.ptr[2] = 0xDEADBEEF;
+	ib.length_dw = 3;
+	r = radeon_ib_schedule(rdev, &ib);
 	if (r) {
 		radeon_scratch_free(rdev, scratch);
 		radeon_ib_free(rdev, &ib);
 		DRM_ERROR("radeon: failed to schedule ib (%d).\n", r);
 		return r;
 	}
-	r = radeon_fence_wait(ib->fence, false);
+	r = radeon_fence_wait(ib.fence, false);
 	if (r) {
 		DRM_ERROR("radeon: fence wait failed (%d).\n", r);
 		return r;
@@ -2722,7 +2722,7 @@ int r600_ib_test(struct radeon_device *rdev, struct radeon_ring *ring)
 		DRM_UDELAY(1);
 	}
 	if (i < rdev->usec_timeout) {
-		DRM_INFO("ib test on ring %d succeeded in %u usecs\n", ib->fence->ring, i);
+		DRM_INFO("ib test on ring %d succeeded in %u usecs\n", ib.fence->ring, i);
 	} else {
 		DRM_ERROR("radeon: ib test failed (scratch(0x%04X)=0x%08X)\n",
 			  scratch, tmp);
diff --git a/drivers/gpu/drm/radeon/r600_cs.c b/drivers/gpu/drm/radeon/r600_cs.c
index b8e12af..0133f5f 100644
--- a/drivers/gpu/drm/radeon/r600_cs.c
+++ b/drivers/gpu/drm/radeon/r600_cs.c
@@ -345,7 +345,7 @@ static int r600_cs_track_validate_cb(struct radeon_cs_parser *p, int i)
 	u32 height, height_align, pitch, pitch_align, depth_align;
 	u64 base_offset, base_align;
 	struct array_mode_checker array_check;
-	volatile u32 *ib = p->ib->ptr;
+	volatile u32 *ib = p->ib.ptr;
 	unsigned array_mode;
 	u32 format;
 
@@ -471,7 +471,7 @@ static int r600_cs_track_validate_db(struct radeon_cs_parser *p)
 	u64 base_offset, base_align;
 	struct array_mode_checker array_check;
 	int array_mode;
-	volatile u32 *ib = p->ib->ptr;
+	volatile u32 *ib = p->ib.ptr;
 
 
 	if (track->db_bo == NULL) {
@@ -961,7 +961,7 @@ static int r600_cs_packet_parse_vline(struct radeon_cs_parser *p)
 	uint32_t header, h_idx, reg, wait_reg_mem_info;
 	volatile uint32_t *ib;
 
-	ib = p->ib->ptr;
+	ib = p->ib.ptr;
 
 	/* parse the WAIT_REG_MEM */
 	r = r600_cs_packet_parse(p, &wait_reg_mem, p->idx);
@@ -1110,7 +1110,7 @@ static int r600_cs_check_reg(struct radeon_cs_parser *p, u32 reg, u32 idx)
 	m = 1 << ((reg >> 2) & 31);
 	if (!(r600_reg_safe_bm[i] & m))
 		return 0;
-	ib = p->ib->ptr;
+	ib = p->ib.ptr;
 	switch (reg) {
 	/* force following reg to 0 in an attempt to disable out buffer
 	 * which will need us to better understand how it works to perform
@@ -1714,7 +1714,7 @@ static int r600_packet3_check(struct radeon_cs_parser *p,
 	u32 idx_value;
 
 	track = (struct r600_cs_track *)p->track;
-	ib = p->ib->ptr;
+	ib = p->ib.ptr;
 	idx = pkt->idx + 1;
 	idx_value = radeon_get_ib_value(p, idx);
 
@@ -2249,8 +2249,8 @@ int r600_cs_parse(struct radeon_cs_parser *p)
 		}
 	} while (p->idx < p->chunks[p->chunk_ib_idx].length_dw);
 #if 0
-	for (r = 0; r < p->ib->length_dw; r++) {
-		printk(KERN_INFO "%05d  0x%08X\n", r, p->ib->ptr[r]);
+	for (r = 0; r < p->ib.length_dw; r++) {
+		printk(KERN_INFO "%05d  0x%08X\n", r, p->ib.ptr[r]);
 		mdelay(1);
 	}
 #endif
@@ -2298,7 +2298,6 @@ int r600_cs_legacy(struct drm_device *dev, void *data, struct drm_file *filp,
 {
 	struct radeon_cs_parser parser;
 	struct radeon_cs_chunk *ib_chunk;
-	struct radeon_ib fake_ib;
 	struct r600_cs_track *track;
 	int r;
 
@@ -2314,9 +2313,8 @@ int r600_cs_legacy(struct drm_device *dev, void *data, struct drm_file *filp,
 	parser.dev = &dev->pdev->dev;
 	parser.rdev = NULL;
 	parser.family = family;
-	parser.ib = &fake_ib;
 	parser.track = track;
-	fake_ib.ptr = ib;
+	parser.ib.ptr = ib;
 	r = radeon_cs_parser_init(&parser, data);
 	if (r) {
 		DRM_ERROR("Failed to initialize parser !\n");
@@ -2333,8 +2331,8 @@ int r600_cs_legacy(struct drm_device *dev, void *data, struct drm_file *filp,
 	 * input memory (cached) and write to the IB (which can be
 	 * uncached). */
 	ib_chunk = &parser.chunks[parser.chunk_ib_idx];
-	parser.ib->length_dw = ib_chunk->length_dw;
-	*l = parser.ib->length_dw;
+	parser.ib.length_dw = ib_chunk->length_dw;
+	*l = parser.ib.length_dw;
 	r = r600_cs_parse(&parser);
 	if (r) {
 		DRM_ERROR("Invalid command stream !\n");
diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 659855a..60233d7 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -769,8 +769,8 @@ struct si_rlc {
 };
 
 int radeon_ib_get(struct radeon_device *rdev, int ring,
-		  struct radeon_ib **ib, unsigned size);
-void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib);
+		  struct radeon_ib *ib, unsigned size);
+void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib *ib);
 int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib);
 int radeon_ib_pool_init(struct radeon_device *rdev);
 void radeon_ib_pool_fini(struct radeon_device *rdev);
@@ -838,8 +838,8 @@ struct radeon_cs_parser {
 	int			chunk_relocs_idx;
 	int			chunk_flags_idx;
 	int			chunk_const_ib_idx;
-	struct radeon_ib	*ib;
-	struct radeon_ib	*const_ib;
+	struct radeon_ib	ib;
+	struct radeon_ib	const_ib;
 	void			*track;
 	unsigned		family;
 	int			parser_error;
diff --git a/drivers/gpu/drm/radeon/radeon_cs.c b/drivers/gpu/drm/radeon/radeon_cs.c
index dcfe2a0..c7d64a7 100644
--- a/drivers/gpu/drm/radeon/radeon_cs.c
+++ b/drivers/gpu/drm/radeon/radeon_cs.c
@@ -138,12 +138,12 @@ static int radeon_cs_sync_rings(struct radeon_cs_parser *p)
 		return 0;
 	}
 
-	r = radeon_semaphore_create(p->rdev, &p->ib->semaphore);
+	r = radeon_semaphore_create(p->rdev, &p->ib.semaphore);
 	if (r) {
 		return r;
 	}
 
-	return radeon_semaphore_sync_rings(p->rdev, p->ib->semaphore,
+	return radeon_semaphore_sync_rings(p->rdev, p->ib.semaphore,
 					   sync_to_ring, p->ring);
 }
 
@@ -161,8 +161,10 @@ int radeon_cs_parser_init(struct radeon_cs_parser *p, void *data)
 	/* get chunks */
 	INIT_LIST_HEAD(&p->validated);
 	p->idx = 0;
-	p->ib = NULL;
-	p->const_ib = NULL;
+	p->ib.sa_bo = NULL;
+	p->ib.semaphore = NULL;
+	p->const_ib.sa_bo = NULL;
+	p->const_ib.semaphore = NULL;
 	p->chunk_ib_idx = -1;
 	p->chunk_relocs_idx = -1;
 	p->chunk_flags_idx = -1;
@@ -301,10 +303,9 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error)
 {
 	unsigned i;
 
-
-	if (!error && parser->ib)
+	if (!error)
 		ttm_eu_fence_buffer_objects(&parser->validated,
-					    parser->ib->fence);
+					    parser->ib.fence);
 	else
 		ttm_eu_backoff_reservation(&parser->validated);
 
@@ -327,9 +328,7 @@ static void radeon_cs_parser_fini(struct radeon_cs_parser *parser, int error)
 	kfree(parser->chunks);
 	kfree(parser->chunks_array);
 	radeon_ib_free(parser->rdev, &parser->ib);
-	if (parser->const_ib) {
-		radeon_ib_free(parser->rdev, &parser->const_ib);
-	}
+	radeon_ib_free(parser->rdev, &parser->const_ib);
 }
 
 static int radeon_cs_ib_chunk(struct radeon_device *rdev,
@@ -355,7 +354,7 @@ static int radeon_cs_ib_chunk(struct radeon_device *rdev,
 		DRM_ERROR("Failed to get ib !\n");
 		return r;
 	}
-	parser->ib->length_dw = ib_chunk->length_dw;
+	parser->ib.length_dw = ib_chunk->length_dw;
 	r = radeon_cs_parse(rdev, parser->ring, parser);
 	if (r || parser->parser_error) {
 		DRM_ERROR("Invalid command stream !\n");
@@ -370,8 +369,8 @@ static int radeon_cs_ib_chunk(struct radeon_device *rdev,
 	if (r) {
 		DRM_ERROR("Failed to synchronize rings !\n");
 	}
-	parser->ib->vm_id = 0;
-	r = radeon_ib_schedule(rdev, parser->ib);
+	parser->ib.vm_id = 0;
+	r = radeon_ib_schedule(rdev, &parser->ib);
 	if (r) {
 		DRM_ERROR("Failed to schedule IB !\n");
 	}
@@ -422,14 +421,14 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
 			DRM_ERROR("Failed to get const ib !\n");
 			return r;
 		}
-		parser->const_ib->is_const_ib = true;
-		parser->const_ib->length_dw = ib_chunk->length_dw;
+		parser->const_ib.is_const_ib = true;
+		parser->const_ib.length_dw = ib_chunk->length_dw;
 		/* Copy the packet into the IB */
-		if (DRM_COPY_FROM_USER(parser->const_ib->ptr, ib_chunk->user_ptr,
+		if (DRM_COPY_FROM_USER(parser->const_ib.ptr, ib_chunk->user_ptr,
 				       ib_chunk->length_dw * 4)) {
 			return -EFAULT;
 		}
-		r = radeon_ring_ib_parse(rdev, parser->ring, parser->const_ib);
+		r = radeon_ring_ib_parse(rdev, parser->ring, &parser->const_ib);
 		if (r) {
 			return r;
 		}
@@ -446,13 +445,13 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
 		DRM_ERROR("Failed to get ib !\n");
 		return r;
 	}
-	parser->ib->length_dw = ib_chunk->length_dw;
+	parser->ib.length_dw = ib_chunk->length_dw;
 	/* Copy the packet into the IB */
-	if (DRM_COPY_FROM_USER(parser->ib->ptr, ib_chunk->user_ptr,
+	if (DRM_COPY_FROM_USER(parser->ib.ptr, ib_chunk->user_ptr,
 			       ib_chunk->length_dw * 4)) {
 		return -EFAULT;
 	}
-	r = radeon_ring_ib_parse(rdev, parser->ring, parser->ib);
+	r = radeon_ring_ib_parse(rdev, parser->ring, &parser->ib);
 	if (r) {
 		return r;
 	}
@@ -473,29 +472,29 @@ static int radeon_cs_ib_vm_chunk(struct radeon_device *rdev,
 
 	if ((rdev->family >= CHIP_TAHITI) &&
 	    (parser->chunk_const_ib_idx != -1)) {
-		parser->const_ib->vm_id = vm->id;
+		parser->const_ib.vm_id = vm->id;
 		/* ib pool is bind at 0 in virtual address space to gpu_addr is the
 		 * offset inside the pool bo
 		 */
-		parser->const_ib->gpu_addr = parser->const_ib->sa_bo->soffset;
-		r = radeon_ib_schedule(rdev, parser->const_ib);
+		parser->const_ib.gpu_addr = parser->const_ib.sa_bo->soffset;
+		r = radeon_ib_schedule(rdev, &parser->const_ib);
 		if (r)
 			goto out;
 	}
 
-	parser->ib->vm_id = vm->id;
+	parser->ib.vm_id = vm->id;
 	/* ib pool is bind at 0 in virtual address space to gpu_addr is the
 	 * offset inside the pool bo
 	 */
-	parser->ib->gpu_addr = parser->ib->sa_bo->soffset;
-	parser->ib->is_const_ib = false;
-	r = radeon_ib_schedule(rdev, parser->ib);
+	parser->ib.gpu_addr = parser->ib.sa_bo->soffset;
+	parser->ib.is_const_ib = false;
+	r = radeon_ib_schedule(rdev, &parser->ib);
 out:
 	if (!r) {
 		if (vm->fence) {
 			radeon_fence_unref(&vm->fence);
 		}
-		vm->fence = radeon_fence_ref(parser->ib->fence);
+		vm->fence = radeon_fence_ref(parser->ib.fence);
 	}
 	mutex_unlock(&fpriv->vm.mutex);
 	return r;
@@ -573,7 +572,7 @@ int radeon_cs_finish_pages(struct radeon_cs_parser *p)
 				size = PAGE_SIZE;
 		}
 		
-		if (DRM_COPY_FROM_USER(p->ib->ptr + (i * (PAGE_SIZE/4)),
+		if (DRM_COPY_FROM_USER(p->ib.ptr + (i * (PAGE_SIZE/4)),
 				       ibc->user_ptr + (i * PAGE_SIZE),
 				       size))
 			return -EFAULT;
@@ -590,7 +589,7 @@ int radeon_cs_update_pages(struct radeon_cs_parser *p, int pg_idx)
 	bool copy1 = (p->rdev->flags & RADEON_IS_AGP) ? false : true;
 
 	for (i = ibc->last_copied_page + 1; i < pg_idx; i++) {
-		if (DRM_COPY_FROM_USER(p->ib->ptr + (i * (PAGE_SIZE/4)),
+		if (DRM_COPY_FROM_USER(p->ib.ptr + (i * (PAGE_SIZE/4)),
 				       ibc->user_ptr + (i * PAGE_SIZE),
 				       PAGE_SIZE)) {
 			p->parser_error = -EFAULT;
@@ -606,7 +605,7 @@ int radeon_cs_update_pages(struct radeon_cs_parser *p, int pg_idx)
 
 	new_page = ibc->kpage_idx[0] < ibc->kpage_idx[1] ? 0 : 1;
 	if (copy1)
-		ibc->kpage[new_page] = p->ib->ptr + (pg_idx * (PAGE_SIZE / 4));
+		ibc->kpage[new_page] = p->ib.ptr + (pg_idx * (PAGE_SIZE / 4));
 
 	if (DRM_COPY_FROM_USER(ibc->kpage[new_page],
 			       ibc->user_ptr + (pg_idx * PAGE_SIZE),
@@ -617,7 +616,7 @@ int radeon_cs_update_pages(struct radeon_cs_parser *p, int pg_idx)
 
 	/* copy to IB for non single case */
 	if (!copy1)
-		memcpy((void *)(p->ib->ptr+(pg_idx*(PAGE_SIZE/4))), ibc->kpage[new_page], size);
+		memcpy((void *)(p->ib.ptr+(pg_idx*(PAGE_SIZE/4))), ibc->kpage[new_page], size);
 
 	ibc->last_copied_page = pg_idx;
 	ibc->kpage_idx[new_page] = pg_idx;
diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
index af8e1ee..a5dee76 100644
--- a/drivers/gpu/drm/radeon/radeon_ring.c
+++ b/drivers/gpu/drm/radeon/radeon_ring.c
@@ -65,51 +65,36 @@ u32 radeon_get_ib_value(struct radeon_cs_parser *p, int idx)
 }
 
 int radeon_ib_get(struct radeon_device *rdev, int ring,
-		  struct radeon_ib **ib, unsigned size)
+		  struct radeon_ib *ib, unsigned size)
 {
 	int r;
 
-	*ib = kmalloc(sizeof(struct radeon_ib), GFP_KERNEL);
-	if (*ib == NULL) {
-		return -ENOMEM;
-	}
-	r = radeon_sa_bo_new(rdev, &rdev->ring_tmp_bo, &(*ib)->sa_bo, size, 256, true);
+	r = radeon_sa_bo_new(rdev, &rdev->ring_tmp_bo, &ib->sa_bo, size, 256, true);
 	if (r) {
 		dev_err(rdev->dev, "failed to get a new IB (%d)\n", r);
-		kfree(*ib);
-		*ib = NULL;
 		return r;
 	}
-	r = radeon_fence_create(rdev, &(*ib)->fence, ring);
+	r = radeon_fence_create(rdev, &ib->fence, ring);
 	if (r) {
 		dev_err(rdev->dev, "failed to create fence for new IB (%d)\n", r);
-		radeon_sa_bo_free(rdev, &(*ib)->sa_bo, NULL);
-		kfree(*ib);
-		*ib = NULL;
+		radeon_sa_bo_free(rdev, &ib->sa_bo, NULL);
 		return r;
 	}
 
-	(*ib)->ptr = radeon_sa_bo_cpu_addr((*ib)->sa_bo);
-	(*ib)->gpu_addr = radeon_sa_bo_gpu_addr((*ib)->sa_bo);
-	(*ib)->vm_id = 0;
-	(*ib)->is_const_ib = false;
-	(*ib)->semaphore = NULL;
+	ib->ptr = radeon_sa_bo_cpu_addr(ib->sa_bo);
+	ib->gpu_addr = radeon_sa_bo_gpu_addr(ib->sa_bo);
+	ib->vm_id = 0;
+	ib->is_const_ib = false;
+	ib->semaphore = NULL;
 
 	return 0;
 }
 
-void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib)
+void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib *ib)
 {
-	struct radeon_ib *tmp = *ib;
-
-	*ib = NULL;
-	if (tmp == NULL) {
-		return;
-	}
-	radeon_semaphore_free(rdev, tmp->semaphore, tmp->fence);
-	radeon_sa_bo_free(rdev, &tmp->sa_bo, tmp->fence);
-	radeon_fence_unref(&tmp->fence);
-	kfree(tmp);
+	radeon_semaphore_free(rdev, ib->semaphore, ib->fence);
+	radeon_sa_bo_free(rdev, &ib->sa_bo, ib->fence);
+	radeon_fence_unref(&ib->fence);
 }
 
 int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib)
-- 
1.7.5.4

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: SA and other Patches.
  2012-05-07 11:42 SA and other Patches Christian König
                   ` (19 preceding siblings ...)
  2012-05-07 11:42 ` [PATCH 20/20] drm/radeon: make the ib an inline object Christian König
@ 2012-05-07 14:34 ` Jerome Glisse
  2012-05-07 15:30   ` Jerome Glisse
  20 siblings, 1 reply; 35+ messages in thread
From: Jerome Glisse @ 2012-05-07 14:34 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On Mon, May 7, 2012 at 7:42 AM, Christian König <deathsimple@vodafone.de> wrote:
> Hi Jerome & everybody on the list,
>
> this gathers together every patch we developed over the last week or so and
> which is not already in drm-next.
>
> I've run quite some tests with them yesterday and today and as far as I can
> see hammered out every known bug. For the SA allocator I reverted to tracking
> the hole pointer instead of just the last allocation, cause otherwise we will
> never release the first allocation on the list. Glxgears now even keeps happily
> running if I deadlock on the not GFX rings on purpose.

Now we will release the first entry if we use the last allocated ptr;
I believe it's cleaner to use the last ptr.
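
A minimal, self-contained sketch of the point being discussed (this is
not the radeon SA code, and every name in it is made up): if the cursor
used when freeing must always point at a real allocation, the entry it
points at can never be released, while letting the cursor be the list
head itself removes that pin.

#include <stdio.h>

struct alloc { const char *name; struct alloc *next; };

/* release every entry strictly after the cursor */
static void free_after(struct alloc **cursor_next)
{
	while (*cursor_next) {
		printf("freeing %s\n", (*cursor_next)->name);
		*cursor_next = (*cursor_next)->next;
	}
}

int main(void)
{
	struct alloc first = { "first", NULL };
	struct alloc *head = &first;

	free_after(&first.next); /* cursor pinned to an allocation: frees nothing */
	free_after(&head);       /* cursor may be the list head: "first" goes away */
	return 0;
}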

> Please take a second look at them and if nobody objects any more we should
> commit them to drm-next.
>
> Cheers,
> Christian.
>


Cheers,
Jerome

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 04/20] drm/radeon: convert fence to uint64_t v4
  2012-05-07 11:42 ` [PATCH 04/20] drm/radeon: convert fence to uint64_t v4 Christian König
@ 2012-05-07 14:39   ` Jerome Glisse
  2012-05-07 15:04     ` Christian König
  0 siblings, 1 reply; 35+ messages in thread
From: Jerome Glisse @ 2012-05-07 14:39 UTC (permalink / raw)
  To: Christian König; +Cc: Jerome Glisse, dri-devel

On Mon, May 7, 2012 at 7:42 AM, Christian König <deathsimple@vodafone.de> wrote:
> From: Jerome Glisse <jglisse@redhat.com>
>
> This convert fence to use uint64_t sequence number intention is
> to use the fact that uin64_t is big enough that we don't need to
> care about wrap around.
>
> Tested with and without writeback using 0xFFFFF000 as initial
> fence sequence and thus allowing to test the wrap around from
> 32bits to 64bits.
>
> v2: Add comment about possible race btw CPU & GPU, add comment
>    stressing that we need 2 dword aligned for R600_WB_EVENT_OFFSET
>    Read fence sequenc in reverse order of GPU write them so we
>    mitigate the race btw CPU and GPU.
>
> v3: Drop the need for ring to emit the 64bits fence, and just have
>    each ring emit the lower 32bits of the fence sequence. We
>    handle the wrap over 32bits in fence_process.
>
> v4: Just a small optimization: Don't reread the last_seq value
>    if loop restarts, since we already know its value anyway.
>    Also start at zero not one for seq value and use pre instead
>    of post increment in emmit, otherwise wait_empty will deadlock.

Why change that? v3 was already good, no deadlock. I started at 1
especially for that: a signaled fence is set to 0 so it always compares
as signaled. Just using pre-increment is exactly like starting at one.
I don't see the need for this change, but if it makes you happy.

Cheers,
Jerome
>
> Signed-off-by: Jerome Glisse <jglisse@redhat.com>
> Signed-off-by: Christian König <deathsimple@vodafone.de>
> ---
>  drivers/gpu/drm/radeon/radeon.h       |   39 ++++++-----
>  drivers/gpu/drm/radeon/radeon_fence.c |  116 +++++++++++++++++++++++----------
>  drivers/gpu/drm/radeon/radeon_ring.c  |    9 ++-
>  3 files changed, 107 insertions(+), 57 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index e99ea81..cdf46bc 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -100,28 +100,32 @@ extern int radeon_lockup_timeout;
>  * Copy from radeon_drv.h so we don't have to include both and have conflicting
>  * symbol;
>  */
> -#define RADEON_MAX_USEC_TIMEOUT                100000  /* 100 ms */
> -#define RADEON_FENCE_JIFFIES_TIMEOUT   (HZ / 2)
> +#define RADEON_MAX_USEC_TIMEOUT                        100000  /* 100 ms */
> +#define RADEON_FENCE_JIFFIES_TIMEOUT           (HZ / 2)
>  /* RADEON_IB_POOL_SIZE must be a power of 2 */
> -#define RADEON_IB_POOL_SIZE            16
> -#define RADEON_DEBUGFS_MAX_COMPONENTS  32
> -#define RADEONFB_CONN_LIMIT            4
> -#define RADEON_BIOS_NUM_SCRATCH                8
> +#define RADEON_IB_POOL_SIZE                    16
> +#define RADEON_DEBUGFS_MAX_COMPONENTS          32
> +#define RADEONFB_CONN_LIMIT                    4
> +#define RADEON_BIOS_NUM_SCRATCH                        8
>
>  /* max number of rings */
> -#define RADEON_NUM_RINGS 3
> +#define RADEON_NUM_RINGS                       3
> +
> +/* fence seq are set to this number when signaled */
> +#define RADEON_FENCE_SIGNALED_SEQ              0LL
> +#define RADEON_FENCE_NOTEMITED_SEQ             (~0LL)
>
>  /* internal ring indices */
>  /* r1xx+ has gfx CP ring */
> -#define RADEON_RING_TYPE_GFX_INDEX  0
> +#define RADEON_RING_TYPE_GFX_INDEX             0
>
>  /* cayman has 2 compute CP rings */
> -#define CAYMAN_RING_TYPE_CP1_INDEX 1
> -#define CAYMAN_RING_TYPE_CP2_INDEX 2
> +#define CAYMAN_RING_TYPE_CP1_INDEX             1
> +#define CAYMAN_RING_TYPE_CP2_INDEX             2
>
>  /* hardcode those limit for now */
> -#define RADEON_VA_RESERVED_SIZE                (8 << 20)
> -#define RADEON_IB_VM_MAX_SIZE          (64 << 10)
> +#define RADEON_VA_RESERVED_SIZE                        (8 << 20)
> +#define RADEON_IB_VM_MAX_SIZE                  (64 << 10)
>
>  /*
>  * Errata workarounds.
> @@ -254,8 +258,9 @@ struct radeon_fence_driver {
>        uint32_t                        scratch_reg;
>        uint64_t                        gpu_addr;
>        volatile uint32_t               *cpu_addr;
> -       atomic_t                        seq;
> -       uint32_t                        last_seq;
> +       /* seq is protected by ring emission lock */
> +       uint64_t                        seq;
> +       atomic64_t                      last_seq;
>        unsigned long                   last_activity;
>        wait_queue_head_t               queue;
>        struct list_head                emitted;
> @@ -268,11 +273,9 @@ struct radeon_fence {
>        struct kref                     kref;
>        struct list_head                list;
>        /* protected by radeon_fence.lock */
> -       uint32_t                        seq;
> -       bool                            emitted;
> -       bool                            signaled;
> +       uint64_t                        seq;
>        /* RB, DMA, etc. */
> -       int                             ring;
> +       unsigned                        ring;
>        struct radeon_semaphore         *semaphore;
>  };
>
> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
> index 5bb78bf..feb2bbc 100644
> --- a/drivers/gpu/drm/radeon/radeon_fence.c
> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
> @@ -66,14 +66,14 @@ int radeon_fence_emit(struct radeon_device *rdev, struct radeon_fence *fence)
>        unsigned long irq_flags;
>
>        write_lock_irqsave(&rdev->fence_lock, irq_flags);
> -       if (fence->emitted) {
> +       if (fence->seq && fence->seq < RADEON_FENCE_NOTEMITED_SEQ) {
>                write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
>                return 0;
>        }
> -       fence->seq = atomic_add_return(1, &rdev->fence_drv[fence->ring].seq);
> +       /* we are protected by the ring emission mutex */
> +       fence->seq = ++rdev->fence_drv[fence->ring].seq;
>        radeon_fence_ring_emit(rdev, fence->ring, fence);
>        trace_radeon_fence_emit(rdev->ddev, fence->seq);
> -       fence->emitted = true;
>        /* are we the first fence on a previusly idle ring? */
>        if (list_empty(&rdev->fence_drv[fence->ring].emitted)) {
>                rdev->fence_drv[fence->ring].last_activity = jiffies;
> @@ -87,14 +87,60 @@ static bool radeon_fence_poll_locked(struct radeon_device *rdev, int ring)
>  {
>        struct radeon_fence *fence;
>        struct list_head *i, *n;
> -       uint32_t seq;
> +       uint64_t seq, last_seq;
> +       unsigned count_loop = 0;
>        bool wake = false;
>
> -       seq = radeon_fence_read(rdev, ring);
> -       if (seq == rdev->fence_drv[ring].last_seq)
> -               return false;
> +       /* Note there is a scenario here for an infinite loop but it's
> +        * very unlikely to happen. For it to happen, the current polling
> +        * process need to be interrupted by another process and another
> +        * process needs to update the last_seq btw the atomic read and
> +        * xchg of the current process.
> +        *
> +        * More over for this to go in infinite loop there need to be
> +        * continuously new fence signaled ie radeon_fence_read needs
> +        * to return a different value each time for both the currently
> +        * polling process and the other process that xchg the last_seq
> +        * btw atomic read and xchg of the current process. And the
> +        * value the other process set as last seq must be higher than
> +        * the seq value we just read. Which means that current process
> +        * need to be interrupted after radeon_fence_read and before
> +        * atomic xchg.
> +        *
> +        * To be even more safe we count the number of time we loop and
> +        * we bail after 10 loop just accepting the fact that we might
> +        * have temporarly set the last_seq not to the true real last
> +        * seq but to an older one.
> +        */
> +       last_seq = atomic64_read(&rdev->fence_drv[ring].last_seq);
> +       do {
> +               seq = radeon_fence_read(rdev, ring);
> +               seq |= last_seq & 0xffffffff00000000LL;
> +               if (seq < last_seq) {
> +                       seq += 0x100000000LL;
> +               }
>
> -       rdev->fence_drv[ring].last_seq = seq;
> +               if (!wake && seq == last_seq) {
> +                       return false;
> +               }
> +               /* If we loop over we don't want to return without
> +                * checking if a fence is signaled as it means that the
> +                * seq we just read is different from the previous on.
> +                */
> +               wake = true;
> +               if ((count_loop++) > 10) {
> +                       /* We looped over too many time leave with the
> +                        * fact that we might have set an older fence
> +                        * seq then the current real last seq as signaled
> +                        * by the hw.
> +                        */
> +                       break;
> +               }
> +               last_seq = seq;
> +       } while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq) > seq);
> +
> +       /* reset wake to false */
> +       wake = false;
>        rdev->fence_drv[ring].last_activity = jiffies;
>
>        n = NULL;
> @@ -112,7 +158,7 @@ static bool radeon_fence_poll_locked(struct radeon_device *rdev, int ring)
>                        n = i->prev;
>                        list_move_tail(i, &rdev->fence_drv[ring].signaled);
>                        fence = list_entry(i, struct radeon_fence, list);
> -                       fence->signaled = true;
> +                       fence->seq = RADEON_FENCE_SIGNALED_SEQ;
>                        i = n;
>                } while (i != &rdev->fence_drv[ring].emitted);
>                wake = true;
> @@ -128,7 +174,7 @@ static void radeon_fence_destroy(struct kref *kref)
>        fence = container_of(kref, struct radeon_fence, kref);
>        write_lock_irqsave(&fence->rdev->fence_lock, irq_flags);
>        list_del(&fence->list);
> -       fence->emitted = false;
> +       fence->seq = RADEON_FENCE_NOTEMITED_SEQ;
>        write_unlock_irqrestore(&fence->rdev->fence_lock, irq_flags);
>        if (fence->semaphore)
>                radeon_semaphore_free(fence->rdev, fence->semaphore);
> @@ -145,9 +191,7 @@ int radeon_fence_create(struct radeon_device *rdev,
>        }
>        kref_init(&((*fence)->kref));
>        (*fence)->rdev = rdev;
> -       (*fence)->emitted = false;
> -       (*fence)->signaled = false;
> -       (*fence)->seq = 0;
> +       (*fence)->seq = RADEON_FENCE_NOTEMITED_SEQ;
>        (*fence)->ring = ring;
>        (*fence)->semaphore = NULL;
>        INIT_LIST_HEAD(&(*fence)->list);
> @@ -163,18 +207,18 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
>                return true;
>
>        write_lock_irqsave(&fence->rdev->fence_lock, irq_flags);
> -       signaled = fence->signaled;
> +       signaled = (fence->seq == RADEON_FENCE_SIGNALED_SEQ);
>        /* if we are shuting down report all fence as signaled */
>        if (fence->rdev->shutdown) {
>                signaled = true;
>        }
> -       if (!fence->emitted) {
> +       if (fence->seq == RADEON_FENCE_NOTEMITED_SEQ) {
>                WARN(1, "Querying an unemitted fence : %p !\n", fence);
>                signaled = true;
>        }
>        if (!signaled) {
>                radeon_fence_poll_locked(fence->rdev, fence->ring);
> -               signaled = fence->signaled;
> +               signaled = (fence->seq == RADEON_FENCE_SIGNALED_SEQ);
>        }
>        write_unlock_irqrestore(&fence->rdev->fence_lock, irq_flags);
>        return signaled;
> @@ -183,8 +227,8 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
>  int radeon_fence_wait(struct radeon_fence *fence, bool intr)
>  {
>        struct radeon_device *rdev;
> -       unsigned long irq_flags, timeout;
> -       u32 seq;
> +       unsigned long irq_flags, timeout, last_activity;
> +       uint64_t seq;
>        int i, r;
>        bool signaled;
>
> @@ -207,7 +251,9 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
>                        timeout = 1;
>                }
>                /* save current sequence value used to check for GPU lockups */
> -               seq = rdev->fence_drv[fence->ring].last_seq;
> +               seq = atomic64_read(&rdev->fence_drv[fence->ring].last_seq);
> +               /* Save current last activity valuee, used to check for GPU lockups */
> +               last_activity = rdev->fence_drv[fence->ring].last_activity;
>                read_unlock_irqrestore(&rdev->fence_lock, irq_flags);
>
>                trace_radeon_fence_wait_begin(rdev->ddev, seq);
> @@ -235,24 +281,23 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
>                        }
>
>                        write_lock_irqsave(&rdev->fence_lock, irq_flags);
> -                       /* check if sequence value has changed since last_activity */
> -                       if (seq != rdev->fence_drv[fence->ring].last_seq) {
> +                       /* test if somebody else has already decided that this is a lockup */
> +                       if (last_activity != rdev->fence_drv[fence->ring].last_activity) {
>                                write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
>                                continue;
>                        }
>
> -                       /* change sequence value on all rings, so nobody else things there is a lockup */
> -                       for (i = 0; i < RADEON_NUM_RINGS; ++i)
> -                               rdev->fence_drv[i].last_seq -= 0x10000;
> -
> -                       rdev->fence_drv[fence->ring].last_activity = jiffies;
>                        write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
>
>                        if (radeon_ring_is_lockup(rdev, fence->ring, &rdev->ring[fence->ring])) {
> -
>                                /* good news we believe it's a lockup */
> -                               printk(KERN_WARNING "GPU lockup (waiting for 0x%08X last fence id 0x%08X)\n",
> -                                    fence->seq, seq);
> +                               dev_warn(rdev->dev, "GPU lockup (waiting for 0x%016llx last fence id 0x%016llx)\n",
> +                                        fence->seq, seq);
> +
> +                               /* change last activity so nobody else think there is a lockup */
> +                               for (i = 0; i < RADEON_NUM_RINGS; ++i) {
> +                                       rdev->fence_drv[i].last_activity = jiffies;
> +                               }
>
>                                /* mark the ring as not ready any more */
>                                rdev->ring[fence->ring].ready = false;
> @@ -387,9 +432,9 @@ int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring)
>        }
>        rdev->fence_drv[ring].cpu_addr = &rdev->wb.wb[index/4];
>        rdev->fence_drv[ring].gpu_addr = rdev->wb.gpu_addr + index;
> -       radeon_fence_write(rdev, atomic_read(&rdev->fence_drv[ring].seq), ring);
> +       radeon_fence_write(rdev, rdev->fence_drv[ring].seq, ring);
>        rdev->fence_drv[ring].initialized = true;
> -       DRM_INFO("fence driver on ring %d use gpu addr 0x%08Lx and cpu addr 0x%p\n",
> +       DRM_INFO("fence driver on ring %d use gpu addr 0x%016llx and cpu addr 0x%p\n",
>                 ring, rdev->fence_drv[ring].gpu_addr, rdev->fence_drv[ring].cpu_addr);
>        write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
>        return 0;
> @@ -400,7 +445,8 @@ static void radeon_fence_driver_init_ring(struct radeon_device *rdev, int ring)
>        rdev->fence_drv[ring].scratch_reg = -1;
>        rdev->fence_drv[ring].cpu_addr = NULL;
>        rdev->fence_drv[ring].gpu_addr = 0;
> -       atomic_set(&rdev->fence_drv[ring].seq, 0);
> +       rdev->fence_drv[ring].seq = 0;
> +       atomic64_set(&rdev->fence_drv[ring].last_seq, 0);
>        INIT_LIST_HEAD(&rdev->fence_drv[ring].emitted);
>        INIT_LIST_HEAD(&rdev->fence_drv[ring].signaled);
>        init_waitqueue_head(&rdev->fence_drv[ring].queue);
> @@ -458,12 +504,12 @@ static int radeon_debugfs_fence_info(struct seq_file *m, void *data)
>                        continue;
>
>                seq_printf(m, "--- ring %d ---\n", i);
> -               seq_printf(m, "Last signaled fence 0x%08X\n",
> -                          radeon_fence_read(rdev, i));
> +               seq_printf(m, "Last signaled fence 0x%016lx\n",
> +                          atomic64_read(&rdev->fence_drv[i].last_seq));
>                if (!list_empty(&rdev->fence_drv[i].emitted)) {
>                        fence = list_entry(rdev->fence_drv[i].emitted.prev,
>                                           struct radeon_fence, list);
> -                       seq_printf(m, "Last emitted fence %p with 0x%08X\n",
> +                       seq_printf(m, "Last emitted fence %p with 0x%016llx\n",
>                                   fence,  fence->seq);
>                }
>        }
> diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
> index a4d60ae..4ae222b 100644
> --- a/drivers/gpu/drm/radeon/radeon_ring.c
> +++ b/drivers/gpu/drm/radeon/radeon_ring.c
> @@ -82,7 +82,7 @@ bool radeon_ib_try_free(struct radeon_device *rdev, struct radeon_ib *ib)
>        bool done = false;
>
>        /* only free ib which have been emited */
> -       if (ib->fence && ib->fence->emitted) {
> +       if (ib->fence && ib->fence->seq < RADEON_FENCE_NOTEMITED_SEQ) {
>                if (radeon_fence_signaled(ib->fence)) {
>                        radeon_fence_unref(&ib->fence);
>                        radeon_sa_bo_free(rdev, &ib->sa_bo);
> @@ -149,8 +149,9 @@ retry:
>        /* this should be rare event, ie all ib scheduled none signaled yet.
>         */
>        for (i = 0; i < RADEON_IB_POOL_SIZE; i++) {
> -               if (rdev->ib_pool.ibs[idx].fence && rdev->ib_pool.ibs[idx].fence->emitted) {
> -                       r = radeon_fence_wait(rdev->ib_pool.ibs[idx].fence, false);
> +               struct radeon_fence *fence = rdev->ib_pool.ibs[idx].fence;
> +               if (fence && fence->seq < RADEON_FENCE_NOTEMITED_SEQ) {
> +                       r = radeon_fence_wait(fence, false);
>                        if (!r) {
>                                goto retry;
>                        }
> @@ -173,7 +174,7 @@ void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib)
>                return;
>        }
>        radeon_mutex_lock(&rdev->ib_pool.mutex);
> -       if (tmp->fence && !tmp->fence->emitted) {
> +       if (tmp->fence && tmp->fence->seq == RADEON_FENCE_NOTEMITED_SEQ) {
>                radeon_sa_bo_free(rdev, &tmp->sa_bo);
>                radeon_fence_unref(&tmp->fence);
>        }
> --
> 1.7.5.4
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 04/20] drm/radeon: convert fence to uint64_t v4
  2012-05-07 14:39   ` Jerome Glisse
@ 2012-05-07 15:04     ` Christian König
  2012-05-07 15:27       ` Jerome Glisse
  0 siblings, 1 reply; 35+ messages in thread
From: Christian König @ 2012-05-07 15:04 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: Jerome Glisse, dri-devel

On 07.05.2012 16:39, Jerome Glisse wrote:
> On Mon, May 7, 2012 at 7:42 AM, Christian König<deathsimple@vodafone.de>  wrote:
>> From: Jerome Glisse<jglisse@redhat.com>
>>
>> This convert fence to use uint64_t sequence number intention is
>> to use the fact that uin64_t is big enough that we don't need to
>> care about wrap around.
>>
>> Tested with and without writeback using 0xFFFFF000 as initial
>> fence sequence and thus allowing to test the wrap around from
>> 32bits to 64bits.
>>
>> v2: Add comment about possible race btw CPU&  GPU, add comment
>>     stressing that we need 2 dword aligned for R600_WB_EVENT_OFFSET
>>     Read fence sequenc in reverse order of GPU write them so we
>>     mitigate the race btw CPU and GPU.
>>
>> v3: Drop the need for ring to emit the 64bits fence, and just have
>>     each ring emit the lower 32bits of the fence sequence. We
>>     handle the wrap over 32bits in fence_process.
>>
>> v4: Just a small optimization: Don't reread the last_seq value
>>     if loop restarts, since we already know its value anyway.
>>     Also start at zero not one for seq value and use pre instead
>>     of post increment in emmit, otherwise wait_empty will deadlock.
> Why change that? v3 was already good, no deadlock. I started at 1
> especially for that: a signaled fence is set to 0 so it always compares
> as signaled. Just using pre-increment is exactly like starting at one.
> I don't see the need for this change, but if it makes you happy.

Not exactly: the last emitted sequence is also used in
radeon_fence_wait_empty. So when you use post-increment,
radeon_fence_wait_empty will actually not wait for the last emitted
fence to be signaled, but for last emitted + 1, so it practically
waits forever.

Without this change, suspend (for example) will just lock up.
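
A tiny stand-alone illustration of that off-by-one (simplified, not the
driver code; the counters below merely mimic the two emission schemes):
a wait-for-idle that compares against the driver-side counter only works
if that counter equals the last sequence the GPU will actually write.

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	uint64_t counter, last_written = 0;
	int i;

	/* post-increment, starting at 1: fences get 1,2,3, counter ends at 4 */
	counter = 1;
	for (i = 0; i < 3; i++)
		last_written = counter++;
	printf("post: wait for %llu, last written %llu\n",
	       (unsigned long long)counter, (unsigned long long)last_written);

	/* pre-increment, starting at 0: fences get 1,2,3, counter ends at 3 */
	counter = 0;
	for (i = 0; i < 3; i++)
		last_written = ++counter;
	printf("pre:  wait for %llu, last written %llu\n",
	       (unsigned long long)counter, (unsigned long long)last_written);
	return 0;
}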

Cheers,
Christian.

>
> Cheers,
> Jerome
>> Signed-off-by: Jerome Glisse<jglisse@redhat.com>
>> Signed-off-by: Christian König<deathsimple@vodafone.de>
>> ---
>>   drivers/gpu/drm/radeon/radeon.h       |   39 ++++++-----
>>   drivers/gpu/drm/radeon/radeon_fence.c |  116 +++++++++++++++++++++++----------
>>   drivers/gpu/drm/radeon/radeon_ring.c  |    9 ++-
>>   3 files changed, 107 insertions(+), 57 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
>> index e99ea81..cdf46bc 100644
>> --- a/drivers/gpu/drm/radeon/radeon.h
>> +++ b/drivers/gpu/drm/radeon/radeon.h
>> @@ -100,28 +100,32 @@ extern int radeon_lockup_timeout;
>>   * Copy from radeon_drv.h so we don't have to include both and have conflicting
>>   * symbol;
>>   */
>> -#define RADEON_MAX_USEC_TIMEOUT                100000  /* 100 ms */
>> -#define RADEON_FENCE_JIFFIES_TIMEOUT   (HZ / 2)
>> +#define RADEON_MAX_USEC_TIMEOUT                        100000  /* 100 ms */
>> +#define RADEON_FENCE_JIFFIES_TIMEOUT           (HZ / 2)
>>   /* RADEON_IB_POOL_SIZE must be a power of 2 */
>> -#define RADEON_IB_POOL_SIZE            16
>> -#define RADEON_DEBUGFS_MAX_COMPONENTS  32
>> -#define RADEONFB_CONN_LIMIT            4
>> -#define RADEON_BIOS_NUM_SCRATCH                8
>> +#define RADEON_IB_POOL_SIZE                    16
>> +#define RADEON_DEBUGFS_MAX_COMPONENTS          32
>> +#define RADEONFB_CONN_LIMIT                    4
>> +#define RADEON_BIOS_NUM_SCRATCH                        8
>>
>>   /* max number of rings */
>> -#define RADEON_NUM_RINGS 3
>> +#define RADEON_NUM_RINGS                       3
>> +
>> +/* fence seq are set to this number when signaled */
>> +#define RADEON_FENCE_SIGNALED_SEQ              0LL
>> +#define RADEON_FENCE_NOTEMITED_SEQ             (~0LL)
>>
>>   /* internal ring indices */
>>   /* r1xx+ has gfx CP ring */
>> -#define RADEON_RING_TYPE_GFX_INDEX  0
>> +#define RADEON_RING_TYPE_GFX_INDEX             0
>>
>>   /* cayman has 2 compute CP rings */
>> -#define CAYMAN_RING_TYPE_CP1_INDEX 1
>> -#define CAYMAN_RING_TYPE_CP2_INDEX 2
>> +#define CAYMAN_RING_TYPE_CP1_INDEX             1
>> +#define CAYMAN_RING_TYPE_CP2_INDEX             2
>>
>>   /* hardcode those limit for now */
>> -#define RADEON_VA_RESERVED_SIZE                (8<<  20)
>> -#define RADEON_IB_VM_MAX_SIZE          (64<<  10)
>> +#define RADEON_VA_RESERVED_SIZE                        (8<<  20)
>> +#define RADEON_IB_VM_MAX_SIZE                  (64<<  10)
>>
>>   /*
>>   * Errata workarounds.
>> @@ -254,8 +258,9 @@ struct radeon_fence_driver {
>>         uint32_t                        scratch_reg;
>>         uint64_t                        gpu_addr;
>>         volatile uint32_t               *cpu_addr;
>> -       atomic_t                        seq;
>> -       uint32_t                        last_seq;
>> +       /* seq is protected by ring emission lock */
>> +       uint64_t                        seq;
>> +       atomic64_t                      last_seq;
>>         unsigned long                   last_activity;
>>         wait_queue_head_t               queue;
>>         struct list_head                emitted;
>> @@ -268,11 +273,9 @@ struct radeon_fence {
>>         struct kref                     kref;
>>         struct list_head                list;
>>         /* protected by radeon_fence.lock */
>> -       uint32_t                        seq;
>> -       bool                            emitted;
>> -       bool                            signaled;
>> +       uint64_t                        seq;
>>         /* RB, DMA, etc. */
>> -       int                             ring;
>> +       unsigned                        ring;
>>         struct radeon_semaphore         *semaphore;
>>   };
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
>> index 5bb78bf..feb2bbc 100644
>> --- a/drivers/gpu/drm/radeon/radeon_fence.c
>> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
>> @@ -66,14 +66,14 @@ int radeon_fence_emit(struct radeon_device *rdev, struct radeon_fence *fence)
>>         unsigned long irq_flags;
>>
>>         write_lock_irqsave(&rdev->fence_lock, irq_flags);
>> -       if (fence->emitted) {
>> +       if (fence->seq&&  fence->seq<  RADEON_FENCE_NOTEMITED_SEQ) {
>>                 write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
>>                 return 0;
>>         }
>> -       fence->seq = atomic_add_return(1,&rdev->fence_drv[fence->ring].seq);
>> +       /* we are protected by the ring emission mutex */
>> +       fence->seq = ++rdev->fence_drv[fence->ring].seq;
>>         radeon_fence_ring_emit(rdev, fence->ring, fence);
>>         trace_radeon_fence_emit(rdev->ddev, fence->seq);
>> -       fence->emitted = true;
>>         /* are we the first fence on a previusly idle ring? */
>>         if (list_empty(&rdev->fence_drv[fence->ring].emitted)) {
>>                 rdev->fence_drv[fence->ring].last_activity = jiffies;
>> @@ -87,14 +87,60 @@ static bool radeon_fence_poll_locked(struct radeon_device *rdev, int ring)
>>   {
>>         struct radeon_fence *fence;
>>         struct list_head *i, *n;
>> -       uint32_t seq;
>> +       uint64_t seq, last_seq;
>> +       unsigned count_loop = 0;
>>         bool wake = false;
>>
>> -       seq = radeon_fence_read(rdev, ring);
>> -       if (seq == rdev->fence_drv[ring].last_seq)
>> -               return false;
>> +       /* Note there is a scenario here for an infinite loop but it's
>> +        * very unlikely to happen. For it to happen, the current polling
>> +        * process need to be interrupted by another process and another
>> +        * process needs to update the last_seq btw the atomic read and
>> +        * xchg of the current process.
>> +        *
>> +        * More over for this to go in infinite loop there need to be
>> +        * continuously new fence signaled ie radeon_fence_read needs
>> +        * to return a different value each time for both the currently
>> +        * polling process and the other process that xchg the last_seq
>> +        * btw atomic read and xchg of the current process. And the
>> +        * value the other process set as last seq must be higher than
>> +        * the seq value we just read. Which means that current process
>> +        * need to be interrupted after radeon_fence_read and before
>> +        * atomic xchg.
>> +        *
>> +        * To be even more safe we count the number of time we loop and
>> +        * we bail after 10 loop just accepting the fact that we might
>> +        * have temporarly set the last_seq not to the true real last
>> +        * seq but to an older one.
>> +        */
>> +       last_seq = atomic64_read(&rdev->fence_drv[ring].last_seq);
>> +       do {
>> +               seq = radeon_fence_read(rdev, ring);
>> +               seq |= last_seq&  0xffffffff00000000LL;
>> +               if (seq<  last_seq) {
>> +                       seq += 0x100000000LL;
>> +               }
>>
>> -       rdev->fence_drv[ring].last_seq = seq;
>> +               if (!wake&&  seq == last_seq) {
>> +                       return false;
>> +               }
>> +               /* If we loop over we don't want to return without
>> +                * checking if a fence is signaled as it means that the
>> +                * seq we just read is different from the previous on.
>> +                */
>> +               wake = true;
>> +               if ((count_loop++)>  10) {
>> +                       /* We looped over too many time leave with the
>> +                        * fact that we might have set an older fence
>> +                        * seq then the current real last seq as signaled
>> +                        * by the hw.
>> +                        */
>> +                       break;
>> +               }
>> +               last_seq = seq;
>> +       } while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq)>  seq);
>> +
>> +       /* reset wake to false */
>> +       wake = false;
>>         rdev->fence_drv[ring].last_activity = jiffies;
>>
>>         n = NULL;
>> @@ -112,7 +158,7 @@ static bool radeon_fence_poll_locked(struct radeon_device *rdev, int ring)
>>                         n = i->prev;
>>                         list_move_tail(i,&rdev->fence_drv[ring].signaled);
>>                         fence = list_entry(i, struct radeon_fence, list);
>> -                       fence->signaled = true;
>> +                       fence->seq = RADEON_FENCE_SIGNALED_SEQ;
>>                         i = n;
>>                 } while (i !=&rdev->fence_drv[ring].emitted);
>>                 wake = true;
>> @@ -128,7 +174,7 @@ static void radeon_fence_destroy(struct kref *kref)
>>         fence = container_of(kref, struct radeon_fence, kref);
>>         write_lock_irqsave(&fence->rdev->fence_lock, irq_flags);
>>         list_del(&fence->list);
>> -       fence->emitted = false;
>> +       fence->seq = RADEON_FENCE_NOTEMITED_SEQ;
>>         write_unlock_irqrestore(&fence->rdev->fence_lock, irq_flags);
>>         if (fence->semaphore)
>>                 radeon_semaphore_free(fence->rdev, fence->semaphore);
>> @@ -145,9 +191,7 @@ int radeon_fence_create(struct radeon_device *rdev,
>>         }
>>         kref_init(&((*fence)->kref));
>>         (*fence)->rdev = rdev;
>> -       (*fence)->emitted = false;
>> -       (*fence)->signaled = false;
>> -       (*fence)->seq = 0;
>> +       (*fence)->seq = RADEON_FENCE_NOTEMITED_SEQ;
>>         (*fence)->ring = ring;
>>         (*fence)->semaphore = NULL;
>>         INIT_LIST_HEAD(&(*fence)->list);
>> @@ -163,18 +207,18 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
>>                 return true;
>>
>>         write_lock_irqsave(&fence->rdev->fence_lock, irq_flags);
>> -       signaled = fence->signaled;
>> +       signaled = (fence->seq == RADEON_FENCE_SIGNALED_SEQ);
>>         /* if we are shuting down report all fence as signaled */
>>         if (fence->rdev->shutdown) {
>>                 signaled = true;
>>         }
>> -       if (!fence->emitted) {
>> +       if (fence->seq == RADEON_FENCE_NOTEMITED_SEQ) {
>>                 WARN(1, "Querying an unemitted fence : %p !\n", fence);
>>                 signaled = true;
>>         }
>>         if (!signaled) {
>>                 radeon_fence_poll_locked(fence->rdev, fence->ring);
>> -               signaled = fence->signaled;
>> +               signaled = (fence->seq == RADEON_FENCE_SIGNALED_SEQ);
>>         }
>>         write_unlock_irqrestore(&fence->rdev->fence_lock, irq_flags);
>>         return signaled;
>> @@ -183,8 +227,8 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
>>   int radeon_fence_wait(struct radeon_fence *fence, bool intr)
>>   {
>>         struct radeon_device *rdev;
>> -       unsigned long irq_flags, timeout;
>> -       u32 seq;
>> +       unsigned long irq_flags, timeout, last_activity;
>> +       uint64_t seq;
>>         int i, r;
>>         bool signaled;
>>
>> @@ -207,7 +251,9 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
>>                         timeout = 1;
>>                 }
>>                 /* save current sequence value used to check for GPU lockups */
>> -               seq = rdev->fence_drv[fence->ring].last_seq;
>> +               seq = atomic64_read(&rdev->fence_drv[fence->ring].last_seq);
>> +               /* Save current last activity valuee, used to check for GPU lockups */
>> +               last_activity = rdev->fence_drv[fence->ring].last_activity;
>>                 read_unlock_irqrestore(&rdev->fence_lock, irq_flags);
>>
>>                 trace_radeon_fence_wait_begin(rdev->ddev, seq);
>> @@ -235,24 +281,23 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
>>                         }
>>
>>                         write_lock_irqsave(&rdev->fence_lock, irq_flags);
>> -                       /* check if sequence value has changed since last_activity */
>> -                       if (seq != rdev->fence_drv[fence->ring].last_seq) {
>> +                       /* test if somebody else has already decided that this is a lockup */
>> +                       if (last_activity != rdev->fence_drv[fence->ring].last_activity) {
>>                                 write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
>>                                 continue;
>>                         }
>>
>> -                       /* change sequence value on all rings, so nobody else things there is a lockup */
>> -                       for (i = 0; i<  RADEON_NUM_RINGS; ++i)
>> -                               rdev->fence_drv[i].last_seq -= 0x10000;
>> -
>> -                       rdev->fence_drv[fence->ring].last_activity = jiffies;
>>                         write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
>>
>>                         if (radeon_ring_is_lockup(rdev, fence->ring,&rdev->ring[fence->ring])) {
>> -
>>                                 /* good news we believe it's a lockup */
>> -                               printk(KERN_WARNING "GPU lockup (waiting for 0x%08X last fence id 0x%08X)\n",
>> -                                    fence->seq, seq);
>> +                               dev_warn(rdev->dev, "GPU lockup (waiting for 0x%016llx last fence id 0x%016llx)\n",
>> +                                        fence->seq, seq);
>> +
>> +                               /* change last activity so nobody else think there is a lockup */
>> +                               for (i = 0; i<  RADEON_NUM_RINGS; ++i) {
>> +                                       rdev->fence_drv[i].last_activity = jiffies;
>> +                               }
>>
>>                                 /* mark the ring as not ready any more */
>>                                 rdev->ring[fence->ring].ready = false;
>> @@ -387,9 +432,9 @@ int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring)
>>         }
>>         rdev->fence_drv[ring].cpu_addr =&rdev->wb.wb[index/4];
>>         rdev->fence_drv[ring].gpu_addr = rdev->wb.gpu_addr + index;
>> -       radeon_fence_write(rdev, atomic_read(&rdev->fence_drv[ring]..seq), ring);
>> +       radeon_fence_write(rdev, rdev->fence_drv[ring].seq, ring);
>>         rdev->fence_drv[ring].initialized = true;
>> -       DRM_INFO("fence driver on ring %d use gpu addr 0x%08Lx and cpu addr 0x%p\n",
>> +       DRM_INFO("fence driver on ring %d use gpu addr 0x%016llx and cpu addr 0x%p\n",
>>                  ring, rdev->fence_drv[ring].gpu_addr, rdev->fence_drv[ring].cpu_addr);
>>         write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
>>         return 0;
>> @@ -400,7 +445,8 @@ static void radeon_fence_driver_init_ring(struct radeon_device *rdev, int ring)
>>         rdev->fence_drv[ring].scratch_reg = -1;
>>         rdev->fence_drv[ring].cpu_addr = NULL;
>>         rdev->fence_drv[ring].gpu_addr = 0;
>> -       atomic_set(&rdev->fence_drv[ring].seq, 0);
>> +       rdev->fence_drv[ring].seq = 0;
>> +       atomic64_set(&rdev->fence_drv[ring].last_seq, 0);
>>         INIT_LIST_HEAD(&rdev->fence_drv[ring].emitted);
>>         INIT_LIST_HEAD(&rdev->fence_drv[ring].signaled);
>>         init_waitqueue_head(&rdev->fence_drv[ring].queue);
>> @@ -458,12 +504,12 @@ static int radeon_debugfs_fence_info(struct seq_file *m, void *data)
>>                         continue;
>>
>>                 seq_printf(m, "--- ring %d ---\n", i);
>> -               seq_printf(m, "Last signaled fence 0x%08X\n",
>> -                          radeon_fence_read(rdev, i));
>> +               seq_printf(m, "Last signaled fence 0x%016lx\n",
>> +                          atomic64_read(&rdev->fence_drv[i].last_seq));
>>                 if (!list_empty(&rdev->fence_drv[i].emitted)) {
>>                         fence = list_entry(rdev->fence_drv[i].emitted.prev,
>>                                            struct radeon_fence, list);
>> -                       seq_printf(m, "Last emitted fence %p with 0x%08X\n",
>> +                       seq_printf(m, "Last emitted fence %p with 0x%016llx\n",
>>                                    fence,  fence->seq);
>>                 }
>>         }
>> diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
>> index a4d60ae..4ae222b 100644
>> --- a/drivers/gpu/drm/radeon/radeon_ring.c
>> +++ b/drivers/gpu/drm/radeon/radeon_ring.c
>> @@ -82,7 +82,7 @@ bool radeon_ib_try_free(struct radeon_device *rdev, struct radeon_ib *ib)
>>         bool done = false;
>>
>>         /* only free ib which have been emited */
>> -       if (ib->fence&&  ib->fence->emitted) {
>> +       if (ib->fence&&  ib->fence->seq<  RADEON_FENCE_NOTEMITED_SEQ) {
>>                 if (radeon_fence_signaled(ib->fence)) {
>>                         radeon_fence_unref(&ib->fence);
>>                         radeon_sa_bo_free(rdev,&ib->sa_bo);
>> @@ -149,8 +149,9 @@ retry:
>>         /* this should be rare event, ie all ib scheduled none signaled yet.
>>          */
>>         for (i = 0; i<  RADEON_IB_POOL_SIZE; i++) {
>> -               if (rdev->ib_pool.ibs[idx].fence&&  rdev->ib_pool.ibs[idx].fence->emitted) {
>> -                       r = radeon_fence_wait(rdev->ib_pool.ibs[idx].fence, false);
>> +               struct radeon_fence *fence = rdev->ib_pool.ibs[idx].fence;
>> +               if (fence&&  fence->seq<  RADEON_FENCE_NOTEMITED_SEQ) {
>> +                       r = radeon_fence_wait(fence, false);
>>                         if (!r) {
>>                                 goto retry;
>>                         }
>> @@ -173,7 +174,7 @@ void radeon_ib_free(struct radeon_device *rdev, struct radeon_ib **ib)
>>                 return;
>>         }
>>         radeon_mutex_lock(&rdev->ib_pool.mutex);
>> -       if (tmp->fence&&  !tmp->fence->emitted) {
>> +       if (tmp->fence&&  tmp->fence->seq == RADEON_FENCE_NOTEMITED_SEQ) {
>>                 radeon_sa_bo_free(rdev,&tmp->sa_bo);
>>                 radeon_fence_unref(&tmp->fence);
>>         }
>> --
>> 1.7.5.4
>>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 14/20] drm/radeon: multiple ring allocator v2
  2012-05-07 11:42 ` [PATCH 14/20] drm/radeon: multiple ring allocator v2 Christian König
@ 2012-05-07 15:23   ` Jerome Glisse
  2012-05-07 16:45     ` Christian König
  0 siblings, 1 reply; 35+ messages in thread
From: Jerome Glisse @ 2012-05-07 15:23 UTC (permalink / raw)
  To: Christian König; +Cc: Jerome Glisse, dri-devel

On Mon, May 7, 2012 at 7:42 AM, Christian König <deathsimple@vodafone.de> wrote:
> A startover with a new idea for a multiple ring allocator.
> Should perform as well as a normal ring allocator as long
> as only one ring does somthing, but falls back to a more
> complex algorithm if more complex things start to happen.
>
> We store the last allocated bo in last, we always try to allocate
> after the last allocated bo. Principle is that in a linear GPU ring
> progression was is after last is the oldest bo we allocated and thus
> the first one that should no longer be in use by the GPU.
>
> If it's not the case we skip over the bo after last to the closest
> done bo if such one exist. If none exist and we are not asked to
> block we report failure to allocate.
>
> If we are asked to block we wait on all the oldest fence of all
> rings. We just wait for any of those fence to complete.
>
> v2: We need to be able to let hole point to the list_head, otherwise
>    try free will never free the first allocation of the list. Also
>    stop calling radeon_fence_signalled more than necessary.
>
> Signed-off-by: Christian König <deathsimple@vodafone.de>
> Signed-off-by: Jerome Glisse <jglisse@redhat.com>

This one is NAK, please use my patch. Yes, in my patch we never try to
free anything if there is only one sa_bo in the list; if you really care
about this it's a one-line change:
http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v2.patch


Your patch here can enter an infinite loop and never return while holding
the lock. See below.

Cheers,
Jerome

> ---
>  drivers/gpu/drm/radeon/radeon.h      |    7 +-
>  drivers/gpu/drm/radeon/radeon_ring.c |   19 +--
>  drivers/gpu/drm/radeon/radeon_sa.c   |  292 +++++++++++++++++++++++-----------
>  3 files changed, 210 insertions(+), 108 deletions(-)
>
> diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
> index 37a7459..cc7f16a 100644
> --- a/drivers/gpu/drm/radeon/radeon.h
> +++ b/drivers/gpu/drm/radeon/radeon.h
> @@ -385,7 +385,9 @@ struct radeon_bo_list {
>  struct radeon_sa_manager {
>        spinlock_t              lock;
>        struct radeon_bo        *bo;
> -       struct list_head        sa_bo;
> +       struct list_head        *hole;
> +       struct list_head        flist[RADEON_NUM_RINGS];
> +       struct list_head        olist;
>        unsigned                size;
>        uint64_t                gpu_addr;
>        void                    *cpu_ptr;
> @@ -396,7 +398,8 @@ struct radeon_sa_bo;
>
>  /* sub-allocation buffer */
>  struct radeon_sa_bo {
> -       struct list_head                list;
> +       struct list_head                olist;
> +       struct list_head                flist;
>        struct radeon_sa_manager        *manager;
>        unsigned                        soffset;
>        unsigned                        eoffset;
> diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c
> index 1748d93..e074ff5 100644
> --- a/drivers/gpu/drm/radeon/radeon_ring.c
> +++ b/drivers/gpu/drm/radeon/radeon_ring.c
> @@ -204,25 +204,22 @@ int radeon_ib_schedule(struct radeon_device *rdev, struct radeon_ib *ib)
>
>  int radeon_ib_pool_init(struct radeon_device *rdev)
>  {
> -       struct radeon_sa_manager tmp;
>        int i, r;
>
> -       r = radeon_sa_bo_manager_init(rdev, &tmp,
> -                                     RADEON_IB_POOL_SIZE*64*1024,
> -                                     RADEON_GEM_DOMAIN_GTT);
> -       if (r) {
> -               return r;
> -       }
> -
>        radeon_mutex_lock(&rdev->ib_pool.mutex);
>        if (rdev->ib_pool.ready) {
>                radeon_mutex_unlock(&rdev->ib_pool.mutex);
> -               radeon_sa_bo_manager_fini(rdev, &tmp);
>                return 0;
>        }
>
> -       rdev->ib_pool.sa_manager = tmp;
> -       INIT_LIST_HEAD(&rdev->ib_pool.sa_manager.sa_bo);
> +       r = radeon_sa_bo_manager_init(rdev, &rdev->ib_pool.sa_manager,
> +                                     RADEON_IB_POOL_SIZE*64*1024,
> +                                     RADEON_GEM_DOMAIN_GTT);
> +       if (r) {
> +               radeon_mutex_unlock(&rdev->ib_pool.mutex);
> +               return r;
> +       }
> +
>        for (i = 0; i < RADEON_IB_POOL_SIZE; i++) {
>                rdev->ib_pool.ibs[i].fence = NULL;
>                rdev->ib_pool.ibs[i].idx = i;
> diff --git a/drivers/gpu/drm/radeon/radeon_sa.c b/drivers/gpu/drm/radeon/radeon_sa.c
> index 90ee8ad..757a9d4 100644
> --- a/drivers/gpu/drm/radeon/radeon_sa.c
> +++ b/drivers/gpu/drm/radeon/radeon_sa.c
> @@ -27,21 +27,42 @@
>  * Authors:
>  *    Jerome Glisse <glisse@freedesktop.org>
>  */
> +/* Algorithm:
> + *
> + * We store the last allocated bo in "hole", we always try to allocate
> + * after the last allocated bo. Principle is that in a linear GPU ring
> + * progression was is after last is the oldest bo we allocated and thus
> + * the first one that should no longer be in use by the GPU.
> + *
> + * If it's not the case we skip over the bo after last to the closest
> + * done bo if such one exist. If none exist and we are not asked to
> + * block we report failure to allocate.
> + *
> + * If we are asked to block we wait on all the oldest fence of all
> + * rings. We just wait for any of those fence to complete.
> + */
>  #include "drmP.h"
>  #include "drm.h"
>  #include "radeon.h"
>
> +static void radeon_sa_bo_remove_locked(struct radeon_sa_bo *sa_bo);
> +static void radeon_sa_bo_try_free(struct radeon_sa_manager *sa_manager);
> +
>  int radeon_sa_bo_manager_init(struct radeon_device *rdev,
>                              struct radeon_sa_manager *sa_manager,
>                              unsigned size, u32 domain)
>  {
> -       int r;
> +       int i, r;
>
>        spin_lock_init(&sa_manager->lock);
>        sa_manager->bo = NULL;
>        sa_manager->size = size;
>        sa_manager->domain = domain;
> -       INIT_LIST_HEAD(&sa_manager->sa_bo);
> +       sa_manager->hole = &sa_manager->olist;
> +       INIT_LIST_HEAD(&sa_manager->olist);
> +       for (i = 0; i < RADEON_NUM_RINGS; ++i) {
> +               INIT_LIST_HEAD(&sa_manager->flist[i]);
> +       }
>
>        r = radeon_bo_create(rdev, size, RADEON_GPU_PAGE_SIZE, true,
>                             RADEON_GEM_DOMAIN_CPU, &sa_manager->bo);
> @@ -58,11 +79,15 @@ void radeon_sa_bo_manager_fini(struct radeon_device *rdev,
>  {
>        struct radeon_sa_bo *sa_bo, *tmp;
>
> -       if (!list_empty(&sa_manager->sa_bo)) {
> -               dev_err(rdev->dev, "sa_manager is not empty, clearing anyway\n");
> +       if (!list_empty(&sa_manager->olist)) {
> +               sa_manager->hole = &sa_manager->olist,
> +               radeon_sa_bo_try_free(sa_manager);
> +               if (!list_empty(&sa_manager->olist)) {
> +                       dev_err(rdev->dev, "sa_manager is not empty, clearing anyway\n");
> +               }
>        }
> -       list_for_each_entry_safe(sa_bo, tmp, &sa_manager->sa_bo, list) {
> -               list_del_init(&sa_bo->list);
> +       list_for_each_entry_safe(sa_bo, tmp, &sa_manager->olist, olist) {
> +               radeon_sa_bo_remove_locked(sa_bo);
>        }
>        radeon_bo_unref(&sa_manager->bo);
>        sa_manager->size = 0;
> @@ -114,111 +139,181 @@ int radeon_sa_bo_manager_suspend(struct radeon_device *rdev,
>        return r;
>  }
>
> -/*
> - * Principe is simple, we keep a list of sub allocation in offset
> - * order (first entry has offset == 0, last entry has the highest
> - * offset).
> - *
> - * When allocating new object we first check if there is room at
> - * the end total_size - (last_object_offset + last_object_size) >=
> - * alloc_size. If so we allocate new object there.
> - *
> - * When there is not enough room at the end, we start waiting for
> - * each sub object until we reach object_offset+object_size >=
> - * alloc_size, this object then become the sub object we return.
> - *
> - * Alignment can't be bigger than page size
> - */
> -
>  static void radeon_sa_bo_remove_locked(struct radeon_sa_bo *sa_bo)
>  {
> -       list_del(&sa_bo->list);
> +       struct radeon_sa_manager *sa_manager = sa_bo->manager;
> +       if (sa_manager->hole == &sa_bo->olist) {
> +               sa_manager->hole = sa_bo->olist.prev;
> +       }
> +       list_del_init(&sa_bo->olist);
> +       list_del_init(&sa_bo->flist);
>        radeon_fence_unref(&sa_bo->fence);
>        kfree(sa_bo);
>  }
>
> +static void radeon_sa_bo_try_free(struct radeon_sa_manager *sa_manager)
> +{
> +       struct radeon_sa_bo *sa_bo, *tmp;
> +
> +       if (sa_manager->hole->next == &sa_manager->olist)
> +               return;
> +
> +       sa_bo = list_entry(sa_manager->hole->next, struct radeon_sa_bo, olist);
> +       list_for_each_entry_safe_from(sa_bo, tmp, &sa_manager->olist, olist) {
> +               if (sa_bo->fence == NULL || !radeon_fence_signaled(sa_bo->fence)) {
> +                       return;
> +               }
> +               radeon_sa_bo_remove_locked(sa_bo);
> +       }
> +}
> +
> +static inline unsigned radeon_sa_bo_hole_soffset(struct radeon_sa_manager *sa_manager)
> +{
> +       struct list_head *hole = sa_manager->hole;
> +
> +       if (hole != &sa_manager->olist) {
> +               return list_entry(hole, struct radeon_sa_bo, olist)->eoffset;
> +       }
> +       return 0;
> +}
> +
> +static inline unsigned radeon_sa_bo_hole_eoffset(struct radeon_sa_manager *sa_manager)
> +{
> +       struct list_head *hole = sa_manager->hole;
> +
> +       if (hole->next != &sa_manager->olist) {
> +               return list_entry(hole->next, struct radeon_sa_bo, olist)->soffset;
> +       }
> +       return sa_manager->size;
> +}
> +
> +static bool radeon_sa_bo_try_alloc(struct radeon_sa_manager *sa_manager,
> +                                  struct radeon_sa_bo *sa_bo,
> +                                  unsigned size, unsigned align)
> +{
> +       unsigned soffset, eoffset, wasted;
> +
> +       soffset = radeon_sa_bo_hole_soffset(sa_manager);
> +       eoffset = radeon_sa_bo_hole_eoffset(sa_manager);
> +       wasted = (align - (soffset % align)) % align;
> +
> +       if ((eoffset - soffset) >= (size + wasted)) {
> +               soffset += wasted;
> +
> +               sa_bo->manager = sa_manager;
> +               sa_bo->soffset = soffset;
> +               sa_bo->eoffset = soffset + size;
> +               list_add(&sa_bo->olist, sa_manager->hole);
> +               INIT_LIST_HEAD(&sa_bo->flist);
> +               sa_manager->hole = &sa_bo->olist;
> +               return true;
> +       }
> +       return false;
> +}
> +
> +static bool radeon_sa_bo_next_hole(struct radeon_sa_manager *sa_manager,
> +                                  struct radeon_fence **fences)
> +{
> +       unsigned i, soffset, best, tmp;
> +
> +       /* if hole points to the end of the buffer */
> +       if (sa_manager->hole->next == &sa_manager->olist) {
> +               /* try again with its beginning */
> +               sa_manager->hole = &sa_manager->olist;
> +               return true;
> +       }
> +
> +       soffset = radeon_sa_bo_hole_soffset(sa_manager);
> +       /* to handle wrap around we add sa_manager->size */
> +       best = sa_manager->size * 2;
> +       /* go over all fence list and try to find the closest sa_bo
> +        * of the current last
> +        */
> +       for (i = 0; i < RADEON_NUM_RINGS; ++i) {
> +               struct radeon_sa_bo *sa_bo;
> +
> +               if (list_empty(&sa_manager->flist[i])) {
> +                       fences[i] = NULL;
> +                       continue;
> +               }
> +
> +               sa_bo = list_first_entry(&sa_manager->flist[i],
> +                                        struct radeon_sa_bo, flist);
> +
> +               if (!radeon_fence_signaled(sa_bo->fence)) {
> +                       fences[i] = sa_bo->fence;
> +                       continue;
> +               }
> +
> +               tmp = sa_bo->soffset;
> +               if (tmp < soffset) {
> +                       /* wrap around, pretend it's after */
> +                       tmp += sa_manager->size;
> +               }
> +               tmp -= soffset;
> +               if (tmp < best) {
> +                       /* this sa bo is the closest one */
> +                       best = tmp;
> +                       sa_manager->hole = sa_bo->olist.prev;
> +               }
> +
> +               /* we knew that this one is signaled,
> +                  so it's save to remote it */
> +               radeon_sa_bo_remove_locked(sa_bo);
> +       }
> +       return best != sa_manager->size * 2;
> +}
> +
>  int radeon_sa_bo_new(struct radeon_device *rdev,
>                     struct radeon_sa_manager *sa_manager,
>                     struct radeon_sa_bo **sa_bo,
>                     unsigned size, unsigned align, bool block)
>  {
> -       struct radeon_fence *fence = NULL;
> -       struct radeon_sa_bo *tmp, *next;
> -       struct list_head *head;
> -       unsigned offset = 0, wasted = 0;
> -       int r;
> +       struct radeon_fence *fences[RADEON_NUM_RINGS];
> +       int r = -ENOMEM;
>
>        BUG_ON(align > RADEON_GPU_PAGE_SIZE);
>        BUG_ON(size > sa_manager->size);
>
>        *sa_bo = kmalloc(sizeof(struct radeon_sa_bo), GFP_KERNEL);
> -
> -retry:
> +       if ((*sa_bo) == NULL) {
> +               return -ENOMEM;
> +       }
> +       (*sa_bo)->manager = sa_manager;
> +       (*sa_bo)->fence = NULL;
> +       INIT_LIST_HEAD(&(*sa_bo)->olist);
> +       INIT_LIST_HEAD(&(*sa_bo)->flist);
>
>        spin_lock(&sa_manager->lock);
> +       do {
> +               /* try to allocate couple time before going to wait */
> +               do {
> +                       radeon_sa_bo_try_free(sa_manager);
>
> -       /* no one ? */
> -       head = sa_manager->sa_bo.prev;
> -       if (list_empty(&sa_manager->sa_bo)) {
> -               goto out;
> -       }
> -
> -       /* look for a hole big enough */
> -       offset = 0;
> -       list_for_each_entry_safe(tmp, next, &sa_manager->sa_bo, list) {
> -               /* try to free this object */
> -               if (tmp->fence) {
> -                       if (radeon_fence_signaled(tmp->fence)) {
> -                               radeon_sa_bo_remove_locked(tmp);
> -                               continue;
> -                       } else {
> -                               fence = tmp->fence;
> +                       if (radeon_sa_bo_try_alloc(sa_manager, *sa_bo,
> +                                                  size, align)) {
> +                               spin_unlock(&sa_manager->lock);
> +                               return 0;
>                        }
> -               }
>
> -               /* room before this object ? */
> -               if (offset < tmp->soffset && (tmp->soffset - offset) >= size) {
> -                       head = tmp->list.prev;
> -                       goto out;
> -               }
> -               offset = tmp->eoffset;
> -               wasted = offset % align;
> -               if (wasted) {
> -                       wasted = align - wasted;
> -               }
> -               offset += wasted;
> -       }
> -       /* room at the end ? */
> -       head = sa_manager->sa_bo.prev;
> -       tmp = list_entry(head, struct radeon_sa_bo, list);
> -       offset = tmp->eoffset;
> -       wasted = offset % align;
> -       if (wasted) {
> -               wasted = align - wasted;
> -       }
> -       offset += wasted;
> -       if ((sa_manager->size - offset) < size) {
> -               /* failed to find somethings big enough */
> -               spin_unlock(&sa_manager->lock);
> -               if (block && fence) {
> -                       r = radeon_fence_wait(fence, false);
> -                       if (r)
> -                               return r;
> -
> -                       goto retry;
> +                       /* see if we can skip over some allocations */
> +               } while (radeon_sa_bo_next_hole(sa_manager, fences));

Here you can enter an infinite loop: in the case where there are a bunch of
holes in the allocator but none of them allows fulfilling the allocation,
radeon_sa_bo_next_hole will keep returning true, looping over and over on
all of them. That's why I restrict my patch to skipping at most 2 holes and
then failing the allocation or trying to wait. I believe sadly we need a
heuristic, and skipping at most 2 holes sounded like a good one.
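
To make the shape of that bounded heuristic concrete, here is a minimal
user-space sketch (not the radeon code itself; the hole array and the
fits()/skip_to_next_hole() helpers are made-up stand-ins for the allocator
state). The only point is that the retry loop has a hard bound, so it can
never spin forever while holding the lock:

#include <stdbool.h>
#include <stdio.h>

#define MAX_HOLES 8

/* hypothetical allocator state: a few holes of known size, 0 = no hole */
static unsigned hole_size[MAX_HOLES] = { 16, 8, 32, 4 };
static int cur_hole;

static bool fits(unsigned size)
{
	return hole_size[cur_hole] >= size;
}

static bool skip_to_next_hole(void)
{
	if (cur_hole + 1 >= MAX_HOLES || hole_size[cur_hole + 1] == 0)
		return false;	/* nothing left to skip to */
	cur_hole++;
	return true;
}

/* try the current hole, then skip over at most max_skips further holes */
static bool try_alloc_bounded(unsigned size, int max_skips)
{
	int skips = 0;

	do {
		if (fits(size))
			return true;
	} while (skips++ < max_skips && skip_to_next_hole());

	return false;	/* caller can now block on a fence or fail with -ENOMEM */
}

int main(void)
{
	printf("64 bytes, max 2 skips: %s\n",
	       try_alloc_bounded(64, 2) ? "fits" : "does not fit");
	return 0;
}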

> +
> +               if (block) {
> +                       spin_unlock(&sa_manager->lock);
> +                       r = radeon_fence_wait_any(rdev, fences, false);
> +                       spin_lock(&sa_manager->lock);
> +                       if (r) {
> +                               goto out_err;
> +                       }
>                }
> -               kfree(*sa_bo);
> -               *sa_bo = NULL;
> -               return -ENOMEM;
> -       }
> +       } while (block);
>
> -out:
> -       (*sa_bo)->manager = sa_manager;
> -       (*sa_bo)->soffset = offset;
> -       (*sa_bo)->eoffset = offset + size;
> -       list_add(&(*sa_bo)->list, head);
> +out_err:
>        spin_unlock(&sa_manager->lock);
> -       return 0;
> +       kfree(*sa_bo);
> +       *sa_bo = NULL;
> +       return r;
>  }
>
>  void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo **sa_bo,
> @@ -226,13 +321,16 @@ void radeon_sa_bo_free(struct radeon_device *rdev, struct radeon_sa_bo **sa_bo,
>  {
>        struct radeon_sa_manager *sa_manager;
>
> -       if (!sa_bo || !*sa_bo)
> +       if (sa_bo == NULL || *sa_bo == NULL) {
>                return;
> +       }
>
>        sa_manager = (*sa_bo)->manager;
>        spin_lock(&sa_manager->lock);
>        if (fence && fence->seq && fence->seq < RADEON_FENCE_NOTEMITED_SEQ) {
>                (*sa_bo)->fence = radeon_fence_ref(fence);
> +               list_add_tail(&(*sa_bo)->flist,
> +                             &sa_manager->flist[fence->ring]);
>        } else {
>                radeon_sa_bo_remove_locked(*sa_bo);
>        }
> @@ -247,15 +345,19 @@ void radeon_sa_bo_dump_debug_info(struct radeon_sa_manager *sa_manager,
>        struct radeon_sa_bo *i;
>
>        spin_lock(&sa_manager->lock);
> -       list_for_each_entry(i, &sa_manager->sa_bo, list) {
> -               seq_printf(m, "[%08x %08x] size %4d (%p)",
> -                          i->soffset, i->eoffset, i->eoffset - i->soffset, i);
> -               if (i->fence) {
> -                       seq_printf(m, " protected by %Ld (%p) on ring %d\n",
> -                                  i->fence->seq, i->fence, i->fence->ring);
> +       list_for_each_entry(i, &sa_manager->olist, olist) {
> +               if (&i->olist == sa_manager->hole) {
> +                       seq_printf(m, ">");
>                } else {
> -                       seq_printf(m, "\n");
> +                       seq_printf(m, " ");
> +               }
> +               seq_printf(m, "[0x%08x 0x%08x] size %8d",
> +                          i->soffset, i->eoffset, i->eoffset - i->soffset);
> +               if (i->fence) {
> +                       seq_printf(m, " protected by 0x%016llx on ring %d",
> +                                  i->fence->seq, i->fence->ring);
>                }
> +               seq_printf(m, "\n");
>        }
>        spin_unlock(&sa_manager->lock);
>  }
> --
> 1.7.5.4
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 04/20] drm/radeon: convert fence to uint64_t v4
  2012-05-07 15:04     ` Christian König
@ 2012-05-07 15:27       ` Jerome Glisse
  0 siblings, 0 replies; 35+ messages in thread
From: Jerome Glisse @ 2012-05-07 15:27 UTC (permalink / raw)
  To: Christian König; +Cc: Jerome Glisse, dri-devel

On Mon, May 7, 2012 at 11:04 AM, Christian König
<deathsimple@vodafone.de> wrote:
> On 07.05.2012 16:39, Jerome Glisse wrote:
>>
>> On Mon, May 7, 2012 at 7:42 AM, Christian König<deathsimple@vodafone.de>
>>  wrote:
>>>
>>> From: Jerome Glisse<jglisse@redhat.com>
>>>
>>> This convert fence to use uint64_t sequence number intention is
>>> to use the fact that uin64_t is big enough that we don't need to
>>> care about wrap around.
>>>
>>> Tested with and without writeback using 0xFFFFF000 as initial
>>> fence sequence and thus allowing to test the wrap around from
>>> 32bits to 64bits.
>>>
>>> v2: Add comment about possible race btw CPU&  GPU, add comment
>>>
>>>    stressing that we need 2 dword aligned for R600_WB_EVENT_OFFSET
>>>    Read fence sequenc in reverse order of GPU write them so we
>>>    mitigate the race btw CPU and GPU.
>>>
>>> v3: Drop the need for ring to emit the 64bits fence, and just have
>>>    each ring emit the lower 32bits of the fence sequence. We
>>>    handle the wrap over 32bits in fence_process.
>>>
>>> v4: Just a small optimization: Don't reread the last_seq value
>>>    if loop restarts, since we already know its value anyway.
>>>    Also start at zero not one for seq value and use pre instead
>>>    of post increment in emmit, otherwise wait_empty will deadlock.
>>
>> Why changing that v3 was already good no deadlock. I started at 1
>> especialy for that, a signaled fence is set to 0 so it always compare
>> as signaled. Just using preincrement is exactly like starting at one.
>> I don't see the need for this change but if it makes you happy.
>
>
> Not exactly, the last emitted sequence is also used in
> radeon_fence_wait_empty. So when you use post increment
> radeon_fence_wait_empty will actually not wait for the last emitted fence to
> be signaled, but for last emitted + 1, so it practically waits forever.
>
> Without this change suspend (for example) will just lockup.
>
> Cheers,
> Christian.

Yeah you right, my tree had a fix for that. I probably messed up the
rebase patch at one point. Well as your version fix it i am fine with
it.

Cheers,
Jerome

>>
>> Cheers,
>> Jerome
>>>
>>> Signed-off-by: Jerome Glisse<jglisse@redhat.com>
>>> Signed-off-by: Christian König<deathsimple@vodafone.de>
>>> ---
>>>  drivers/gpu/drm/radeon/radeon.h       |   39 ++++++-----
>>>  drivers/gpu/drm/radeon/radeon_fence.c |  116
>>> +++++++++++++++++++++++----------
>>>  drivers/gpu/drm/radeon/radeon_ring.c  |    9 ++-
>>>  3 files changed, 107 insertions(+), 57 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/radeon/radeon.h
>>> b/drivers/gpu/drm/radeon/radeon.h
>>> index e99ea81..cdf46bc 100644
>>> --- a/drivers/gpu/drm/radeon/radeon.h
>>> +++ b/drivers/gpu/drm/radeon/radeon.h
>>> @@ -100,28 +100,32 @@ extern int radeon_lockup_timeout;
>>>  * Copy from radeon_drv.h so we don't have to include both and have
>>> conflicting
>>>  * symbol;
>>>  */
>>> -#define RADEON_MAX_USEC_TIMEOUT                100000  /* 100 ms */
>>> -#define RADEON_FENCE_JIFFIES_TIMEOUT   (HZ / 2)
>>> +#define RADEON_MAX_USEC_TIMEOUT                        100000  /* 100 ms
>>> */
>>> +#define RADEON_FENCE_JIFFIES_TIMEOUT           (HZ / 2)
>>>  /* RADEON_IB_POOL_SIZE must be a power of 2 */
>>> -#define RADEON_IB_POOL_SIZE            16
>>> -#define RADEON_DEBUGFS_MAX_COMPONENTS  32
>>> -#define RADEONFB_CONN_LIMIT            4
>>> -#define RADEON_BIOS_NUM_SCRATCH                8
>>> +#define RADEON_IB_POOL_SIZE                    16
>>> +#define RADEON_DEBUGFS_MAX_COMPONENTS          32
>>> +#define RADEONFB_CONN_LIMIT                    4
>>> +#define RADEON_BIOS_NUM_SCRATCH                        8
>>>
>>>  /* max number of rings */
>>> -#define RADEON_NUM_RINGS 3
>>> +#define RADEON_NUM_RINGS                       3
>>> +
>>> +/* fence seq are set to this number when signaled */
>>> +#define RADEON_FENCE_SIGNALED_SEQ              0LL
>>> +#define RADEON_FENCE_NOTEMITED_SEQ             (~0LL)
>>>
>>>  /* internal ring indices */
>>>  /* r1xx+ has gfx CP ring */
>>> -#define RADEON_RING_TYPE_GFX_INDEX  0
>>> +#define RADEON_RING_TYPE_GFX_INDEX             0
>>>
>>>  /* cayman has 2 compute CP rings */
>>> -#define CAYMAN_RING_TYPE_CP1_INDEX 1
>>> -#define CAYMAN_RING_TYPE_CP2_INDEX 2
>>> +#define CAYMAN_RING_TYPE_CP1_INDEX             1
>>> +#define CAYMAN_RING_TYPE_CP2_INDEX             2
>>>
>>>  /* hardcode those limit for now */
>>> -#define RADEON_VA_RESERVED_SIZE                (8<<  20)
>>> -#define RADEON_IB_VM_MAX_SIZE          (64<<  10)
>>> +#define RADEON_VA_RESERVED_SIZE                        (8<<  20)
>>> +#define RADEON_IB_VM_MAX_SIZE                  (64<<  10)
>>>
>>>  /*
>>>  * Errata workarounds.
>>> @@ -254,8 +258,9 @@ struct radeon_fence_driver {
>>>        uint32_t                        scratch_reg;
>>>        uint64_t                        gpu_addr;
>>>        volatile uint32_t               *cpu_addr;
>>> -       atomic_t                        seq;
>>> -       uint32_t                        last_seq;
>>> +       /* seq is protected by ring emission lock */
>>> +       uint64_t                        seq;
>>> +       atomic64_t                      last_seq;
>>>        unsigned long                   last_activity;
>>>        wait_queue_head_t               queue;
>>>        struct list_head                emitted;
>>> @@ -268,11 +273,9 @@ struct radeon_fence {
>>>        struct kref                     kref;
>>>        struct list_head                list;
>>>        /* protected by radeon_fence.lock */
>>> -       uint32_t                        seq;
>>> -       bool                            emitted;
>>> -       bool                            signaled;
>>> +       uint64_t                        seq;
>>>        /* RB, DMA, etc. */
>>> -       int                             ring;
>>> +       unsigned                        ring;
>>>        struct radeon_semaphore         *semaphore;
>>>  };
>>>
>>> diff --git a/drivers/gpu/drm/radeon/radeon_fence.c
>>> b/drivers/gpu/drm/radeon/radeon_fence.c
>>> index 5bb78bf..feb2bbc 100644
>>> --- a/drivers/gpu/drm/radeon/radeon_fence.c
>>> +++ b/drivers/gpu/drm/radeon/radeon_fence.c
>>> @@ -66,14 +66,14 @@ int radeon_fence_emit(struct radeon_device *rdev,
>>> struct radeon_fence *fence)
>>>        unsigned long irq_flags;
>>>
>>>        write_lock_irqsave(&rdev->fence_lock, irq_flags);
>>> -       if (fence->emitted) {
>>> +       if (fence->seq&&  fence->seq<  RADEON_FENCE_NOTEMITED_SEQ) {
>>>
>>>                write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
>>>                return 0;
>>>        }
>>> -       fence->seq =
>>> atomic_add_return(1,&rdev->fence_drv[fence->ring].seq);
>>>
>>> +       /* we are protected by the ring emission mutex */
>>> +       fence->seq = ++rdev->fence_drv[fence->ring].seq;
>>>        radeon_fence_ring_emit(rdev, fence->ring, fence);
>>>        trace_radeon_fence_emit(rdev->ddev, fence->seq);
>>> -       fence->emitted = true;
>>>        /* are we the first fence on a previusly idle ring? */
>>>        if (list_empty(&rdev->fence_drv[fence->ring].emitted)) {
>>>                rdev->fence_drv[fence->ring].last_activity = jiffies;
>>> @@ -87,14 +87,60 @@ static bool radeon_fence_poll_locked(struct
>>> radeon_device *rdev, int ring)
>>>  {
>>>        struct radeon_fence *fence;
>>>        struct list_head *i, *n;
>>> -       uint32_t seq;
>>> +       uint64_t seq, last_seq;
>>> +       unsigned count_loop = 0;
>>>        bool wake = false;
>>>
>>> -       seq = radeon_fence_read(rdev, ring);
>>> -       if (seq == rdev->fence_drv[ring].last_seq)
>>> -               return false;
>>> +       /* Note there is a scenario here for an infinite loop but it's
>>> +        * very unlikely to happen. For it to happen, the current polling
>>> +        * process need to be interrupted by another process and another
>>> +        * process needs to update the last_seq btw the atomic read and
>>> +        * xchg of the current process.
>>> +        *
>>> +        * More over for this to go in infinite loop there need to be
>>> +        * continuously new fence signaled ie radeon_fence_read needs
>>> +        * to return a different value each time for both the currently
>>> +        * polling process and the other process that xchg the last_seq
>>> +        * btw atomic read and xchg of the current process. And the
>>> +        * value the other process set as last seq must be higher than
>>> +        * the seq value we just read. Which means that current process
>>> +        * need to be interrupted after radeon_fence_read and before
>>> +        * atomic xchg.
>>> +        *
>>> +        * To be even more safe we count the number of time we loop and
>>> +        * we bail after 10 loop just accepting the fact that we might
>>> +        * have temporarly set the last_seq not to the true real last
>>> +        * seq but to an older one.
>>> +        */
>>> +       last_seq = atomic64_read(&rdev->fence_drv[ring].last_seq);
>>> +       do {
>>> +               seq = radeon_fence_read(rdev, ring);
>>> +               seq |= last_seq&  0xffffffff00000000LL;
>>>
>>> +               if (seq<  last_seq) {
>>> +                       seq += 0x100000000LL;
>>> +               }
>>>
>>> -       rdev->fence_drv[ring].last_seq = seq;
>>> +               if (!wake&&  seq == last_seq) {
>>>
>>> +                       return false;
>>> +               }
>>> +               /* If we loop over we don't want to return without
>>> +                * checking if a fence is signaled as it means that the
>>> +                * seq we just read is different from the previous on.
>>> +                */
>>> +               wake = true;
>>> +               if ((count_loop++)>  10) {
>>> +                       /* We looped over too many time leave with the
>>> +                        * fact that we might have set an older fence
>>> +                        * seq then the current real last seq as signaled
>>> +                        * by the hw.
>>> +                        */
>>> +                       break;
>>> +               }
>>> +               last_seq = seq;
>>> +       } while (atomic64_xchg(&rdev->fence_drv[ring].last_seq, seq)>
>>>  seq);
>>> +
>>> +       /* reset wake to false */
>>> +       wake = false;
>>>        rdev->fence_drv[ring].last_activity = jiffies;
>>>
>>>        n = NULL;
>>> @@ -112,7 +158,7 @@ static bool radeon_fence_poll_locked(struct
>>> radeon_device *rdev, int ring)
>>>                        n = i->prev;
>>>                        list_move_tail(i,&rdev->fence_drv[ring].signaled);
>>>
>>>                        fence = list_entry(i, struct radeon_fence, list);
>>> -                       fence->signaled = true;
>>> +                       fence->seq = RADEON_FENCE_SIGNALED_SEQ;
>>>                        i = n;
>>>                } while (i !=&rdev->fence_drv[ring].emitted);
>>>
>>>                wake = true;
>>> @@ -128,7 +174,7 @@ static void radeon_fence_destroy(struct kref *kref)
>>>        fence = container_of(kref, struct radeon_fence, kref);
>>>        write_lock_irqsave(&fence->rdev->fence_lock, irq_flags);
>>>        list_del(&fence->list);
>>> -       fence->emitted = false;
>>> +       fence->seq = RADEON_FENCE_NOTEMITED_SEQ;
>>>        write_unlock_irqrestore(&fence->rdev->fence_lock, irq_flags);
>>>        if (fence->semaphore)
>>>                radeon_semaphore_free(fence->rdev, fence->semaphore);
>>> @@ -145,9 +191,7 @@ int radeon_fence_create(struct radeon_device *rdev,
>>>        }
>>>        kref_init(&((*fence)->kref));
>>>        (*fence)->rdev = rdev;
>>> -       (*fence)->emitted = false;
>>> -       (*fence)->signaled = false;
>>> -       (*fence)->seq = 0;
>>> +       (*fence)->seq = RADEON_FENCE_NOTEMITED_SEQ;
>>>        (*fence)->ring = ring;
>>>        (*fence)->semaphore = NULL;
>>>        INIT_LIST_HEAD(&(*fence)->list);
>>> @@ -163,18 +207,18 @@ bool radeon_fence_signaled(struct radeon_fence
>>> *fence)
>>>                return true;
>>>
>>>        write_lock_irqsave(&fence->rdev->fence_lock, irq_flags);
>>> -       signaled = fence->signaled;
>>> +       signaled = (fence->seq == RADEON_FENCE_SIGNALED_SEQ);
>>>        /* if we are shuting down report all fence as signaled */
>>>        if (fence->rdev->shutdown) {
>>>                signaled = true;
>>>        }
>>> -       if (!fence->emitted) {
>>> +       if (fence->seq == RADEON_FENCE_NOTEMITED_SEQ) {
>>>                WARN(1, "Querying an unemitted fence : %p !\n", fence);
>>>                signaled = true;
>>>        }
>>>        if (!signaled) {
>>>                radeon_fence_poll_locked(fence->rdev, fence->ring);
>>> -               signaled = fence->signaled;
>>> +               signaled = (fence->seq == RADEON_FENCE_SIGNALED_SEQ);
>>>        }
>>>        write_unlock_irqrestore(&fence->rdev->fence_lock, irq_flags);
>>>        return signaled;
>>> @@ -183,8 +227,8 @@ bool radeon_fence_signaled(struct radeon_fence
>>> *fence)
>>>  int radeon_fence_wait(struct radeon_fence *fence, bool intr)
>>>  {
>>>        struct radeon_device *rdev;
>>> -       unsigned long irq_flags, timeout;
>>> -       u32 seq;
>>> +       unsigned long irq_flags, timeout, last_activity;
>>> +       uint64_t seq;
>>>        int i, r;
>>>        bool signaled;
>>>
>>> @@ -207,7 +251,9 @@ int radeon_fence_wait(struct radeon_fence *fence,
>>> bool intr)
>>>                        timeout = 1;
>>>                }
>>>                /* save current sequence value used to check for GPU
>>> lockups */
>>> -               seq = rdev->fence_drv[fence->ring].last_seq;
>>> +               seq =
>>> atomic64_read(&rdev->fence_drv[fence->ring].last_seq);
>>> +               /* Save current last activity valuee, used to check for
>>> GPU lockups */
>>> +               last_activity =
>>> rdev->fence_drv[fence->ring].last_activity;
>>>                read_unlock_irqrestore(&rdev->fence_lock, irq_flags);
>>>
>>>                trace_radeon_fence_wait_begin(rdev->ddev, seq);
>>> @@ -235,24 +281,23 @@ int radeon_fence_wait(struct radeon_fence *fence,
>>> bool intr)
>>>                        }
>>>
>>>                        write_lock_irqsave(&rdev->fence_lock, irq_flags);
>>> -                       /* check if sequence value has changed since
>>> last_activity */
>>> -                       if (seq != rdev->fence_drv[fence->ring].last_seq)
>>> {
>>> +                       /* test if somebody else has already decided that
>>> this is a lockup */
>>> +                       if (last_activity !=
>>> rdev->fence_drv[fence->ring].last_activity) {
>>>                                write_unlock_irqrestore(&rdev->fence_lock,
>>> irq_flags);
>>>                                continue;
>>>                        }
>>>
>>> -                       /* change sequence value on all rings, so nobody
>>> else things there is a lockup */
>>> -                       for (i = 0; i<  RADEON_NUM_RINGS; ++i)
>>> -                               rdev->fence_drv[i].last_seq -= 0x10000;
>>> -
>>> -                       rdev->fence_drv[fence->ring].last_activity =
>>> jiffies;
>>>                        write_unlock_irqrestore(&rdev->fence_lock,
>>> irq_flags);
>>>
>>>                        if (radeon_ring_is_lockup(rdev,
>>> fence->ring,&rdev->ring[fence->ring])) {
>>>
>>> -
>>>                                /* good news we believe it's a lockup */
>>> -                               printk(KERN_WARNING "GPU lockup (waiting
>>> for 0x%08X last fence id 0x%08X)\n",
>>> -                                    fence->seq, seq);
>>> +                               dev_warn(rdev->dev, "GPU lockup (waiting
>>> for 0x%016llx last fence id 0x%016llx)\n",
>>> +                                        fence->seq, seq);
>>> +
>>> +                               /* change last activity so nobody else
>>> think there is a lockup */
>>> +                               for (i = 0; i<  RADEON_NUM_RINGS; ++i) {
>>> +                                       rdev->fence_drv[i].last_activity
>>> = jiffies;
>>> +                               }
>>>
>>>                                /* mark the ring as not ready any more */
>>>                                rdev->ring[fence->ring].ready = false;
>>> @@ -387,9 +432,9 @@ int radeon_fence_driver_start_ring(struct
>>> radeon_device *rdev, int ring)
>>>        }
>>>        rdev->fence_drv[ring].cpu_addr =&rdev->wb.wb[index/4];
>>>
>>>        rdev->fence_drv[ring].gpu_addr = rdev->wb.gpu_addr + index;
>>> -       radeon_fence_write(rdev,
>>> atomic_read(&rdev->fence_drv[ring].seq), ring);
>>>
>>> +       radeon_fence_write(rdev, rdev->fence_drv[ring].seq, ring);
>>>        rdev->fence_drv[ring].initialized = true;
>>> -       DRM_INFO("fence driver on ring %d use gpu addr 0x%08Lx and cpu
>>> addr 0x%p\n",
>>> +       DRM_INFO("fence driver on ring %d use gpu addr 0x%016llx and cpu
>>> addr 0x%p\n",
>>>                 ring, rdev->fence_drv[ring].gpu_addr,
>>> rdev->fence_drv[ring].cpu_addr);
>>>        write_unlock_irqrestore(&rdev->fence_lock, irq_flags);
>>>        return 0;
>>> @@ -400,7 +445,8 @@ static void radeon_fence_driver_init_ring(struct
>>> radeon_device *rdev, int ring)
>>>        rdev->fence_drv[ring].scratch_reg = -1;
>>>        rdev->fence_drv[ring].cpu_addr = NULL;
>>>        rdev->fence_drv[ring].gpu_addr = 0;
>>> -       atomic_set(&rdev->fence_drv[ring].seq, 0);
>>> +       rdev->fence_drv[ring].seq = 0;
>>> +       atomic64_set(&rdev->fence_drv[ring].last_seq, 0);
>>>        INIT_LIST_HEAD(&rdev->fence_drv[ring].emitted);
>>>        INIT_LIST_HEAD(&rdev->fence_drv[ring].signaled);
>>>        init_waitqueue_head(&rdev->fence_drv[ring].queue);
>>> @@ -458,12 +504,12 @@ static int radeon_debugfs_fence_info(struct
>>> seq_file *m, void *data)
>>>                        continue;
>>>
>>>                seq_printf(m, "--- ring %d ---\n", i);
>>> -               seq_printf(m, "Last signaled fence 0x%08X\n",
>>> -                          radeon_fence_read(rdev, i));
>>> +               seq_printf(m, "Last signaled fence 0x%016lx\n",
>>> +                          atomic64_read(&rdev->fence_drv[i].last_seq));
>>>                if (!list_empty(&rdev->fence_drv[i].emitted)) {
>>>                        fence =
>>> list_entry(rdev->fence_drv[i].emitted.prev,
>>>                                           struct radeon_fence, list);
>>> -                       seq_printf(m, "Last emitted fence %p with
>>> 0x%08X\n",
>>> +                       seq_printf(m, "Last emitted fence %p with
>>> 0x%016llx\n",
>>>                                   fence,  fence->seq);
>>>                }
>>>        }
>>> diff --git a/drivers/gpu/drm/radeon/radeon_ring.c
>>> b/drivers/gpu/drm/radeon/radeon_ring.c
>>> index a4d60ae..4ae222b 100644
>>> --- a/drivers/gpu/drm/radeon/radeon_ring.c
>>> +++ b/drivers/gpu/drm/radeon/radeon_ring.c
>>> @@ -82,7 +82,7 @@ bool radeon_ib_try_free(struct radeon_device *rdev,
>>> struct radeon_ib *ib)
>>>        bool done = false;
>>>
>>>        /* only free ib which have been emited */
>>> -       if (ib->fence&&  ib->fence->emitted) {
>>> +       if (ib->fence&&  ib->fence->seq<  RADEON_FENCE_NOTEMITED_SEQ) {
>>>
>>>                if (radeon_fence_signaled(ib->fence)) {
>>>                        radeon_fence_unref(&ib->fence);
>>>                        radeon_sa_bo_free(rdev,&ib->sa_bo);
>>>
>>> @@ -149,8 +149,9 @@ retry:
>>>        /* this should be rare event, ie all ib scheduled none signaled
>>> yet.
>>>         */
>>>        for (i = 0; i<  RADEON_IB_POOL_SIZE; i++) {
>>> -               if (rdev->ib_pool.ibs[idx].fence&&
>>>  rdev->ib_pool.ibs[idx].fence->emitted) {
>>>
>>> -                       r =
>>> radeon_fence_wait(rdev->ib_pool.ibs[idx].fence, false);
>>> +               struct radeon_fence *fence =
>>> rdev->ib_pool.ibs[idx].fence;
>>> +               if (fence&&  fence->seq<  RADEON_FENCE_NOTEMITED_SEQ) {
>>>
>>> +                       r = radeon_fence_wait(fence, false);
>>>                        if (!r) {
>>>                                goto retry;
>>>                        }
>>> @@ -173,7 +174,7 @@ void radeon_ib_free(struct radeon_device *rdev,
>>> struct radeon_ib **ib)
>>>                return;
>>>        }
>>>        radeon_mutex_lock(&rdev->ib_pool.mutex);
>>> -       if (tmp->fence&&  !tmp->fence->emitted) {
>>> +       if (tmp->fence&&  tmp->fence->seq == RADEON_FENCE_NOTEMITED_SEQ)
>>> {
>>>                radeon_sa_bo_free(rdev,&tmp->sa_bo);
>>>                radeon_fence_unref(&tmp->fence);
>>>        }
>>> --
>>> 1.7.5.4
>>>
>

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: SA and other Patches.
  2012-05-07 14:34 ` SA and other Patches Jerome Glisse
@ 2012-05-07 15:30   ` Jerome Glisse
  0 siblings, 0 replies; 35+ messages in thread
From: Jerome Glisse @ 2012-05-07 15:30 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On Mon, May 7, 2012 at 10:34 AM, Jerome Glisse <j.glisse@gmail.com> wrote:
> On Mon, May 7, 2012 at 7:42 AM, Christian König <deathsimple@vodafone.de> wrote:
>> Hi Jerome & everybody on the list,
>>
>> this gathers together every patch we developed over the last week or so and
>> which is not already in drm-next.
>>
>> I've run quite some tests with them yesterday and today and as far as I can
>> see hammered out every known bug. For the SA allocator I reverted to tracking
>> the hole pointer instead of just the last allocation, cause otherwise we will
>> never release the first allocation on the list. Glxgears now even keeps happily
>> running if I deadlock on the not GFX rings on purpose.
>
> Now we will release the first entry if we use the last allocated ptr; I
> believe it's cleaner to use the last ptr.
>
>> Please take a second look at them and if nobody objects any more we should
>> commit them to drm-next.

Ok, took a second look; all looks good except the 14 sa allocator. See
my reply to it.

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 14/20] drm/radeon: multiple ring allocator v2
  2012-05-07 15:23   ` Jerome Glisse
@ 2012-05-07 16:45     ` Christian König
       [not found]       ` <CAH3drwb=qkpeLrSL9dejg0WEt_bCk98TkR0vtggpn3_maN6gZg@mail.gmail.com>
  0 siblings, 1 reply; 35+ messages in thread
From: Christian König @ 2012-05-07 16:45 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: Jerome Glisse, dri-devel

On 07.05.2012 17:23, Jerome Glisse wrote:
> On Mon, May 7, 2012 at 7:42 AM, Christian König<deathsimple@vodafone.de>  wrote:
>> A startover with a new idea for a multiple ring allocator.
>> Should perform as well as a normal ring allocator as long
>> as only one ring does somthing, but falls back to a more
>> complex algorithm if more complex things start to happen.
>>
>> We store the last allocated bo in last, we always try to allocate
>> after the last allocated bo. Principle is that in a linear GPU ring
>> progression was is after last is the oldest bo we allocated and thus
>> the first one that should no longer be in use by the GPU.
>>
>> If it's not the case we skip over the bo after last to the closest
>> done bo if such one exist. If none exist and we are not asked to
>> block we report failure to allocate.
>>
>> If we are asked to block we wait on all the oldest fence of all
>> rings. We just wait for any of those fence to complete.
>>
>> v2: We need to be able to let hole point to the list_head, otherwise
>>     try free will never free the first allocation of the list. Also
>>     stop calling radeon_fence_signalled more than necessary.
>>
>> Signed-off-by: Christian König<deathsimple@vodafone.de>
>> Signed-off-by: Jerome Glisse<jglisse@redhat.com>
> This one is NAK please use my patch. Yes in my patch we never try to
> free anything if there is only on sa_bo in the list if you really care
> about this it's a one line change:
> http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v2.patch
Nope, that won't work correctly: "last" is pointing to the last
allocation, and that's the one most unlikely to be freed at this time. Also,
in this version (like in the one before) radeon_sa_bo_next_hole lets
hole point to the "prev" of the found sa_bo without checking whether this
isn't the list's head. That might cause a crash if a to-be-freed
allocation is the first one in the buffer.

What radeon_sa_bo_try_free would need to do to get your approach working
is to loop over the end of the buffer and also try to free at the
beginning. But that said, keeping the last allocation results in a
whole bunch of extra cases and "if"s, while just keeping a pointer to
the "hole" (e.g. where the next allocation is most likely to succeed)
simplifies the code quite a bit (though I agree that on the down side it
makes it harder to understand).
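
A toy stand-alone illustration of that difference (the list and the
try_free() below are made up for the example and merely unlink nodes; they
are not the radeon structures): if the extra pointer means "the node the
next allocation goes after", letting it sit on the list head itself
expresses "the free space before the first allocation", so the first entry
can still be reclaimed, which a pointer to the last allocation cannot
express:

#include <stdbool.h>
#include <stdio.h>

struct node {
	struct node *prev, *next;
	int id;			/* 0 = head sentinel */
	bool signaled;		/* fence already signaled? */
};

static void insert_after(struct node *pos, struct node *n)
{
	n->prev = pos;
	n->next = pos->next;
	pos->next->prev = n;
	pos->next = n;
}

/* "free" (here: just unlink and count) signaled entries after the hole */
static int try_free(struct node *head, struct node *hole)
{
	int freed = 0;
	struct node *n = hole->next;

	while (n != head && n->signaled) {
		struct node *next = n->next;

		n->prev->next = next;
		next->prev = n->prev;
		freed++;
		n = next;
	}
	return freed;
}

int main(void)
{
	struct node head = { &head, &head, 0, false };
	struct node a = { 0 }, b = { 0 };

	a.id = 1; a.signaled = true;	/* oldest allocation, already done */
	b.id = 2; b.signaled = false;	/* newest allocation, still in use */
	insert_after(&head, &a);	/* list: head, a, b */
	insert_after(&a, &b);

	/* hole == last allocation (b): entry "a" is never even looked at */
	printf("hole = last allocation: freed %d\n", try_free(&head, &b));
	/* hole == list head: the very first allocation can be reclaimed */
	printf("hole = list head      : freed %d\n", try_free(&head, &head));
	return 0;
}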

> Your patch here can enter in infinite loop and never return holding
> the lock. See below.
>
> [SNIP]
>> +               } while (radeon_sa_bo_next_hole(sa_manager, fences));
> Here you can infinite loop, in the case there is a bunch of hole in
> the allocator but none of them allow to full fill the allocation.
> radeon_sa_bo_next_hole will keep returning true looping over and over
> on all the all. That's why i only restrict my patch to 2 hole skeeping
> and then fails the allocation or try to wait. I believe sadly we need
> an heuristic and 2 hole skeeping at most sounded like a good one.
Nope, that can't be an infinite loop, because radeon_sa_bo_next_hole in
conjunction with radeon_sa_bo_try_free eats up the opportunities
for holes.

Look again: it will probably never loop more than RADEON_NUM_RINGS + 1 times,
with the exception of allocating in a completely scattered buffer, and
even then it will never loop more often than half the number of current
allocations (and that is really, really unlikely).

Cheers,
Christian.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Fwd: [PATCH 14/20] drm/radeon: multiple ring allocator v2
       [not found]       ` <CAH3drwb=qkpeLrSL9dejg0WEt_bCk98TkR0vtggpn3_maN6gZg@mail.gmail.com>
@ 2012-05-07 17:59         ` Jerome Glisse
  2012-05-07 18:52           ` Jerome Glisse
  0 siblings, 1 reply; 35+ messages in thread
From: Jerome Glisse @ 2012-05-07 17:59 UTC (permalink / raw)
  To: dri-devel

> On 07.05.2012 17:23, Jerome Glisse wrote:
>>
>> On Mon, May 7, 2012 at 7:42 AM, Christian König<deathsimple@vodafone.de>
>>  wrote:
>>>
>>> A startover with a new idea for a multiple ring allocator.
>>> Should perform as well as a normal ring allocator as long
>>> as only one ring does somthing, but falls back to a more
>>> complex algorithm if more complex things start to happen.
>>>
>>> We store the last allocated bo in last, we always try to allocate
>>> after the last allocated bo. Principle is that in a linear GPU ring
>>> progression was is after last is the oldest bo we allocated and thus
>>> the first one that should no longer be in use by the GPU.
>>>
>>> If it's not the case we skip over the bo after last to the closest
>>> done bo if such one exist. If none exist and we are not asked to
>>> block we report failure to allocate.
>>>
>>> If we are asked to block we wait on all the oldest fence of all
>>> rings. We just wait for any of those fence to complete.
>>>
>>> v2: We need to be able to let hole point to the list_head, otherwise
>>>    try free will never free the first allocation of the list. Also
>>>    stop calling radeon_fence_signalled more than necessary.
>>>
>>> Signed-off-by: Christian König<deathsimple@vodafone.de>
>>> Signed-off-by: Jerome Glisse<jglisse@redhat.com>
>>
>> This one is NAK please use my patch. Yes in my patch we never try to
>> free anything if there is only on sa_bo in the list if you really care
>> about this it's a one line change:
>>
>> http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v2.patch
>
> Nope that won't work correctly, "last" is pointing to the last allocation
> and that's the most unlikely to be freed at this time. Also in this version
> (like in the one before) radeon_sa_bo_next_hole lets hole point to the
> "prev" of the found sa_bo without checking if this isn't the lists head.
> That might cause a crash if an to be freed allocation is the first one in
> the buffer.
>
> What radeon_sa_bo_try_free would need to do to get your approach working is
> to loop over the end of the buffer and also try to free at the beginning,
> but saying that keeping the last allocation results in a whole bunch of
> extra cases and "if"s, while just keeping a pointer to the "hole" (e.g.
> where the next allocation is most likely to succeed) simplifies the code
> quite a bit (but I agree that on the down side it makes it harder to
> understand).
>
>> Your patch here can enter in infinite loop and never return holding
>> the lock. See below.
>>
>> [SNIP]
>>
>>> +               } while (radeon_sa_bo_next_hole(sa_manager, fences));
>>
>> Here you can infinite loop, in the case there is a bunch of hole in
>> the allocator but none of them allow to full fill the allocation.
>> radeon_sa_bo_next_hole will keep returning true looping over and over
>> on all the all. That's why i only restrict my patch to 2 hole skeeping
>> and then fails the allocation or try to wait. I believe sadly we need
>> an heuristic and 2 hole skeeping at most sounded like a good one.
>
> Nope, that can't be an infinite loop, cause radeon_sa_bo_next_hole in
> conjunction with radeon_sa_bo_try_free are eating up the opportunities for
> holes.
>
> Look again, it probably will never loop more than RADEON_NUM_RINGS + 1, with
> the exception for allocating in a complete scattered buffer, and even then
> it will never loop more often than halve the number of current allocations
> (and that is really really unlikely).
>
> Cheers,
> Christian.

I looked again, and yes, it can loop infinitely: think of holes you can
never free, i.e. radeon_sa_bo_try_free can't free anything. This
situation can happen if you have several threads allocating sa bos at
the same time while none of them is yet done with its sa_bo (i.e.
none has called sa_bo_free yet). I updated a v3 that tracks the oldest
and fixes all the things you were pointing out above.

http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v3.patch

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 14/20] drm/radeon: multiple ring allocator v2
  2012-05-07 17:59         ` Fwd: " Jerome Glisse
@ 2012-05-07 18:52           ` Jerome Glisse
  2012-05-07 20:38             ` Christian König
  0 siblings, 1 reply; 35+ messages in thread
From: Jerome Glisse @ 2012-05-07 18:52 UTC (permalink / raw)
  To: dri-devel; +Cc: Christian König

On Mon, May 7, 2012 at 1:59 PM, Jerome Glisse <j.glisse@gmail.com> wrote:
>> On 07.05.2012 17:23, Jerome Glisse wrote:
>>>
>>> On Mon, May 7, 2012 at 7:42 AM, Christian König<deathsimple@vodafone.de>
>>>  wrote:
>>>>
>>>> A startover with a new idea for a multiple ring allocator.
>>>> Should perform as well as a normal ring allocator as long
>>>> as only one ring does somthing, but falls back to a more
>>>> complex algorithm if more complex things start to happen.
>>>>
>>>> We store the last allocated bo in last, we always try to allocate
>>>> after the last allocated bo. Principle is that in a linear GPU ring
>>>> progression was is after last is the oldest bo we allocated and thus
>>>> the first one that should no longer be in use by the GPU.
>>>>
>>>> If it's not the case we skip over the bo after last to the closest
>>>> done bo if such one exist. If none exist and we are not asked to
>>>> block we report failure to allocate.
>>>>
>>>> If we are asked to block we wait on all the oldest fence of all
>>>> rings. We just wait for any of those fence to complete.
>>>>
>>>> v2: We need to be able to let hole point to the list_head, otherwise
>>>>    try free will never free the first allocation of the list. Also
>>>>    stop calling radeon_fence_signalled more than necessary.
>>>>
>>>> Signed-off-by: Christian König<deathsimple@vodafone.de>
>>>> Signed-off-by: Jerome Glisse<jglisse@redhat.com>
>>>
>>> This one is NAK please use my patch. Yes in my patch we never try to
>>> free anything if there is only on sa_bo in the list if you really care
>>> about this it's a one line change:
>>>
>>> http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v2.patch
>>
>> Nope that won't work correctly, "last" is pointing to the last allocation
>> and that's the most unlikely to be freed at this time. Also in this version
>> (like in the one before) radeon_sa_bo_next_hole lets hole point to the
>> "prev" of the found sa_bo without checking if this isn't the lists head.
>> That might cause a crash if an to be freed allocation is the first one in
>> the buffer.
>>
>> What radeon_sa_bo_try_free would need to do to get your approach working is
>> to loop over the end of the buffer and also try to free at the beginning,
>> but saying that keeping the last allocation results in a whole bunch of
>> extra cases and "if"s, while just keeping a pointer to the "hole" (e.g.
>> where the next allocation is most likely to succeed) simplifies the code
>> quite a bit (but I agree that on the down side it makes it harder to
>> understand).
>>
>>> Your patch here can enter in infinite loop and never return holding
>>> the lock. See below.
>>>
>>> [SNIP]
>>>
>>>> +               } while (radeon_sa_bo_next_hole(sa_manager, fences));
>>>
>>> Here you can infinite loop, in the case there is a bunch of hole in
>>> the allocator but none of them allow to full fill the allocation.
>>> radeon_sa_bo_next_hole will keep returning true looping over and over
>>> on all the all. That's why i only restrict my patch to 2 hole skeeping
>>> and then fails the allocation or try to wait. I believe sadly we need
>>> an heuristic and 2 hole skeeping at most sounded like a good one.
>>
>> Nope, that can't be an infinite loop, cause radeon_sa_bo_next_hole in
>> conjunction with radeon_sa_bo_try_free are eating up the opportunities for
>> holes.
>>
>> Look again, it probably will never loop more than RADEON_NUM_RINGS + 1, with
>> the exception for allocating in a complete scattered buffer, and even then
>> it will never loop more often than halve the number of current allocations
>> (and that is really really unlikely).
>>
>> Cheers,
>> Christian.
>
> I looked again and yes it can loop infinitly, think of hole you can
> never free ie radeon_sa_bo_try_free can't free anything. This
> situation can happen if you have several thread allocating sa bo at
> the same time while none of them are yet done with there sa_bo (ie
> none have call sa_bo_free yet). I updated a v3 that track oldest and
> fix all things you were pointing out above.
>
> http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v3.patch
>
> Cheers,
> Jerome

Of course, by tracking the oldest it defeats the algo, so updated patch:
 http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v3.patch

It just fixes the corner case of a list with a single entry.

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 14/20] drm/radeon: multiple ring allocator v2
  2012-05-07 18:52           ` Jerome Glisse
@ 2012-05-07 20:38             ` Christian König
  2012-05-07 21:28               ` Jerome Glisse
  0 siblings, 1 reply; 35+ messages in thread
From: Christian König @ 2012-05-07 20:38 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: dri-devel

On 07.05.2012 20:52, Jerome Glisse wrote:
> On Mon, May 7, 2012 at 1:59 PM, Jerome Glisse<j.glisse@gmail.com>  wrote:
>>> On 07.05.2012 17:23, Jerome Glisse wrote:
>>>> On Mon, May 7, 2012 at 7:42 AM, Christian König<deathsimple@vodafone.de>
>>>>   wrote:
>>>>> A startover with a new idea for a multiple ring allocator.
>>>>> Should perform as well as a normal ring allocator as long
>>>>> as only one ring does somthing, but falls back to a more
>>>>> complex algorithm if more complex things start to happen.
>>>>>
>>>>> We store the last allocated bo in last, we always try to allocate
>>>>> after the last allocated bo. Principle is that in a linear GPU ring
>>>>> progression was is after last is the oldest bo we allocated and thus
>>>>> the first one that should no longer be in use by the GPU.
>>>>>
>>>>> If it's not the case we skip over the bo after last to the closest
>>>>> done bo if such one exist. If none exist and we are not asked to
>>>>> block we report failure to allocate.
>>>>>
>>>>> If we are asked to block we wait on all the oldest fence of all
>>>>> rings. We just wait for any of those fence to complete.
>>>>>
>>>>> v2: We need to be able to let hole point to the list_head, otherwise
>>>>>     try free will never free the first allocation of the list. Also
>>>>>     stop calling radeon_fence_signalled more than necessary.
>>>>>
>>>>> Signed-off-by: Christian König<deathsimple@vodafone.de>
>>>>> Signed-off-by: Jerome Glisse<jglisse@redhat.com>
>>>> This one is NAK please use my patch. Yes in my patch we never try to
>>>> free anything if there is only on sa_bo in the list if you really care
>>>> about this it's a one line change:
>>>>
>>>> http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v2.patch
>>> Nope that won't work correctly, "last" is pointing to the last allocation
>>> and that's the most unlikely to be freed at this time. Also in this version
>>> (like in the one before) radeon_sa_bo_next_hole lets hole point to the
>>> "prev" of the found sa_bo without checking if this isn't the lists head.
>>> That might cause a crash if an to be freed allocation is the first one in
>>> the buffer.
>>>
>>> What radeon_sa_bo_try_free would need to do to get your approach working is
>>> to loop over the end of the buffer and also try to free at the beginning,
>>> but saying that keeping the last allocation results in a whole bunch of
>>> extra cases and "if"s, while just keeping a pointer to the "hole" (e.g.
>>> where the next allocation is most likely to succeed) simplifies the code
>>> quite a bit (but I agree that on the down side it makes it harder to
>>> understand).
>>>
>>>> Your patch here can enter in infinite loop and never return holding
>>>> the lock. See below.
>>>>
>>>> [SNIP]
>>>>
>>>>> +               } while (radeon_sa_bo_next_hole(sa_manager, fences));
>>>> Here you can infinite loop, in the case there is a bunch of hole in
>>>> the allocator but none of them allow to full fill the allocation.
>>>> radeon_sa_bo_next_hole will keep returning true looping over and over
>>>> on all the all. That's why i only restrict my patch to 2 hole skeeping
>>>> and then fails the allocation or try to wait. I believe sadly we need
>>>> an heuristic and 2 hole skeeping at most sounded like a good one.
>>> Nope, that can't be an infinite loop, cause radeon_sa_bo_next_hole in
>>> conjunction with radeon_sa_bo_try_free are eating up the opportunities for
>>> holes.
>>>
>>> Look again, it probably will never loop more than RADEON_NUM_RINGS + 1, with
>>> the exception for allocating in a complete scattered buffer, and even then
>>> it will never loop more often than halve the number of current allocations
>>> (and that is really really unlikely).
>>>
>>> Cheers,
>>> Christian.
>> I looked again and yes it can loop infinitly, think of hole you can
>> never free ie radeon_sa_bo_try_free can't free anything. This
>> situation can happen if you have several thread allocating sa bo at
>> the same time while none of them are yet done with there sa_bo (ie
>> none have call sa_bo_free yet). I updated a v3 that track oldest and
>> fix all things you were pointing out above.
No, that isn't a problem: radeon_sa_bo_next_hole only takes the first
entries of the flists, so it only considers holes whose fence has already
signaled and which can therefore be freed.

Having multiple threads allocate objects that can't be freed yet will
just result in empty flists, so radeon_sa_bo_next_hole will return
false, we end up calling radeon_fence_wait_any with an empty fence
list, and that in turn returns -ENOENT and the allocation is aborted
(OK, maybe we should catch that and return -ENOMEM instead).

So even the corner cases should now be handled fine.
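
To make that concrete, the all-threads-busy scenario boils down to
something like this (just a sketch; the helper, its signature and the
fence-wait signature are assumptions, not the code from the patch):

/* Assumed helper for this sketch: returns true only after reclaiming a
 * hole whose fence already signaled; otherwise it records the oldest
 * unsignaled fence of each ring in fences[] and returns false. */
static bool sketch_next_hole(struct radeon_sa_manager *sa_manager,
			     struct radeon_fence **fences);

static int sketch_all_rings_busy(struct radeon_device *rdev,
				 struct radeon_sa_manager *sa_manager)
{
	struct radeon_fence *fences[RADEON_NUM_RINGS] = { NULL };

	/* every thread still holds its sa_bo, so all per-ring flists
	 * are empty: the cleanup loop finds nothing and exits at once */
	while (sketch_next_hole(sa_manager, fences))
		;

	/* fences[] stayed all NULL, so the wait fails right away with
	 * -ENOENT and the allocation is aborted instead of spinning */
	return radeon_fence_wait_any(rdev, fences, false);
}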

>>
>> http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v3.patch
>>
>> Cheers,
>> Jerome
> Of course by tracking oldest it defeat the algo so updated patch :
>   http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v3.patch
>
> Just fix the corner case of list of single entry.
That still won't work correctly, because the corner case isn't that there
is just one allocation left on the list; the corner case is that we need
to be able to allocate something before the first sa_bo. Just consider
the following with your current implementation:

B F F F F 1 2 3 4 F E

B is the beginning of the buffer.
F is free space.
1,2,3,4 are allocations.
E is the end of the buffer.

So let's say that we have an allocation that won't fit in the free space
between "4" and "E": now even if radeon_sa_bo_next_hole sets "last" to
"1", we aren't able to allocate anything at the beginning of the buffer...
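
Handling that layout means the fit check has to wrap around, roughly
like this (a sketch only; the field names and parameters are
assumptions):

static bool sketch_hole_fits(struct radeon_sa_manager *sa_manager,
			     unsigned hole_end, unsigned first_start,
			     unsigned size, unsigned align)
{
	unsigned start = ALIGN(hole_end, align);

	/* common case: place the new allocation between the hole and
	 * the end "E" of the buffer */
	if (start + size <= sa_manager->size)
		return true;

	/* the case above: nothing fits before "E", so the free space in
	 * front of the first allocation has to be considered as well */
	return size <= first_start;
}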

Christian.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 14/20] drm/radeon: multiple ring allocator v2
  2012-05-07 20:38             ` Christian König
@ 2012-05-07 21:28               ` Jerome Glisse
  2012-05-08 10:23                 ` Christian König
  0 siblings, 1 reply; 35+ messages in thread
From: Jerome Glisse @ 2012-05-07 21:28 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On Mon, May 7, 2012 at 4:38 PM, Christian König <deathsimple@vodafone.de> wrote:
> On 07.05.2012 20:52, Jerome Glisse wrote:
>>
>> On Mon, May 7, 2012 at 1:59 PM, Jerome Glisse<j.glisse@gmail.com>  wrote:
>>>>
>>>> On 07.05.2012 17:23, Jerome Glisse wrote:
>>>>>
>>>>> On Mon, May 7, 2012 at 7:42 AM, Christian
>>>>> König<deathsimple@vodafone.de>
>>>>>  wrote:
>>>>>>
>>>>>> A startover with a new idea for a multiple ring allocator.
>>>>>> Should perform as well as a normal ring allocator as long
>>>>>> as only one ring does somthing, but falls back to a more
>>>>>> complex algorithm if more complex things start to happen.
>>>>>>
>>>>>> We store the last allocated bo in last, we always try to allocate
>>>>>> after the last allocated bo. Principle is that in a linear GPU ring
>>>>>> progression was is after last is the oldest bo we allocated and thus
>>>>>> the first one that should no longer be in use by the GPU.
>>>>>>
>>>>>> If it's not the case we skip over the bo after last to the closest
>>>>>> done bo if such one exist. If none exist and we are not asked to
>>>>>> block we report failure to allocate.
>>>>>>
>>>>>> If we are asked to block we wait on all the oldest fence of all
>>>>>> rings. We just wait for any of those fence to complete.
>>>>>>
>>>>>> v2: We need to be able to let hole point to the list_head, otherwise
>>>>>>    try free will never free the first allocation of the list. Also
>>>>>>    stop calling radeon_fence_signalled more than necessary.
>>>>>>
>>>>>> Signed-off-by: Christian König<deathsimple@vodafone.de>
>>>>>> Signed-off-by: Jerome Glisse<jglisse@redhat.com>
>>>>>
>>>>> This one is NAK please use my patch. Yes in my patch we never try to
>>>>> free anything if there is only on sa_bo in the list if you really care
>>>>> about this it's a one line change:
>>>>>
>>>>>
>>>>> http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v2.patch
>>>>
>>>> Nope that won't work correctly, "last" is pointing to the last
>>>> allocation
>>>> and that's the most unlikely to be freed at this time. Also in this
>>>> version
>>>> (like in the one before) radeon_sa_bo_next_hole lets hole point to the
>>>> "prev" of the found sa_bo without checking if this isn't the lists head.
>>>> That might cause a crash if an to be freed allocation is the first one
>>>> in
>>>> the buffer.
>>>>
>>>> What radeon_sa_bo_try_free would need to do to get your approach working
>>>> is
>>>> to loop over the end of the buffer and also try to free at the
>>>> beginning,
>>>> but saying that keeping the last allocation results in a whole bunch of
>>>> extra cases and "if"s, while just keeping a pointer to the "hole" (e.g.
>>>> where the next allocation is most likely to succeed) simplifies the code
>>>> quite a bit (but I agree that on the down side it makes it harder to
>>>> understand).
>>>>
>>>>> Your patch here can enter in infinite loop and never return holding
>>>>> the lock. See below.
>>>>>
>>>>> [SNIP]
>>>>>
>>>>>> +               } while (radeon_sa_bo_next_hole(sa_manager, fences));
>>>>>
>>>>> Here you can infinite loop, in the case there is a bunch of hole in
>>>>> the allocator but none of them allow to full fill the allocation.
>>>>> radeon_sa_bo_next_hole will keep returning true looping over and over
>>>>> on all the all. That's why i only restrict my patch to 2 hole skeeping
>>>>> and then fails the allocation or try to wait. I believe sadly we need
>>>>> an heuristic and 2 hole skeeping at most sounded like a good one.
>>>>
>>>> Nope, that can't be an infinite loop, cause radeon_sa_bo_next_hole in
>>>> conjunction with radeon_sa_bo_try_free are eating up the opportunities
>>>> for
>>>> holes.
>>>>
>>>> Look again, it probably will never loop more than RADEON_NUM_RINGS + 1,
>>>> with
>>>> the exception for allocating in a complete scattered buffer, and even
>>>> then
>>>> it will never loop more often than halve the number of current
>>>> allocations
>>>> (and that is really really unlikely).
>>>>
>>>> Cheers,
>>>> Christian.
>>>
>>> I looked again and yes it can loop infinitly, think of hole you can
>>> never free ie radeon_sa_bo_try_free can't free anything. This
>>> situation can happen if you have several thread allocating sa bo at
>>> the same time while none of them are yet done with there sa_bo (ie
>>> none have call sa_bo_free yet). I updated a v3 that track oldest and
>>> fix all things you were pointing out above.
>
> No that isn't a problem, radeon_sa_bo_next_hole takes the firsts entries of
> the flist, so it only considers holes that have a signaled fence and so can
> be freed.
>
> Having multiple threads allocate objects that can't be freed yet will just
> result in empty flists, and so radeon_sa_bo_next_hole will return false,
> resulting in calling radeon_fence_wait_any with an empty fence list, which
> in turn will result in an ENOENT and abortion of allocation (ok maybe we
> should catch that and return -ENOMEM instead).
>
> So even the corner cases should now be handled fine.

No, there is still an infinite loop possible with a GPU lockup. I am
against the while (next_hole); using a for loop with 2 iterations looks
a lot better, and it avoids the SA allocator possibly looping too much
(because it can loop a lot more than RADEON_NUM_RINGS times; the maximum
number of loops is sa_manager->size/4).

>
>>>
>>>
>>> http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v3.patch
>>>
>>> Cheers,
>>> Jerome
>>
>> Of course by tracking oldest it defeat the algo so updated patch :
>>
>>  http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v3.patch
>>
>> Just fix the corner case of list of single entry.
>
> That still won't work correctly, cause the corner case isn't that there is
> just one allocation left on the list, the corner case is that we need to be
> able to allocate something before the first sa_bo, just consider the
> following with your current implementation:
>
> B F F F F 1 2 3 4 F E
>
> B is the beginning of the buffer.
> F is free space.
> 1,2,3,4 are allocations.
> E is the end of the buffer.
>
> So lets say that we have an allocation that won't fit in the free space
> between "4" and "E", now even if if radeon_sa_next_hole sets "last" to 1, we
> aren't able to allocate anything at the beginning of the buffer...
>
> Christian.

Yes, that isn't handled.

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 14/20] drm/radeon: multiple ring allocator v2
  2012-05-07 21:28               ` Jerome Glisse
@ 2012-05-08 10:23                 ` Christian König
  2012-05-08 14:55                   ` Jerome Glisse
  0 siblings, 1 reply; 35+ messages in thread
From: Christian König @ 2012-05-08 10:23 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: dri-devel

On 07.05.2012 23:28, Jerome Glisse wrote:
> On Mon, May 7, 2012 at 4:38 PM, Christian König<deathsimple@vodafone.de>  wrote:
>> On 07.05.2012 20:52, Jerome Glisse wrote:
>>> On Mon, May 7, 2012 at 1:59 PM, Jerome Glisse<j.glisse@gmail.com>    wrote:
>>>>> On 07.05.2012 17:23, Jerome Glisse wrote:
>>>>>> Your patch here can enter in infinite loop and never return holding
>>>>>> the lock. See below.
>>>>>>
>>>>>> [SNIP]
>>>>>>
>>>>>>> +               } while (radeon_sa_bo_next_hole(sa_manager, fences));
>>>>>> Here you can infinite loop, in the case there is a bunch of hole in
>>>>>> the allocator but none of them allow to full fill the allocation.
>>>>>> radeon_sa_bo_next_hole will keep returning true looping over and over
>>>>>> on all the all. That's why i only restrict my patch to 2 hole skeeping
>>>>>> and then fails the allocation or try to wait. I believe sadly we need
>>>>>> an heuristic and 2 hole skeeping at most sounded like a good one.
>>>>> Nope, that can't be an infinite loop, cause radeon_sa_bo_next_hole in
>>>>> conjunction with radeon_sa_bo_try_free are eating up the opportunities
>>>>> for
>>>>> holes.
>>>>>
>>>>> Look again, it probably will never loop more than RADEON_NUM_RINGS + 1,
>>>>> with
>>>>> the exception for allocating in a complete scattered buffer, and even
>>>>> then
>>>>> it will never loop more often than halve the number of current
>>>>> allocations
>>>>> (and that is really really unlikely).
>>>>>
>>>>> Cheers,
>>>>> Christian.
>>>> I looked again and yes it can loop infinitly, think of hole you can
>>>> never free ie radeon_sa_bo_try_free can't free anything. This
>>>> situation can happen if you have several thread allocating sa bo at
>>>> the same time while none of them are yet done with there sa_bo (ie
>>>> none have call sa_bo_free yet). I updated a v3 that track oldest and
>>>> fix all things you were pointing out above.
>> No that isn't a problem, radeon_sa_bo_next_hole takes the firsts entries of
>> the flist, so it only considers holes that have a signaled fence and so can
>> be freed.
>>
>> Having multiple threads allocate objects that can't be freed yet will just
>> result in empty flists, and so radeon_sa_bo_next_hole will return false,
>> resulting in calling radeon_fence_wait_any with an empty fence list, which
>> in turn will result in an ENOENT and abortion of allocation (ok maybe we
>> should catch that and return -ENOMEM instead).
>>
>> So even the corner cases should now be handled fine.
> No, there is still infinite loop possible with gpu lockup, i am
> against the while (next_hole) using for on 2 iteration looks a lot
> better and it avoids sa allocator possibly looping too much (because
> it can loop a lot more than RADEON_NUM_RINGS, the maximum number of
> loop is sa_manager->size/4).
I'm still pretty sure that there isn't any possibility of an infinite
loop, so please explain further where exactly the problem is.
radeon_sa_bo_next_hole will return true only as long as it can find AND
remove an allocation with an already signaled fence, and since nobody
else can add allocations while we are in the loop, we sooner or later
run out of allocations and the loop ends.

Also, all the loop does is clean up the already signaled
allocations, and it doesn't matter whether there is one allocation or a
million, we need to clean them up anyway. So aborting the loop and
trying to wait for anything to be signaled makes no sense at all, and
aborting the whole allocation at this point makes even less sense,
because that only delays work that needs to be done anyway (freeing the
allocations) to a later call to radeon_sa_bo_new.

What could make sense is limiting how often we wait for fences,
since while waiting we release the lock, and then other processes can
jump in and grab the space we were hoping to get by waiting for
something to happen.
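
A bound on the waits, rather than on the cleanup loop, could look
roughly like this (again only a sketch; the retry constant, helper
names and the fence-wait signature are assumptions):

#define SKETCH_MAX_WAITS	2

static int sketch_sa_new(struct radeon_device *rdev,
			 struct radeon_sa_manager *sa_manager,
			 struct radeon_sa_bo *sa_bo,
			 unsigned size, unsigned align)
{
	struct radeon_fence *fences[RADEON_NUM_RINGS];
	unsigned waits;
	int r;

	for (waits = 0; waits <= SKETCH_MAX_WAITS; ++waits) {
		memset(fences, 0, sizeof(fences));

		/* assumed helper: frees signaled holes, attempts the
		 * allocation and fills fences[] with the oldest
		 * unsignaled fence of each ring on failure */
		if (sketch_try_alloc(rdev, sa_manager, sa_bo,
				     size, align, fences))
			return 0;

		if (waits == SKETCH_MAX_WAITS)
			break;

		/* the manager lock is dropped while waiting, so another
		 * process may grab the space first -- hence the bound */
		r = radeon_fence_wait_any(rdev, fences, false);
		if (r)
			return r;
	}
	return -ENOMEM;
}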

Cheers,
Christian.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 14/20] drm/radeon: multiple ring allocator v2
  2012-05-08 10:23                 ` Christian König
@ 2012-05-08 14:55                   ` Jerome Glisse
  2012-05-09  9:53                     ` Christian König
  0 siblings, 1 reply; 35+ messages in thread
From: Jerome Glisse @ 2012-05-08 14:55 UTC (permalink / raw)
  To: Christian König; +Cc: dri-devel

On Tue, May 8, 2012 at 6:23 AM, Christian König <deathsimple@vodafone.de> wrote:
> On 07.05.2012 23:28, Jerome Glisse wrote:
>>
>> On Mon, May 7, 2012 at 4:38 PM, Christian König<deathsimple@vodafone.de>
>>  wrote:
>>>
>>> On 07.05.2012 20:52, Jerome Glisse wrote:
>>>>
>>>> On Mon, May 7, 2012 at 1:59 PM, Jerome Glisse<j.glisse@gmail.com>
>>>>  wrote:
>>>>>>
>>>>>> On 07.05.2012 17:23, Jerome Glisse wrote:
>>>>>>>
>>>>>>> Your patch here can enter in infinite loop and never return holding
>>>>>>> the lock. See below.
>>>>>>>
>>>>>>> [SNIP]
>>>>>>>
>>>>>>>> +               } while (radeon_sa_bo_next_hole(sa_manager,
>>>>>>>> fences));
>>>>>>>
>>>>>>> Here you can infinite loop, in the case there is a bunch of hole in
>>>>>>> the allocator but none of them allow to full fill the allocation.
>>>>>>> radeon_sa_bo_next_hole will keep returning true looping over and over
>>>>>>> on all the all. That's why i only restrict my patch to 2 hole
>>>>>>> skeeping
>>>>>>> and then fails the allocation or try to wait. I believe sadly we need
>>>>>>> an heuristic and 2 hole skeeping at most sounded like a good one.
>>>>>>
>>>>>> Nope, that can't be an infinite loop, cause radeon_sa_bo_next_hole in
>>>>>> conjunction with radeon_sa_bo_try_free are eating up the opportunities
>>>>>> for
>>>>>> holes.
>>>>>>
>>>>>> Look again, it probably will never loop more than RADEON_NUM_RINGS +
>>>>>> 1,
>>>>>> with
>>>>>> the exception for allocating in a complete scattered buffer, and even
>>>>>> then
>>>>>> it will never loop more often than halve the number of current
>>>>>> allocations
>>>>>> (and that is really really unlikely).
>>>>>>
>>>>>> Cheers,
>>>>>> Christian.
>>>>>
>>>>> I looked again and yes it can loop infinitly, think of hole you can
>>>>> never free ie radeon_sa_bo_try_free can't free anything. This
>>>>> situation can happen if you have several thread allocating sa bo at
>>>>> the same time while none of them are yet done with there sa_bo (ie
>>>>> none have call sa_bo_free yet). I updated a v3 that track oldest and
>>>>> fix all things you were pointing out above.
>>>
>>> No that isn't a problem, radeon_sa_bo_next_hole takes the firsts entries
>>> of
>>> the flist, so it only considers holes that have a signaled fence and so
>>> can
>>> be freed.
>>>
>>> Having multiple threads allocate objects that can't be freed yet will
>>> just
>>> result in empty flists, and so radeon_sa_bo_next_hole will return false,
>>> resulting in calling radeon_fence_wait_any with an empty fence list,
>>> which
>>> in turn will result in an ENOENT and abortion of allocation (ok maybe we
>>> should catch that and return -ENOMEM instead).
>>>
>>> So even the corner cases should now be handled fine.
>>
>> No, there is still infinite loop possible with gpu lockup, i am
>> against the while (next_hole) using for on 2 iteration looks a lot
>> better and it avoids sa allocator possibly looping too much (because
>> it can loop a lot more than RADEON_NUM_RINGS, the maximum number of
>> loop is sa_manager->size/4).
>
> I'm still pretty sure that there isn't the possibility for an infinite loop,
> so please explain further where exactly the problem is.
> radeon_sa_bo_next_hole will return true as long as it can find AND remove an
> allocation with an already signaled fence, and since nobody else can add
> allocations while we are in the loop we sooner or later run out of
> allocations and so the loop ends.

Yeah, you're right.

> Also what the loop does is just cleaning up all the already signaled
> allocations, and it doesn't matter if there is one allocation or a million,
> we need to clean them up anyway. So aborting the loop and trying to wait for
> anything to be signaled makes no sense at all, and aborting the whole
> allocation at this point makes even less sense, cause that only delays the
> work that needs to be done anyway (freeing the allocations) to a later call
> to radeon_sa_bo_new.
>
> What could make sense is limiting how often we are waiting for some fences,
> since while waiting we release the lock and then other processes can jump in
> and grab what we wanted to have by waiting for something to happen.
>
> Cheers,
> Christian.

Still, I don't want to loop more than necessary, it's bad; I am OK with:
http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v3.patch

If there is a signaled fence it will retry 2 times at most, otherwise it
will go to wait, which is better. With your while loop the worst case is
something proportional to the manager size, and given it's 1 MB it can
loop for a long, long time.
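
For scale (my own arithmetic, taking the size/4 bound from earlier at
face value):

    1 MB / 4 bytes = 262,144 worst-case iterations of the while loop,
    all spent holding the manager lock.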

Cheers,
Jerome

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 14/20] drm/radeon: multiple ring allocator v2
  2012-05-08 14:55                   ` Jerome Glisse
@ 2012-05-09  9:53                     ` Christian König
  0 siblings, 0 replies; 35+ messages in thread
From: Christian König @ 2012-05-09  9:53 UTC (permalink / raw)
  To: Jerome Glisse; +Cc: dri-devel

On 08.05.2012 16:55, Jerome Glisse wrote:
> Still i don't want to loop more than necessary, it's bad, i am ok with :
> http://people.freedesktop.org/~glisse/reset5/0001-drm-radeon-multiple-ring-allocator-v3.patch
>
> If there is fence signaled it will retry 2 times at most, otherwise it
> will go to wait and
> that way better. Because with your while loop the worst case is
> something proportional to
> the manager size given it's 1Mo it can loop for a long long time.
Yeah, this loop can indeed consume quite some time. OK, then let's at
least give every ring a chance to supply some holes, otherwise I fear
that we might not even find something worth waiting for after only 2 tries.
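
For instance with a per-ring try counter inside the allocator's retry
loop (a sketch, not the final patch; the names and the extra parameter
are assumptions):

	unsigned tries[RADEON_NUM_RINGS] = { 0 };

	/* assumed variant of next_hole(): it bumps tries[ring] whenever
	 * it hands out a hole freed on that ring and refuses once the
	 * counter is set, so every ring gets one chance to supply a
	 * hole before we give up and wait on the collected fences */
	while (sketch_next_hole(sa_manager, fences, tries)) {
		if (sketch_try_alloc(sa_manager, sa_bo, size, align))
			return 0;
	}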

Going to send that out after figuring out why the patchset still causes 
texture corruptions on my system.

Christian.

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2012-05-09  9:53 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-05-07 11:42 SA and other Patches Christian König
2012-05-07 11:42 ` [PATCH 01/20] drm/radeon: fix possible lack of synchronization btw ttm and other ring Christian König
2012-05-07 11:42 ` [PATCH 02/20] drm/radeon: clarify and extend wb setup on APUs and NI+ asics Christian König
2012-05-07 11:42 ` [PATCH 03/20] drm/radeon: replace the per ring mutex with a global one Christian König
2012-05-07 11:42 ` [PATCH 04/20] drm/radeon: convert fence to uint64_t v4 Christian König
2012-05-07 14:39   ` Jerome Glisse
2012-05-07 15:04     ` Christian König
2012-05-07 15:27       ` Jerome Glisse
2012-05-07 11:42 ` [PATCH 05/20] drm/radeon: rework fence handling, drop fence list v5 Christian König
2012-05-07 11:42 ` [PATCH 06/20] drm/radeon: rework locking ring emission mutex in fence deadlock detection Christian König
2012-05-07 11:42 ` [PATCH 07/20] drm/radeon: use inline functions to calc sa_bo addr Christian König
2012-05-07 11:42 ` [PATCH 08/20] drm/radeon: add proper locking to the SA v3 Christian König
2012-05-07 11:42 ` [PATCH 09/20] drm/radeon: add sub allocator debugfs file Christian König
2012-05-07 11:42 ` [PATCH 10/20] drm/radeon: keep start and end offset in the SA Christian König
2012-05-07 11:42 ` [PATCH 11/20] drm/radeon: make sa bo a stand alone object Christian König
2012-05-07 11:42 ` [PATCH 12/20] drm/radeon: define new SA interface v3 Christian König
2012-05-07 11:42 ` [PATCH 13/20] drm/radeon: use one wait queue for all rings add fence_wait_any v2 Christian König
2012-05-07 11:42 ` [PATCH 14/20] drm/radeon: multiple ring allocator v2 Christian König
2012-05-07 15:23   ` Jerome Glisse
2012-05-07 16:45     ` Christian König
     [not found]       ` <CAH3drwb=qkpeLrSL9dejg0WEt_bCk98TkR0vtggpn3_maN6gZg@mail.gmail.com>
2012-05-07 17:59         ` Fwd: " Jerome Glisse
2012-05-07 18:52           ` Jerome Glisse
2012-05-07 20:38             ` Christian König
2012-05-07 21:28               ` Jerome Glisse
2012-05-08 10:23                 ` Christian König
2012-05-08 14:55                   ` Jerome Glisse
2012-05-09  9:53                     ` Christian König
2012-05-07 11:42 ` [PATCH 15/20] drm/radeon: simplify semaphore handling v2 Christian König
2012-05-07 11:42 ` [PATCH 16/20] drm/radeon: rip out the ib pool Christian König
2012-05-07 11:42 ` [PATCH 17/20] drm/radeon: immediately free ttm-move semaphore Christian König
2012-05-07 11:42 ` [PATCH 18/20] drm/radeon: move the semaphore from the fence into the ib Christian König
2012-05-07 11:42 ` [PATCH 19/20] drm/radeon: remove r600 blit mutex v2 Christian König
2012-05-07 11:42 ` [PATCH 20/20] drm/radeon: make the ib an inline object Christian König
2012-05-07 14:34 ` SA and other Patches Jerome Glisse
2012-05-07 15:30   ` Jerome Glisse
