All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] drm/ttm: Get rid of a number of atomic read-modify-write ops v2
@ 2012-11-05 13:55 Thomas Hellstrom
  2012-11-05 13:55 ` [PATCH 1/4] drm: Make hashtab rcu-safe Thomas Hellstrom
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: Thomas Hellstrom @ 2012-11-05 13:55 UTC (permalink / raw)
  To: airlied, airlied; +Cc: dri-devel, linux-kernel

A patch series for next that removes a substantial number of read-modify-write
operations from TTM command submission, in particular if TTM objects are used
to export objects to user-space. The only per-object atomic r-m-w operations
left during a typical execbuf call should be refcount up and down.

v2: Formatting fixes.

In-Reply-To: 

>From Thomas Hellstrom <thellstrom@vmware.com> # This line is ignored.
From: Thomas Hellstrom <thellstrom@vmware.com>
Subject: 
In-Reply-To: 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/4] drm: Make hashtab rcu-safe
  2012-11-05 13:55 [PATCH 0/4] drm/ttm: Get rid of a number of atomic read-modify-write ops v2 Thomas Hellstrom
@ 2012-11-05 13:55 ` Thomas Hellstrom
  2012-11-05 13:55 ` [PATCH 2/4] kref: Implement kref_get_unless_zero v2 Thomas Hellstrom
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: Thomas Hellstrom @ 2012-11-05 13:55 UTC (permalink / raw)
  To: airlied, airlied; +Cc: dri-devel, linux-kernel, Thomas Hellstrom

TTM base objects will be the first consumer.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 drivers/gpu/drm/drm_hashtab.c |   18 +++++++-----------
 1 files changed, 7 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/drm_hashtab.c b/drivers/gpu/drm/drm_hashtab.c
index c3745c4..5729e39 100644
--- a/drivers/gpu/drm/drm_hashtab.c
+++ b/drivers/gpu/drm/drm_hashtab.c
@@ -67,10 +67,8 @@ void drm_ht_verbose_list(struct drm_open_hash *ht, unsigned long key)
 	hashed_key = hash_long(key, ht->order);
 	DRM_DEBUG("Key is 0x%08lx, Hashed key is 0x%08x\n", key, hashed_key);
 	h_list = &ht->table[hashed_key];
-	hlist_for_each(list, h_list) {
-		entry = hlist_entry(list, struct drm_hash_item, head);
+	hlist_for_each_entry_rcu(entry, list, h_list, head)
 		DRM_DEBUG("count %d, key: 0x%08lx\n", count++, entry->key);
-	}
 }
 
 static struct hlist_node *drm_ht_find_key(struct drm_open_hash *ht,
@@ -83,8 +81,7 @@ static struct hlist_node *drm_ht_find_key(struct drm_open_hash *ht,
 
 	hashed_key = hash_long(key, ht->order);
 	h_list = &ht->table[hashed_key];
-	hlist_for_each(list, h_list) {
-		entry = hlist_entry(list, struct drm_hash_item, head);
+	hlist_for_each_entry_rcu(entry, list, h_list, head) {
 		if (entry->key == key)
 			return list;
 		if (entry->key > key)
@@ -105,8 +102,7 @@ int drm_ht_insert_item(struct drm_open_hash *ht, struct drm_hash_item *item)
 	hashed_key = hash_long(key, ht->order);
 	h_list = &ht->table[hashed_key];
 	parent = NULL;
-	hlist_for_each(list, h_list) {
-		entry = hlist_entry(list, struct drm_hash_item, head);
+	hlist_for_each_entry_rcu(entry, list, h_list, head) {
 		if (entry->key == key)
 			return -EINVAL;
 		if (entry->key > key)
@@ -114,9 +110,9 @@ int drm_ht_insert_item(struct drm_open_hash *ht, struct drm_hash_item *item)
 		parent = list;
 	}
 	if (parent) {
-		hlist_add_after(parent, &item->head);
+		hlist_add_after_rcu(parent, &item->head);
 	} else {
-		hlist_add_head(&item->head, h_list);
+		hlist_add_head_rcu(&item->head, h_list);
 	}
 	return 0;
 }
@@ -171,7 +167,7 @@ int drm_ht_remove_key(struct drm_open_hash *ht, unsigned long key)
 
 	list = drm_ht_find_key(ht, key);
 	if (list) {
-		hlist_del_init(list);
+		hlist_del_init_rcu(list);
 		return 0;
 	}
 	return -EINVAL;
@@ -179,7 +175,7 @@ int drm_ht_remove_key(struct drm_open_hash *ht, unsigned long key)
 
 int drm_ht_remove_item(struct drm_open_hash *ht, struct drm_hash_item *item)
 {
-	hlist_del_init(&item->head);
+	hlist_del_init_rcu(&item->head);
 	return 0;
 }
 EXPORT_SYMBOL(drm_ht_remove_item);
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 2/4] kref: Implement kref_get_unless_zero v2
  2012-11-05 13:55 [PATCH 0/4] drm/ttm: Get rid of a number of atomic read-modify-write ops v2 Thomas Hellstrom
  2012-11-05 13:55 ` [PATCH 1/4] drm: Make hashtab rcu-safe Thomas Hellstrom
@ 2012-11-05 13:55 ` Thomas Hellstrom
  2012-11-05 13:55 ` [PATCH 3/4] drm/ttm, drm/vmwgfx: Use RCU locking for object lookups v2 Thomas Hellstrom
  2012-11-05 13:55 ` [PATCH 4/4] drm/ttm: Optimize reservation slightly Thomas Hellstrom
  3 siblings, 0 replies; 5+ messages in thread
From: Thomas Hellstrom @ 2012-11-05 13:55 UTC (permalink / raw)
  To: airlied, airlied; +Cc: dri-devel, linux-kernel, Thomas Hellstrom

This function is intended to simplify locking around refcounting for
objects that can be looked up from a lookup structure, and which are
removed from that lookup structure in the object destructor.
Operations on such objects require at least a read lock around
lookup + kref_get, and a write lock around kref_put + remove from lookup
structure. Furthermore, RCU implementations become extremely tricky.
With a lookup followed by a kref_get_unless_zero *with return value check*
locking in the kref_put path can be deferred to the actual removal from
the lookup structure and RCU lookups become trivial.

v2: Formatting fixes.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 include/linux/kref.h |   21 +++++++++++++++++++++
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/include/linux/kref.h b/include/linux/kref.h
index 65af688..bae91d0 100644
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -111,4 +111,25 @@ static inline int kref_put_mutex(struct kref *kref,
 	}
 	return 0;
 }
+
+/**
+ * kref_get_unless_zero - Increment refcount for object unless it is zero.
+ * @kref: object.
+ *
+ * Return 0 if the increment succeeded. Otherwise return non-zero.
+ *
+ * This function is intended to simplify locking around refcounting for
+ * objects that can be looked up from a lookup structure, and which are
+ * removed from that lookup structure in the object destructor.
+ * Operations on such objects require at least a read lock around
+ * lookup + kref_get, and a write lock around kref_put + remove from lookup
+ * structure. Furthermore, RCU implementations become extremely tricky.
+ * With a lookup followed by a kref_get_unless_zero *with return value check*
+ * locking in the kref_put path can be deferred to the actual removal from
+ * the lookup structure and RCU lookups become trivial.
+ */
+static inline int __must_check kref_get_unless_zero(struct kref *kref)
+{
+	return !atomic_add_unless(&kref->refcount, 1, 0);
+}
 #endif /* _KREF_H_ */
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 3/4] drm/ttm, drm/vmwgfx: Use RCU locking for object lookups v2
  2012-11-05 13:55 [PATCH 0/4] drm/ttm: Get rid of a number of atomic read-modify-write ops v2 Thomas Hellstrom
  2012-11-05 13:55 ` [PATCH 1/4] drm: Make hashtab rcu-safe Thomas Hellstrom
  2012-11-05 13:55 ` [PATCH 2/4] kref: Implement kref_get_unless_zero v2 Thomas Hellstrom
@ 2012-11-05 13:55 ` Thomas Hellstrom
  2012-11-05 13:55 ` [PATCH 4/4] drm/ttm: Optimize reservation slightly Thomas Hellstrom
  3 siblings, 0 replies; 5+ messages in thread
From: Thomas Hellstrom @ 2012-11-05 13:55 UTC (permalink / raw)
  To: airlied, airlied; +Cc: dri-devel, linux-kernel, Thomas Hellstrom

The mostly used lookup+get put+potential_destroy path of TTM objects
is converted to use RCU locks. This will substantially decrease the amount
of locked bus cycles during normal operation.
Since we use kfree_rcu to free the objects, no rcu synchronization is needed
at module unload time.

v2: Don't touch include/linux/kref.h

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 drivers/gpu/drm/ttm/ttm_object.c         |   30 +++++++++++-------------------
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c |    8 ++++----
 include/drm/ttm/ttm_object.h             |    4 ++++
 3 files changed, 19 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_object.c b/drivers/gpu/drm/ttm/ttm_object.c
index c785787..9d7f674 100644
--- a/drivers/gpu/drm/ttm/ttm_object.c
+++ b/drivers/gpu/drm/ttm/ttm_object.c
@@ -80,7 +80,7 @@ struct ttm_object_file {
  */
 
 struct ttm_object_device {
-	rwlock_t object_lock;
+	spinlock_t object_lock;
 	struct drm_open_hash object_hash;
 	atomic_t object_count;
 	struct ttm_mem_global *mem_glob;
@@ -157,12 +157,12 @@ int ttm_base_object_init(struct ttm_object_file *tfile,
 	base->refcount_release = refcount_release;
 	base->ref_obj_release = ref_obj_release;
 	base->object_type = object_type;
-	write_lock(&tdev->object_lock);
+	spin_lock(&tdev->object_lock);
 	kref_init(&base->refcount);
 	ret = drm_ht_just_insert_please(&tdev->object_hash,
 					&base->hash,
 					(unsigned long)base, 31, 0, 0);
-	write_unlock(&tdev->object_lock);
+	spin_unlock(&tdev->object_lock);
 	if (unlikely(ret != 0))
 		goto out_err0;
 
@@ -186,30 +186,22 @@ static void ttm_release_base(struct kref *kref)
 	    container_of(kref, struct ttm_base_object, refcount);
 	struct ttm_object_device *tdev = base->tfile->tdev;
 
+	spin_lock(&tdev->object_lock);
 	(void)drm_ht_remove_item(&tdev->object_hash, &base->hash);
-	write_unlock(&tdev->object_lock);
+	spin_unlock(&tdev->object_lock);
 	if (base->refcount_release) {
 		ttm_object_file_unref(&base->tfile);
 		base->refcount_release(&base);
 	}
-	write_lock(&tdev->object_lock);
 }
 
 void ttm_base_object_unref(struct ttm_base_object **p_base)
 {
 	struct ttm_base_object *base = *p_base;
-	struct ttm_object_device *tdev = base->tfile->tdev;
 
 	*p_base = NULL;
 
-	/*
-	 * Need to take the lock here to avoid racing with
-	 * users trying to look up the object.
-	 */
-
-	write_lock(&tdev->object_lock);
 	kref_put(&base->refcount, ttm_release_base);
-	write_unlock(&tdev->object_lock);
 }
 EXPORT_SYMBOL(ttm_base_object_unref);
 
@@ -221,14 +213,14 @@ struct ttm_base_object *ttm_base_object_lookup(struct ttm_object_file *tfile,
 	struct drm_hash_item *hash;
 	int ret;
 
-	read_lock(&tdev->object_lock);
+	rcu_read_lock();
 	ret = drm_ht_find_item(&tdev->object_hash, key, &hash);
 
 	if (likely(ret == 0)) {
 		base = drm_hash_entry(hash, struct ttm_base_object, hash);
-		kref_get(&base->refcount);
+		ret = kref_get_unless_zero(&base->refcount);
 	}
-	read_unlock(&tdev->object_lock);
+	rcu_read_unlock();
 
 	if (unlikely(ret != 0))
 		return NULL;
@@ -426,7 +418,7 @@ struct ttm_object_device *ttm_object_device_init(struct ttm_mem_global
 		return NULL;
 
 	tdev->mem_glob = mem_glob;
-	rwlock_init(&tdev->object_lock);
+	spin_lock_init(&tdev->object_lock);
 	atomic_set(&tdev->object_count, 0);
 	ret = drm_ht_create(&tdev->object_hash, hash_order);
 
@@ -444,9 +436,9 @@ void ttm_object_device_release(struct ttm_object_device **p_tdev)
 
 	*p_tdev = NULL;
 
-	write_lock(&tdev->object_lock);
+	spin_lock(&tdev->object_lock);
 	drm_ht_remove(&tdev->object_hash);
-	write_unlock(&tdev->object_lock);
+	spin_unlock(&tdev->object_lock);
 
 	kfree(tdev);
 }
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
index da3c6b5..ae675c6 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_resource.c
@@ -351,7 +351,7 @@ static void vmw_user_context_free(struct vmw_resource *res)
 	    container_of(res, struct vmw_user_context, res);
 	struct vmw_private *dev_priv = res->dev_priv;
 
-	kfree(ctx);
+	ttm_base_object_kfree(ctx, base);
 	ttm_mem_global_free(vmw_mem_glob(dev_priv),
 			    vmw_user_context_size);
 }
@@ -1147,7 +1147,7 @@ static void vmw_user_surface_free(struct vmw_resource *res)
 	kfree(srf->offsets);
 	kfree(srf->sizes);
 	kfree(srf->snooper.image);
-	kfree(user_srf);
+	ttm_base_object_kfree(user_srf, base);
 	ttm_mem_global_free(vmw_mem_glob(dev_priv), size);
 }
 
@@ -1575,7 +1575,7 @@ static void vmw_user_dmabuf_destroy(struct ttm_buffer_object *bo)
 {
 	struct vmw_user_dma_buffer *vmw_user_bo = vmw_user_dma_buffer(bo);
 
-	kfree(vmw_user_bo);
+	ttm_base_object_kfree(vmw_user_bo, base);
 }
 
 static void vmw_user_dmabuf_release(struct ttm_base_object **p_base)
@@ -1763,7 +1763,7 @@ static void vmw_user_stream_free(struct vmw_resource *res)
 	    container_of(res, struct vmw_user_stream, stream.res);
 	struct vmw_private *dev_priv = res->dev_priv;
 
-	kfree(stream);
+	ttm_base_object_kfree(stream, base);
 	ttm_mem_global_free(vmw_mem_glob(dev_priv),
 			    vmw_user_stream_size);
 }
diff --git a/include/drm/ttm/ttm_object.h b/include/drm/ttm/ttm_object.h
index b01c563..fc0cf06 100644
--- a/include/drm/ttm/ttm_object.h
+++ b/include/drm/ttm/ttm_object.h
@@ -40,6 +40,7 @@
 #include <linux/list.h>
 #include <drm/drm_hashtab.h>
 #include <linux/kref.h>
+#include <linux/rcupdate.h>
 #include <ttm/ttm_memory.h>
 
 /**
@@ -120,6 +121,7 @@ struct ttm_object_device;
  */
 
 struct ttm_base_object {
+	struct rcu_head rhead;
 	struct drm_hash_item hash;
 	enum ttm_object_type object_type;
 	bool shareable;
@@ -268,4 +270,6 @@ extern struct ttm_object_device *ttm_object_device_init
 
 extern void ttm_object_device_release(struct ttm_object_device **p_tdev);
 
+#define ttm_base_object_kfree(__object, __base)\
+	kfree_rcu(__object, __base.rhead)
 #endif
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH 4/4] drm/ttm: Optimize reservation slightly
  2012-11-05 13:55 [PATCH 0/4] drm/ttm: Get rid of a number of atomic read-modify-write ops v2 Thomas Hellstrom
                   ` (2 preceding siblings ...)
  2012-11-05 13:55 ` [PATCH 3/4] drm/ttm, drm/vmwgfx: Use RCU locking for object lookups v2 Thomas Hellstrom
@ 2012-11-05 13:55 ` Thomas Hellstrom
  3 siblings, 0 replies; 5+ messages in thread
From: Thomas Hellstrom @ 2012-11-05 13:55 UTC (permalink / raw)
  To: airlied, airlied; +Cc: dri-devel, linux-kernel, Thomas Hellstrom

Reservation locking currently always takes place under the LRU spinlock.
Hence, strictly there is no need for an atomic_cmpxchg call; we can use
atomic_read followed by atomic_write since nobody else will ever reserve
without the lru spinlock held.
At least on Intel this should remove a locked bus cycle on successful
reserve.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 drivers/gpu/drm/ttm/ttm_bo.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c
index bf6e4b5..46008ea 100644
--- a/drivers/gpu/drm/ttm/ttm_bo.c
+++ b/drivers/gpu/drm/ttm/ttm_bo.c
@@ -220,7 +220,7 @@ int ttm_bo_reserve_locked(struct ttm_buffer_object *bo,
 	struct ttm_bo_global *glob = bo->glob;
 	int ret;
 
-	while (unlikely(atomic_cmpxchg(&bo->reserved, 0, 1) != 0)) {
+	while (unlikely(atomic_read(&bo->reserved) != 0)) {
 		/**
 		 * Deadlock avoidance for multi-bo reserving.
 		 */
@@ -249,6 +249,7 @@ int ttm_bo_reserve_locked(struct ttm_buffer_object *bo,
 			return ret;
 	}
 
+	atomic_set(&bo->reserved, 1);
 	if (use_sequence) {
 		/**
 		 * Wake up waiters that may need to recheck for deadlock,
-- 
1.7.4.4


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2012-11-05 13:56 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-11-05 13:55 [PATCH 0/4] drm/ttm: Get rid of a number of atomic read-modify-write ops v2 Thomas Hellstrom
2012-11-05 13:55 ` [PATCH 1/4] drm: Make hashtab rcu-safe Thomas Hellstrom
2012-11-05 13:55 ` [PATCH 2/4] kref: Implement kref_get_unless_zero v2 Thomas Hellstrom
2012-11-05 13:55 ` [PATCH 3/4] drm/ttm, drm/vmwgfx: Use RCU locking for object lookups v2 Thomas Hellstrom
2012-11-05 13:55 ` [PATCH 4/4] drm/ttm: Optimize reservation slightly Thomas Hellstrom

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.