All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] Cross-driver ww transaction lock lists
@ 2021-04-13  7:50 ` Thomas Hellström
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Hellström @ 2021-04-13  7:50 UTC (permalink / raw)
  To: dri-devel; +Cc: intel-gfx, linaro-mm-sig, Matthew Auld, Christian König

Hi!

During the dma_resv conversion of the i915 driver, we've been using ww
transaction lock lists to keep track of ww_mutexes that are locked
during the transaction so that they can be batch unlocked at suitable
locations. Including also the LMEM/VRAM eviction we've ended up with
two static lists per transaction context; one typically unlocked at the
end of transaction and one initialized before and unlocked after each
buffer object validate. This enables us to do sleeping locking at
eviction and keep objects locked on the eviction list until we
eventually succeed allocating memory (modulo minor flaws of course).

It would be beneficial with the i915 TTM conversion to be able to
introduce a similar functionality that would work in ttm but also
cross-driver in, for example move_notify. It would also be beneficial
to be able to put any dma_resv ww mutex on the lists, and not require
it to be embedded in a particular object type.

I started scetching on some utilities for this. For TTM, for example,
the idea would be to pass a list head for the ww transaction lock list
in the ttm_operation_ctx. A function taking a ww_mutex could then
either attach a grabbed lock to the list for batch unlocking, or be
responsible for unlocking when the lock's scope is exited. It's also
possible to create sublists if so desired. I believe the below would be
sufficient to cover the i915 functionality.

Any comments and suggestions appreciated!

8<------------------------------------------------------

#ifndef _TRANSACTION_LOCKLIST_H_
#define _TRANSACTION_LOCKLIST_H_

struct trans_lockitem;

/**
 * struct trans_locklist_ops - Ops structure for the ww locklist
functionality.
 *
 * Typically a const struct trans_locklist_ops is defined for each type
that
 * embeds a struct trans_lockitem, or hav a struct trans_lockitem
pointing
 * at it using the private pointer. It can be a buffer object,
reservation
 * object, a single ww_mutex or even a sublist of trans_lockitems. 
 */
struct trans_locklist_ops {
	/**
	 * slow_lock: Slow lock to relax the transaction. Only used by
	 * a contending lock item.
	 * @item: The struct trans_lockitem to lock
	 * @intr: Whether to lock interruptible
	 *
	 * Return: -ERESTARTSYS: Hit a signal when relaxing,
	 * -EAGAIN, relaxing succesful, but the contending lock
	 * remains unlocked.
	 */
	int (*slow_lock) (struct trans_lockitem *item, bool intr);

	/**
	 * unlock: Unlock.
	 * @item: The struct trans_lockitem to unlock.
	 */
	void (*unlock) (struct trans_lockitem *item);

	/**
	 * unlock: Unlock.
	 * @item: The struct trans_lockitem to put. This function may
be NULL.
	 */
	void (*put) (struct trans_lockitem *item);
};

/**
 * struct trans_lockitem
 * @ops: Pointer to an Ops structure for this lockitem.
 * @link: List link for the transaction locklist.
 * @private: Private info.
 * @relax_unlock: Unlock contending lock after relaxation since it is
 * likely not needed after a transaction restart.
 *
 * A struct trans_lockitem typically represents a single lock, but is
 * generic enough to represent a sublist of locks. It can either be
 * embedded, or allocated on demand. A kmem_cache might be beneficial.
 */
struct trans_lockitem {
	const struct trans_locklist_ops *ops;
	struct list_head link;
	u32 relax_unlock:1;
	void *private;
};

/* unlock example */
static inline void trans_unlock_put_all(struct list_head *list)
{
	struct trans_lockitem *lock, *next;

	list_for_each_entry_safe(lock, next, typeof(*lock), link) {
		lock->ops->unlock(lock);
		list_del_init(&lock->link);
		if (lock->ops_put)
			lock->ops->put(lock);
	}
}

/* Backoff example */
static inline int __must_check trans_backoff(struct list_head *list,
bool intr,
					     struct trans_lockitem
*contending)
{
	int ret = 0;

	trans_unlock_put_all(list);
	if (contending) {
		ret = contending->ops->slow_lock(contending, intr);
		if (!ret && contending->relax_unlock)
			contending->unlock(contending);
		if (ret == -EAGAIN)
			ret = 0;
		contending->ops->put(contending);
	}

	return ret;
}
		
		
#endif _TRANSACTION_LOCKLIST_


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [Intel-gfx] [RFC] Cross-driver ww transaction lock lists
@ 2021-04-13  7:50 ` Thomas Hellström
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Hellström @ 2021-04-13  7:50 UTC (permalink / raw)
  To: dri-devel; +Cc: intel-gfx, linaro-mm-sig, Matthew Auld, Christian König

Hi!

During the dma_resv conversion of the i915 driver, we've been using ww
transaction lock lists to keep track of ww_mutexes that are locked
during the transaction so that they can be batch unlocked at suitable
locations. Including also the LMEM/VRAM eviction we've ended up with
two static lists per transaction context; one typically unlocked at the
end of transaction and one initialized before and unlocked after each
buffer object validate. This enables us to do sleeping locking at
eviction and keep objects locked on the eviction list until we
eventually succeed allocating memory (modulo minor flaws of course).

It would be beneficial with the i915 TTM conversion to be able to
introduce a similar functionality that would work in ttm but also
cross-driver in, for example move_notify. It would also be beneficial
to be able to put any dma_resv ww mutex on the lists, and not require
it to be embedded in a particular object type.

I started scetching on some utilities for this. For TTM, for example,
the idea would be to pass a list head for the ww transaction lock list
in the ttm_operation_ctx. A function taking a ww_mutex could then
either attach a grabbed lock to the list for batch unlocking, or be
responsible for unlocking when the lock's scope is exited. It's also
possible to create sublists if so desired. I believe the below would be
sufficient to cover the i915 functionality.

Any comments and suggestions appreciated!

8<------------------------------------------------------

#ifndef _TRANSACTION_LOCKLIST_H_
#define _TRANSACTION_LOCKLIST_H_

struct trans_lockitem;

/**
 * struct trans_locklist_ops - Ops structure for the ww locklist
functionality.
 *
 * Typically a const struct trans_locklist_ops is defined for each type
that
 * embeds a struct trans_lockitem, or hav a struct trans_lockitem
pointing
 * at it using the private pointer. It can be a buffer object,
reservation
 * object, a single ww_mutex or even a sublist of trans_lockitems. 
 */
struct trans_locklist_ops {
	/**
	 * slow_lock: Slow lock to relax the transaction. Only used by
	 * a contending lock item.
	 * @item: The struct trans_lockitem to lock
	 * @intr: Whether to lock interruptible
	 *
	 * Return: -ERESTARTSYS: Hit a signal when relaxing,
	 * -EAGAIN, relaxing succesful, but the contending lock
	 * remains unlocked.
	 */
	int (*slow_lock) (struct trans_lockitem *item, bool intr);

	/**
	 * unlock: Unlock.
	 * @item: The struct trans_lockitem to unlock.
	 */
	void (*unlock) (struct trans_lockitem *item);

	/**
	 * unlock: Unlock.
	 * @item: The struct trans_lockitem to put. This function may
be NULL.
	 */
	void (*put) (struct trans_lockitem *item);
};

/**
 * struct trans_lockitem
 * @ops: Pointer to an Ops structure for this lockitem.
 * @link: List link for the transaction locklist.
 * @private: Private info.
 * @relax_unlock: Unlock contending lock after relaxation since it is
 * likely not needed after a transaction restart.
 *
 * A struct trans_lockitem typically represents a single lock, but is
 * generic enough to represent a sublist of locks. It can either be
 * embedded, or allocated on demand. A kmem_cache might be beneficial.
 */
struct trans_lockitem {
	const struct trans_locklist_ops *ops;
	struct list_head link;
	u32 relax_unlock:1;
	void *private;
};

/* unlock example */
static inline void trans_unlock_put_all(struct list_head *list)
{
	struct trans_lockitem *lock, *next;

	list_for_each_entry_safe(lock, next, typeof(*lock), link) {
		lock->ops->unlock(lock);
		list_del_init(&lock->link);
		if (lock->ops_put)
			lock->ops->put(lock);
	}
}

/* Backoff example */
static inline int __must_check trans_backoff(struct list_head *list,
bool intr,
					     struct trans_lockitem
*contending)
{
	int ret = 0;

	trans_unlock_put_all(list);
	if (contending) {
		ret = contending->ops->slow_lock(contending, intr);
		if (!ret && contending->relax_unlock)
			contending->unlock(contending);
		if (ret == -EAGAIN)
			ret = 0;
		contending->ops->put(contending);
	}

	return ret;
}
		
		
#endif _TRANSACTION_LOCKLIST_


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Cross-driver ww transaction lock lists
  2021-04-13  7:50 ` [Intel-gfx] " Thomas Hellström
@ 2021-04-13  7:57   ` Christian König
  -1 siblings, 0 replies; 15+ messages in thread
From: Christian König @ 2021-04-13  7:57 UTC (permalink / raw)
  To: Thomas Hellström, dri-devel; +Cc: intel-gfx, linaro-mm-sig, Matthew Auld



Am 13.04.21 um 09:50 schrieb Thomas Hellström:
> Hi!
>
> During the dma_resv conversion of the i915 driver, we've been using ww
> transaction lock lists to keep track of ww_mutexes that are locked
> during the transaction so that they can be batch unlocked at suitable
> locations. Including also the LMEM/VRAM eviction we've ended up with
> two static lists per transaction context; one typically unlocked at the
> end of transaction and one initialized before and unlocked after each
> buffer object validate. This enables us to do sleeping locking at
> eviction and keep objects locked on the eviction list until we
> eventually succeed allocating memory (modulo minor flaws of course).
>
> It would be beneficial with the i915 TTM conversion to be able to
> introduce a similar functionality that would work in ttm but also
> cross-driver in, for example move_notify. It would also be beneficial
> to be able to put any dma_resv ww mutex on the lists, and not require
> it to be embedded in a particular object type.
>
> I started scetching on some utilities for this. For TTM, for example,
> the idea would be to pass a list head for the ww transaction lock list
> in the ttm_operation_ctx. A function taking a ww_mutex could then
> either attach a grabbed lock to the list for batch unlocking, or be
> responsible for unlocking when the lock's scope is exited. It's also
> possible to create sublists if so desired. I believe the below would be
> sufficient to cover the i915 functionality.
>
> Any comments and suggestions appreciated!

ah yes Daniel and I haven been discussing something like this for years.

I also came up with rough implementation, but we always ran into 
lifetime issues.

In other words the ww_mutexes which are on the list would need to be 
kept alive until unlocked.

Because of this we kind of backed up and said we would need this on the 
GEM level instead of working with dma_resv objects.

Regards,
Christian.

>
> 8<------------------------------------------------------
>
> #ifndef _TRANSACTION_LOCKLIST_H_
> #define _TRANSACTION_LOCKLIST_H_
>
> struct trans_lockitem;
>
> /**
>   * struct trans_locklist_ops - Ops structure for the ww locklist
> functionality.
>   *
>   * Typically a const struct trans_locklist_ops is defined for each type
> that
>   * embeds a struct trans_lockitem, or hav a struct trans_lockitem
> pointing
>   * at it using the private pointer. It can be a buffer object,
> reservation
>   * object, a single ww_mutex or even a sublist of trans_lockitems.
>   */
> struct trans_locklist_ops {
> 	/**
> 	 * slow_lock: Slow lock to relax the transaction. Only used by
> 	 * a contending lock item.
> 	 * @item: The struct trans_lockitem to lock
> 	 * @intr: Whether to lock interruptible
> 	 *
> 	 * Return: -ERESTARTSYS: Hit a signal when relaxing,
> 	 * -EAGAIN, relaxing succesful, but the contending lock
> 	 * remains unlocked.
> 	 */
> 	int (*slow_lock) (struct trans_lockitem *item, bool intr);
>
> 	/**
> 	 * unlock: Unlock.
> 	 * @item: The struct trans_lockitem to unlock.
> 	 */
> 	void (*unlock) (struct trans_lockitem *item);
>
> 	/**
> 	 * unlock: Unlock.
> 	 * @item: The struct trans_lockitem to put. This function may
> be NULL.
> 	 */
> 	void (*put) (struct trans_lockitem *item);
> };
>
> /**
>   * struct trans_lockitem
>   * @ops: Pointer to an Ops structure for this lockitem.
>   * @link: List link for the transaction locklist.
>   * @private: Private info.
>   * @relax_unlock: Unlock contending lock after relaxation since it is
>   * likely not needed after a transaction restart.
>   *
>   * A struct trans_lockitem typically represents a single lock, but is
>   * generic enough to represent a sublist of locks. It can either be
>   * embedded, or allocated on demand. A kmem_cache might be beneficial.
>   */
> struct trans_lockitem {
> 	const struct trans_locklist_ops *ops;
> 	struct list_head link;
> 	u32 relax_unlock:1;
> 	void *private;
> };
>
> /* unlock example */
> static inline void trans_unlock_put_all(struct list_head *list)
> {
> 	struct trans_lockitem *lock, *next;
>
> 	list_for_each_entry_safe(lock, next, typeof(*lock), link) {
> 		lock->ops->unlock(lock);
> 		list_del_init(&lock->link);
> 		if (lock->ops_put)
> 			lock->ops->put(lock);
> 	}
> }
>
> /* Backoff example */
> static inline int __must_check trans_backoff(struct list_head *list,
> bool intr,
> 					     struct trans_lockitem
> *contending)
> {
> 	int ret = 0;
>
> 	trans_unlock_put_all(list);
> 	if (contending) {
> 		ret = contending->ops->slow_lock(contending, intr);
> 		if (!ret && contending->relax_unlock)
> 			contending->unlock(contending);
> 		if (ret == -EAGAIN)
> 			ret = 0;
> 		contending->ops->put(contending);
> 	}
>
> 	return ret;
> }
> 		
> 		
> #endif _TRANSACTION_LOCKLIST_
>
>

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [RFC] Cross-driver ww transaction lock lists
@ 2021-04-13  7:57   ` Christian König
  0 siblings, 0 replies; 15+ messages in thread
From: Christian König @ 2021-04-13  7:57 UTC (permalink / raw)
  To: Thomas Hellström, dri-devel; +Cc: intel-gfx, linaro-mm-sig, Matthew Auld



Am 13.04.21 um 09:50 schrieb Thomas Hellström:
> Hi!
>
> During the dma_resv conversion of the i915 driver, we've been using ww
> transaction lock lists to keep track of ww_mutexes that are locked
> during the transaction so that they can be batch unlocked at suitable
> locations. Including also the LMEM/VRAM eviction we've ended up with
> two static lists per transaction context; one typically unlocked at the
> end of transaction and one initialized before and unlocked after each
> buffer object validate. This enables us to do sleeping locking at
> eviction and keep objects locked on the eviction list until we
> eventually succeed allocating memory (modulo minor flaws of course).
>
> It would be beneficial with the i915 TTM conversion to be able to
> introduce a similar functionality that would work in ttm but also
> cross-driver in, for example move_notify. It would also be beneficial
> to be able to put any dma_resv ww mutex on the lists, and not require
> it to be embedded in a particular object type.
>
> I started scetching on some utilities for this. For TTM, for example,
> the idea would be to pass a list head for the ww transaction lock list
> in the ttm_operation_ctx. A function taking a ww_mutex could then
> either attach a grabbed lock to the list for batch unlocking, or be
> responsible for unlocking when the lock's scope is exited. It's also
> possible to create sublists if so desired. I believe the below would be
> sufficient to cover the i915 functionality.
>
> Any comments and suggestions appreciated!

ah yes Daniel and I haven been discussing something like this for years.

I also came up with rough implementation, but we always ran into 
lifetime issues.

In other words the ww_mutexes which are on the list would need to be 
kept alive until unlocked.

Because of this we kind of backed up and said we would need this on the 
GEM level instead of working with dma_resv objects.

Regards,
Christian.

>
> 8<------------------------------------------------------
>
> #ifndef _TRANSACTION_LOCKLIST_H_
> #define _TRANSACTION_LOCKLIST_H_
>
> struct trans_lockitem;
>
> /**
>   * struct trans_locklist_ops - Ops structure for the ww locklist
> functionality.
>   *
>   * Typically a const struct trans_locklist_ops is defined for each type
> that
>   * embeds a struct trans_lockitem, or hav a struct trans_lockitem
> pointing
>   * at it using the private pointer. It can be a buffer object,
> reservation
>   * object, a single ww_mutex or even a sublist of trans_lockitems.
>   */
> struct trans_locklist_ops {
> 	/**
> 	 * slow_lock: Slow lock to relax the transaction. Only used by
> 	 * a contending lock item.
> 	 * @item: The struct trans_lockitem to lock
> 	 * @intr: Whether to lock interruptible
> 	 *
> 	 * Return: -ERESTARTSYS: Hit a signal when relaxing,
> 	 * -EAGAIN, relaxing succesful, but the contending lock
> 	 * remains unlocked.
> 	 */
> 	int (*slow_lock) (struct trans_lockitem *item, bool intr);
>
> 	/**
> 	 * unlock: Unlock.
> 	 * @item: The struct trans_lockitem to unlock.
> 	 */
> 	void (*unlock) (struct trans_lockitem *item);
>
> 	/**
> 	 * unlock: Unlock.
> 	 * @item: The struct trans_lockitem to put. This function may
> be NULL.
> 	 */
> 	void (*put) (struct trans_lockitem *item);
> };
>
> /**
>   * struct trans_lockitem
>   * @ops: Pointer to an Ops structure for this lockitem.
>   * @link: List link for the transaction locklist.
>   * @private: Private info.
>   * @relax_unlock: Unlock contending lock after relaxation since it is
>   * likely not needed after a transaction restart.
>   *
>   * A struct trans_lockitem typically represents a single lock, but is
>   * generic enough to represent a sublist of locks. It can either be
>   * embedded, or allocated on demand. A kmem_cache might be beneficial.
>   */
> struct trans_lockitem {
> 	const struct trans_locklist_ops *ops;
> 	struct list_head link;
> 	u32 relax_unlock:1;
> 	void *private;
> };
>
> /* unlock example */
> static inline void trans_unlock_put_all(struct list_head *list)
> {
> 	struct trans_lockitem *lock, *next;
>
> 	list_for_each_entry_safe(lock, next, typeof(*lock), link) {
> 		lock->ops->unlock(lock);
> 		list_del_init(&lock->link);
> 		if (lock->ops_put)
> 			lock->ops->put(lock);
> 	}
> }
>
> /* Backoff example */
> static inline int __must_check trans_backoff(struct list_head *list,
> bool intr,
> 					     struct trans_lockitem
> *contending)
> {
> 	int ret = 0;
>
> 	trans_unlock_put_all(list);
> 	if (contending) {
> 		ret = contending->ops->slow_lock(contending, intr);
> 		if (!ret && contending->relax_unlock)
> 			contending->unlock(contending);
> 		if (ret == -EAGAIN)
> 			ret = 0;
> 		contending->ops->put(contending);
> 	}
>
> 	return ret;
> }
> 		
> 		
> #endif _TRANSACTION_LOCKLIST_
>
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [RFC] Cross-driver ww transaction lock lists
  2021-04-13  7:57   ` [Intel-gfx] " Christian König
  (?)
@ 2021-04-13  8:07   ` Thomas Hellström (Intel)
  -1 siblings, 0 replies; 15+ messages in thread
From: Thomas Hellström (Intel) @ 2021-04-13  8:07 UTC (permalink / raw)
  To: intel-gfx

Hi,

On 4/13/21 9:57 AM, Christian König wrote:
>
>
> Am 13.04.21 um 09:50 schrieb Thomas Hellström:
>> Hi!
>>
>> During the dma_resv conversion of the i915 driver, we've been using ww
>> transaction lock lists to keep track of ww_mutexes that are locked
>> during the transaction so that they can be batch unlocked at suitable
>> locations. Including also the LMEM/VRAM eviction we've ended up with
>> two static lists per transaction context; one typically unlocked at the
>> end of transaction and one initialized before and unlocked after each
>> buffer object validate. This enables us to do sleeping locking at
>> eviction and keep objects locked on the eviction list until we
>> eventually succeed allocating memory (modulo minor flaws of course).
>>
>> It would be beneficial with the i915 TTM conversion to be able to
>> introduce a similar functionality that would work in ttm but also
>> cross-driver in, for example move_notify. It would also be beneficial
>> to be able to put any dma_resv ww mutex on the lists, and not require
>> it to be embedded in a particular object type.
>>
>> I started scetching on some utilities for this. For TTM, for example,
>> the idea would be to pass a list head for the ww transaction lock list
>> in the ttm_operation_ctx. A function taking a ww_mutex could then
>> either attach a grabbed lock to the list for batch unlocking, or be
>> responsible for unlocking when the lock's scope is exited. It's also
>> possible to create sublists if so desired. I believe the below would be
>> sufficient to cover the i915 functionality.
>>
>> Any comments and suggestions appreciated!
>
> ah yes Daniel and I haven been discussing something like this for years.
>
> I also came up with rough implementation, but we always ran into 
> lifetime issues.
>
> In other words the ww_mutexes which are on the list would need to be 
> kept alive until unlocked.

Yes, the idea here is that the locker takes whatever reference needed to 
put the lock on the list and keep it alive, and if that's not possible 
for some reason, don't put the object on the list, but in the latter 
case we need to solve the contended case. If needed I have a ww_mutex 
api addition that does that...

>
> Because of this we kind of backed up and said we would need this on 
> the GEM level instead of working with dma_resv objects.

And that's what the ops are for. The implementation doesn't care about 
the underlying object. For TTM, we'd probably want to take a refcount on 
the embedded GEM object.

/Thomas


>
> Regards,
> Christian.
>
>>
>> 8<------------------------------------------------------
>>
>> #ifndef _TRANSACTION_LOCKLIST_H_
>> #define _TRANSACTION_LOCKLIST_H_
>>
>> struct trans_lockitem;
>>
>> /**
>>   * struct trans_locklist_ops - Ops structure for the ww locklist
>> functionality.
>>   *
>>   * Typically a const struct trans_locklist_ops is defined for each type
>> that
>>   * embeds a struct trans_lockitem, or hav a struct trans_lockitem
>> pointing
>>   * at it using the private pointer. It can be a buffer object,
>> reservation
>>   * object, a single ww_mutex or even a sublist of trans_lockitems.
>>   */
>> struct trans_locklist_ops {
>>     /**
>>      * slow_lock: Slow lock to relax the transaction. Only used by
>>      * a contending lock item.
>>      * @item: The struct trans_lockitem to lock
>>      * @intr: Whether to lock interruptible
>>      *
>>      * Return: -ERESTARTSYS: Hit a signal when relaxing,
>>      * -EAGAIN, relaxing succesful, but the contending lock
>>      * remains unlocked.
>>      */
>>     int (*slow_lock) (struct trans_lockitem *item, bool intr);
>>
>>     /**
>>      * unlock: Unlock.
>>      * @item: The struct trans_lockitem to unlock.
>>      */
>>     void (*unlock) (struct trans_lockitem *item);
>>
>>     /**
>>      * unlock: Unlock.
>>      * @item: The struct trans_lockitem to put. This function may
>> be NULL.
>>      */
>>     void (*put) (struct trans_lockitem *item);
>> };
>>
>> /**
>>   * struct trans_lockitem
>>   * @ops: Pointer to an Ops structure for this lockitem.
>>   * @link: List link for the transaction locklist.
>>   * @private: Private info.
>>   * @relax_unlock: Unlock contending lock after relaxation since it is
>>   * likely not needed after a transaction restart.
>>   *
>>   * A struct trans_lockitem typically represents a single lock, but is
>>   * generic enough to represent a sublist of locks. It can either be
>>   * embedded, or allocated on demand. A kmem_cache might be beneficial.
>>   */
>> struct trans_lockitem {
>>     const struct trans_locklist_ops *ops;
>>     struct list_head link;
>>     u32 relax_unlock:1;
>>     void *private;
>> };
>>
>> /* unlock example */
>> static inline void trans_unlock_put_all(struct list_head *list)
>> {
>>     struct trans_lockitem *lock, *next;
>>
>>     list_for_each_entry_safe(lock, next, typeof(*lock), link) {
>>         lock->ops->unlock(lock);
>>         list_del_init(&lock->link);
>>         if (lock->ops_put)
>>             lock->ops->put(lock);
>>     }
>> }
>>
>> /* Backoff example */
>> static inline int __must_check trans_backoff(struct list_head *list,
>> bool intr,
>>                          struct trans_lockitem
>> *contending)
>> {
>>     int ret = 0;
>>
>>     trans_unlock_put_all(list);
>>     if (contending) {
>>         ret = contending->ops->slow_lock(contending, intr);
>>         if (!ret && contending->relax_unlock)
>>             contending->unlock(contending);
>>         if (ret == -EAGAIN)
>>             ret = 0;
>>         contending->ops->put(contending);
>>     }
>>
>>     return ret;
>> }
>>
>>
>> #endif _TRANSACTION_LOCKLIST_
>>
>>
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC] Cross-driver ww transaction lock lists
  2021-04-13  7:57   ` [Intel-gfx] " Christian König
@ 2021-04-15 13:37     ` Daniel Vetter
  -1 siblings, 0 replies; 15+ messages in thread
From: Daniel Vetter @ 2021-04-15 13:37 UTC (permalink / raw)
  To: Christian König
  Cc: Thomas Hellström, intel-gfx, dri-devel, linaro-mm-sig, Matthew Auld

On Tue, Apr 13, 2021 at 09:57:06AM +0200, Christian König wrote:
> 
> 
> Am 13.04.21 um 09:50 schrieb Thomas Hellström:
> > Hi!
> > 
> > During the dma_resv conversion of the i915 driver, we've been using ww
> > transaction lock lists to keep track of ww_mutexes that are locked
> > during the transaction so that they can be batch unlocked at suitable
> > locations. Including also the LMEM/VRAM eviction we've ended up with
> > two static lists per transaction context; one typically unlocked at the
> > end of transaction and one initialized before and unlocked after each
> > buffer object validate. This enables us to do sleeping locking at
> > eviction and keep objects locked on the eviction list until we
> > eventually succeed allocating memory (modulo minor flaws of course).
> > 
> > It would be beneficial with the i915 TTM conversion to be able to
> > introduce a similar functionality that would work in ttm but also
> > cross-driver in, for example move_notify. It would also be beneficial
> > to be able to put any dma_resv ww mutex on the lists, and not require
> > it to be embedded in a particular object type.
> > 
> > I started scetching on some utilities for this. For TTM, for example,
> > the idea would be to pass a list head for the ww transaction lock list
> > in the ttm_operation_ctx. A function taking a ww_mutex could then
> > either attach a grabbed lock to the list for batch unlocking, or be
> > responsible for unlocking when the lock's scope is exited. It's also
> > possible to create sublists if so desired. I believe the below would be
> > sufficient to cover the i915 functionality.
> > 
> > Any comments and suggestions appreciated!
> 
> ah yes Daniel and I haven been discussing something like this for years.
> 
> I also came up with rough implementation, but we always ran into lifetime
> issues.
> 
> In other words the ww_mutexes which are on the list would need to be kept
> alive until unlocked.
> 
> Because of this we kind of backed up and said we would need this on the GEM
> level instead of working with dma_resv objects.

Yeah there's a few funny concerns here that make this awkward:
- For simplicity doing these helpers at the gem level should make things a
  bit easier, because then we have a standard way to drop the reference.
  It does mean that the only thing you can lock like this are gem objects,
  but I think that's fine. At least for a first cut.

- This is a bit awkward for vmwgfx, but a) Zack has mentioned he's looking
  into adopting gem bo internally to be able to drop a pile of code and
  stop making vmwgfx the only special-case we have b) drivers which don't
  need this won't need this, so should be fine.

  The other awkward thing I guess is that ttm would need to use the
  embedded kref from the gem bo, but that should be transparent I think.

- Next up is dma-buf: For i915 we'd like to do the same eviction trick
  also through p2p dma-buf callbacks, so that this works the same as
  eviction/reservation within a gpu. But for these internal bo you might
  not have a dma-buf, so we can't just lift the trick to the dma-buf
  level. But I think if we pass e.g. a struct list_head and a callback to
  unreference/unlock all the buffers in there to the exporter, plus
  similar for the slowpath lock, then that should be doable without
  glorious layering inversions between dma-buf and gem.

  I think for dma-buf it should even be ok if this requires that we
  allocate an entire structure with kmalloc or something, allocating
  memory while holding dma_resv is ok.

- Another reason for doing it at the gem level is that the SoC drivers
  should probably use dma_resv more, so having some competent cs/eviction
  helpers derived from what we have in ttm would be nice I think.

But also I never got anywhere with anything, since like Christian said if
we just link up ww_mutex, or dma_resv, it always dies on some lifetime
handling issues.
-Daniel

> 
> Regards,
> Christian.
> 
> > 
> > 8<------------------------------------------------------
> > 
> > #ifndef _TRANSACTION_LOCKLIST_H_
> > #define _TRANSACTION_LOCKLIST_H_
> > 
> > struct trans_lockitem;
> > 
> > /**
> >   * struct trans_locklist_ops - Ops structure for the ww locklist
> > functionality.
> >   *
> >   * Typically a const struct trans_locklist_ops is defined for each type
> > that
> >   * embeds a struct trans_lockitem, or hav a struct trans_lockitem
> > pointing
> >   * at it using the private pointer. It can be a buffer object,
> > reservation
> >   * object, a single ww_mutex or even a sublist of trans_lockitems.
> >   */
> > struct trans_locklist_ops {
> > 	/**
> > 	 * slow_lock: Slow lock to relax the transaction. Only used by
> > 	 * a contending lock item.
> > 	 * @item: The struct trans_lockitem to lock
> > 	 * @intr: Whether to lock interruptible
> > 	 *
> > 	 * Return: -ERESTARTSYS: Hit a signal when relaxing,
> > 	 * -EAGAIN, relaxing succesful, but the contending lock
> > 	 * remains unlocked.
> > 	 */
> > 	int (*slow_lock) (struct trans_lockitem *item, bool intr);
> > 
> > 	/**
> > 	 * unlock: Unlock.
> > 	 * @item: The struct trans_lockitem to unlock.
> > 	 */
> > 	void (*unlock) (struct trans_lockitem *item);
> > 
> > 	/**
> > 	 * unlock: Unlock.
> > 	 * @item: The struct trans_lockitem to put. This function may
> > be NULL.
> > 	 */
> > 	void (*put) (struct trans_lockitem *item);
> > };
> > 
> > /**
> >   * struct trans_lockitem
> >   * @ops: Pointer to an Ops structure for this lockitem.
> >   * @link: List link for the transaction locklist.
> >   * @private: Private info.
> >   * @relax_unlock: Unlock contending lock after relaxation since it is
> >   * likely not needed after a transaction restart.
> >   *
> >   * A struct trans_lockitem typically represents a single lock, but is
> >   * generic enough to represent a sublist of locks. It can either be
> >   * embedded, or allocated on demand. A kmem_cache might be beneficial.
> >   */
> > struct trans_lockitem {
> > 	const struct trans_locklist_ops *ops;
> > 	struct list_head link;
> > 	u32 relax_unlock:1;
> > 	void *private;
> > };
> > 
> > /* unlock example */
> > static inline void trans_unlock_put_all(struct list_head *list)
> > {
> > 	struct trans_lockitem *lock, *next;
> > 
> > 	list_for_each_entry_safe(lock, next, typeof(*lock), link) {
> > 		lock->ops->unlock(lock);
> > 		list_del_init(&lock->link);
> > 		if (lock->ops_put)
> > 			lock->ops->put(lock);
> > 	}
> > }
> > 
> > /* Backoff example */
> > static inline int __must_check trans_backoff(struct list_head *list,
> > bool intr,
> > 					     struct trans_lockitem
> > *contending)
> > {
> > 	int ret = 0;
> > 
> > 	trans_unlock_put_all(list);
> > 	if (contending) {
> > 		ret = contending->ops->slow_lock(contending, intr);
> > 		if (!ret && contending->relax_unlock)
> > 			contending->unlock(contending);
> > 		if (ret == -EAGAIN)
> > 			ret = 0;
> > 		contending->ops->put(contending);
> > 	}
> > 
> > 	return ret;
> > }
> > 		
> > 		
> > #endif _TRANSACTION_LOCKLIST_
> > 
> > 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [RFC] Cross-driver ww transaction lock lists
@ 2021-04-15 13:37     ` Daniel Vetter
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Vetter @ 2021-04-15 13:37 UTC (permalink / raw)
  To: Christian König
  Cc: Thomas Hellström, intel-gfx, dri-devel, linaro-mm-sig, Matthew Auld

On Tue, Apr 13, 2021 at 09:57:06AM +0200, Christian König wrote:
> 
> 
> Am 13.04.21 um 09:50 schrieb Thomas Hellström:
> > Hi!
> > 
> > During the dma_resv conversion of the i915 driver, we've been using ww
> > transaction lock lists to keep track of ww_mutexes that are locked
> > during the transaction so that they can be batch unlocked at suitable
> > locations. Including also the LMEM/VRAM eviction we've ended up with
> > two static lists per transaction context; one typically unlocked at the
> > end of transaction and one initialized before and unlocked after each
> > buffer object validate. This enables us to do sleeping locking at
> > eviction and keep objects locked on the eviction list until we
> > eventually succeed allocating memory (modulo minor flaws of course).
> > 
> > It would be beneficial with the i915 TTM conversion to be able to
> > introduce a similar functionality that would work in ttm but also
> > cross-driver in, for example move_notify. It would also be beneficial
> > to be able to put any dma_resv ww mutex on the lists, and not require
> > it to be embedded in a particular object type.
> > 
> > I started scetching on some utilities for this. For TTM, for example,
> > the idea would be to pass a list head for the ww transaction lock list
> > in the ttm_operation_ctx. A function taking a ww_mutex could then
> > either attach a grabbed lock to the list for batch unlocking, or be
> > responsible for unlocking when the lock's scope is exited. It's also
> > possible to create sublists if so desired. I believe the below would be
> > sufficient to cover the i915 functionality.
> > 
> > Any comments and suggestions appreciated!
> 
> ah yes Daniel and I haven been discussing something like this for years.
> 
> I also came up with rough implementation, but we always ran into lifetime
> issues.
> 
> In other words the ww_mutexes which are on the list would need to be kept
> alive until unlocked.
> 
> Because of this we kind of backed up and said we would need this on the GEM
> level instead of working with dma_resv objects.

Yeah there's a few funny concerns here that make this awkward:
- For simplicity doing these helpers at the gem level should make things a
  bit easier, because then we have a standard way to drop the reference.
  It does mean that the only thing you can lock like this are gem objects,
  but I think that's fine. At least for a first cut.

- This is a bit awkward for vmwgfx, but a) Zack has mentioned he's looking
  into adopting gem bo internally to be able to drop a pile of code and
  stop making vmwgfx the only special-case we have b) drivers which don't
  need this won't need this, so should be fine.

  The other awkward thing I guess is that ttm would need to use the
  embedded kref from the gem bo, but that should be transparent I think.

- Next up is dma-buf: For i915 we'd like to do the same eviction trick
  also through p2p dma-buf callbacks, so that this works the same as
  eviction/reservation within a gpu. But for these internal bo you might
  not have a dma-buf, so we can't just lift the trick to the dma-buf
  level. But I think if we pass e.g. a struct list_head and a callback to
  unreference/unlock all the buffers in there to the exporter, plus
  similar for the slowpath lock, then that should be doable without
  glorious layering inversions between dma-buf and gem.

  I think for dma-buf it should even be ok if this requires that we
  allocate an entire structure with kmalloc or something, allocating
  memory while holding dma_resv is ok.

- Another reason for doing it at the gem level is that the SoC drivers
  should probably use dma_resv more, so having some competent cs/eviction
  helpers derived from what we have in ttm would be nice I think.

But also I never got anywhere with anything, since like Christian said if
we just link up ww_mutex, or dma_resv, it always dies on some lifetime
handling issues.
-Daniel

> 
> Regards,
> Christian.
> 
> > 
> > 8<------------------------------------------------------
> > 
> > #ifndef _TRANSACTION_LOCKLIST_H_
> > #define _TRANSACTION_LOCKLIST_H_
> > 
> > struct trans_lockitem;
> > 
> > /**
> >   * struct trans_locklist_ops - Ops structure for the ww locklist
> > functionality.
> >   *
> >   * Typically a const struct trans_locklist_ops is defined for each type
> > that
> >   * embeds a struct trans_lockitem, or hav a struct trans_lockitem
> > pointing
> >   * at it using the private pointer. It can be a buffer object,
> > reservation
> >   * object, a single ww_mutex or even a sublist of trans_lockitems.
> >   */
> > struct trans_locklist_ops {
> > 	/**
> > 	 * slow_lock: Slow lock to relax the transaction. Only used by
> > 	 * a contending lock item.
> > 	 * @item: The struct trans_lockitem to lock
> > 	 * @intr: Whether to lock interruptible
> > 	 *
> > 	 * Return: -ERESTARTSYS: Hit a signal when relaxing,
> > 	 * -EAGAIN, relaxing succesful, but the contending lock
> > 	 * remains unlocked.
> > 	 */
> > 	int (*slow_lock) (struct trans_lockitem *item, bool intr);
> > 
> > 	/**
> > 	 * unlock: Unlock.
> > 	 * @item: The struct trans_lockitem to unlock.
> > 	 */
> > 	void (*unlock) (struct trans_lockitem *item);
> > 
> > 	/**
> > 	 * unlock: Unlock.
> > 	 * @item: The struct trans_lockitem to put. This function may
> > be NULL.
> > 	 */
> > 	void (*put) (struct trans_lockitem *item);
> > };
> > 
> > /**
> >   * struct trans_lockitem
> >   * @ops: Pointer to an Ops structure for this lockitem.
> >   * @link: List link for the transaction locklist.
> >   * @private: Private info.
> >   * @relax_unlock: Unlock contending lock after relaxation since it is
> >   * likely not needed after a transaction restart.
> >   *
> >   * A struct trans_lockitem typically represents a single lock, but is
> >   * generic enough to represent a sublist of locks. It can either be
> >   * embedded, or allocated on demand. A kmem_cache might be beneficial.
> >   */
> > struct trans_lockitem {
> > 	const struct trans_locklist_ops *ops;
> > 	struct list_head link;
> > 	u32 relax_unlock:1;
> > 	void *private;
> > };
> > 
> > /* unlock example */
> > static inline void trans_unlock_put_all(struct list_head *list)
> > {
> > 	struct trans_lockitem *lock, *next;
> > 
> > 	list_for_each_entry_safe(lock, next, typeof(*lock), link) {
> > 		lock->ops->unlock(lock);
> > 		list_del_init(&lock->link);
> > 		if (lock->ops_put)
> > 			lock->ops->put(lock);
> > 	}
> > }
> > 
> > /* Backoff example */
> > static inline int __must_check trans_backoff(struct list_head *list,
> > bool intr,
> > 					     struct trans_lockitem
> > *contending)
> > {
> > 	int ret = 0;
> > 
> > 	trans_unlock_put_all(list);
> > 	if (contending) {
> > 		ret = contending->ops->slow_lock(contending, intr);
> > 		if (!ret && contending->relax_unlock)
> > 			contending->unlock(contending);
> > 		if (ret == -EAGAIN)
> > 			ret = 0;
> > 		contending->ops->put(contending);
> > 	}
> > 
> > 	return ret;
> > }
> > 		
> > 		
> > #endif _TRANSACTION_LOCKLIST_
> > 
> > 
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Linaro-mm-sig] [RFC] Cross-driver ww transaction lock lists
  2021-04-15 13:37     ` [Intel-gfx] " Daniel Vetter
@ 2021-04-15 14:02       ` Thomas Hellström (Intel)
  -1 siblings, 0 replies; 15+ messages in thread
From: Thomas Hellström (Intel) @ 2021-04-15 14:02 UTC (permalink / raw)
  To: Daniel Vetter, Christian König
  Cc: Thomas Hellström, intel-gfx, dri-devel, linaro-mm-sig, Matthew Auld


On 4/15/21 3:37 PM, Daniel Vetter wrote:
> On Tue, Apr 13, 2021 at 09:57:06AM +0200, Christian König wrote:
>>
>> Am 13.04.21 um 09:50 schrieb Thomas Hellström:
>>> Hi!
>>>
>>> During the dma_resv conversion of the i915 driver, we've been using ww
>>> transaction lock lists to keep track of ww_mutexes that are locked
>>> during the transaction so that they can be batch unlocked at suitable
>>> locations. Including also the LMEM/VRAM eviction we've ended up with
>>> two static lists per transaction context; one typically unlocked at the
>>> end of transaction and one initialized before and unlocked after each
>>> buffer object validate. This enables us to do sleeping locking at
>>> eviction and keep objects locked on the eviction list until we
>>> eventually succeed allocating memory (modulo minor flaws of course).
>>>
>>> It would be beneficial with the i915 TTM conversion to be able to
>>> introduce a similar functionality that would work in ttm but also
>>> cross-driver in, for example move_notify. It would also be beneficial
>>> to be able to put any dma_resv ww mutex on the lists, and not require
>>> it to be embedded in a particular object type.
>>>
>>> I started scetching on some utilities for this. For TTM, for example,
>>> the idea would be to pass a list head for the ww transaction lock list
>>> in the ttm_operation_ctx. A function taking a ww_mutex could then
>>> either attach a grabbed lock to the list for batch unlocking, or be
>>> responsible for unlocking when the lock's scope is exited. It's also
>>> possible to create sublists if so desired. I believe the below would be
>>> sufficient to cover the i915 functionality.
>>>
>>> Any comments and suggestions appreciated!
>> ah yes Daniel and I haven been discussing something like this for years.
>>
>> I also came up with rough implementation, but we always ran into lifetime
>> issues.
>>
>> In other words the ww_mutexes which are on the list would need to be kept
>> alive until unlocked.
>>
>> Because of this we kind of backed up and said we would need this on the GEM
>> level instead of working with dma_resv objects.
> Yeah there's a few funny concerns here that make this awkward:
> - For simplicity doing these helpers at the gem level should make things a
>    bit easier, because then we have a standard way to drop the reference.
>    It does mean that the only thing you can lock like this are gem objects,
>    but I think that's fine. At least for a first cut.
>
> - This is a bit awkward for vmwgfx, but a) Zack has mentioned he's looking
>    into adopting gem bo internally to be able to drop a pile of code and
>    stop making vmwgfx the only special-case we have b) drivers which don't
>    need this won't need this, so should be fine.
>
>    The other awkward thing I guess is that ttm would need to use the
>    embedded kref from the gem bo, but that should be transparent I think.
>
> - Next up is dma-buf: For i915 we'd like to do the same eviction trick
>    also through p2p dma-buf callbacks, so that this works the same as
>    eviction/reservation within a gpu. But for these internal bo you might
>    not have a dma-buf, so we can't just lift the trick to the dma-buf
>    level. But I think if we pass e.g. a struct list_head and a callback to
>    unreference/unlock all the buffers in there to the exporter, plus
>    similar for the slowpath lock, then that should be doable without
>    glorious layering inversions between dma-buf and gem.
>
>    I think for dma-buf it should even be ok if this requires that we
>    allocate an entire structure with kmalloc or something, allocating
>    memory while holding dma_resv is ok.

Yes, the thing here with the suggested helpers is that you would just 
embed a trans_lockitem struct in the gem object (and defines the gem 
object ops). Otherwise and for passing to dma-buf this is pretty much 
exactly what you are suggesting, but the huge benefit of encapsulating 
the needed members like this is that when we need to change something we 
change it in just one place.

For anything that doesn't have a gem object (dma-buf, vmwgfx or 
whatever) you have the choice of either allocating a struct 
trans_lockitem or embed it wherever you prefer. In particular, this is 
beneficial where you have a single dma-resv class ww-mutex sitting 
somewhere in the way and you don't want to needlessly have a gem object 
that embeds it.

/Thomas


_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [Linaro-mm-sig] [RFC] Cross-driver ww transaction lock lists
@ 2021-04-15 14:02       ` Thomas Hellström (Intel)
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Hellström (Intel) @ 2021-04-15 14:02 UTC (permalink / raw)
  To: Daniel Vetter, Christian König
  Cc: Thomas Hellström, intel-gfx, dri-devel, linaro-mm-sig, Matthew Auld


On 4/15/21 3:37 PM, Daniel Vetter wrote:
> On Tue, Apr 13, 2021 at 09:57:06AM +0200, Christian König wrote:
>>
>> Am 13.04.21 um 09:50 schrieb Thomas Hellström:
>>> Hi!
>>>
>>> During the dma_resv conversion of the i915 driver, we've been using ww
>>> transaction lock lists to keep track of ww_mutexes that are locked
>>> during the transaction so that they can be batch unlocked at suitable
>>> locations. Including also the LMEM/VRAM eviction we've ended up with
>>> two static lists per transaction context; one typically unlocked at the
>>> end of transaction and one initialized before and unlocked after each
>>> buffer object validate. This enables us to do sleeping locking at
>>> eviction and keep objects locked on the eviction list until we
>>> eventually succeed allocating memory (modulo minor flaws of course).
>>>
>>> It would be beneficial with the i915 TTM conversion to be able to
>>> introduce a similar functionality that would work in ttm but also
>>> cross-driver in, for example move_notify. It would also be beneficial
>>> to be able to put any dma_resv ww mutex on the lists, and not require
>>> it to be embedded in a particular object type.
>>>
>>> I started scetching on some utilities for this. For TTM, for example,
>>> the idea would be to pass a list head for the ww transaction lock list
>>> in the ttm_operation_ctx. A function taking a ww_mutex could then
>>> either attach a grabbed lock to the list for batch unlocking, or be
>>> responsible for unlocking when the lock's scope is exited. It's also
>>> possible to create sublists if so desired. I believe the below would be
>>> sufficient to cover the i915 functionality.
>>>
>>> Any comments and suggestions appreciated!
>> ah yes Daniel and I haven been discussing something like this for years.
>>
>> I also came up with rough implementation, but we always ran into lifetime
>> issues.
>>
>> In other words the ww_mutexes which are on the list would need to be kept
>> alive until unlocked.
>>
>> Because of this we kind of backed up and said we would need this on the GEM
>> level instead of working with dma_resv objects.
> Yeah there's a few funny concerns here that make this awkward:
> - For simplicity doing these helpers at the gem level should make things a
>    bit easier, because then we have a standard way to drop the reference.
>    It does mean that the only thing you can lock like this are gem objects,
>    but I think that's fine. At least for a first cut.
>
> - This is a bit awkward for vmwgfx, but a) Zack has mentioned he's looking
>    into adopting gem bo internally to be able to drop a pile of code and
>    stop making vmwgfx the only special-case we have b) drivers which don't
>    need this won't need this, so should be fine.
>
>    The other awkward thing I guess is that ttm would need to use the
>    embedded kref from the gem bo, but that should be transparent I think.
>
> - Next up is dma-buf: For i915 we'd like to do the same eviction trick
>    also through p2p dma-buf callbacks, so that this works the same as
>    eviction/reservation within a gpu. But for these internal bo you might
>    not have a dma-buf, so we can't just lift the trick to the dma-buf
>    level. But I think if we pass e.g. a struct list_head and a callback to
>    unreference/unlock all the buffers in there to the exporter, plus
>    similar for the slowpath lock, then that should be doable without
>    glorious layering inversions between dma-buf and gem.
>
>    I think for dma-buf it should even be ok if this requires that we
>    allocate an entire structure with kmalloc or something, allocating
>    memory while holding dma_resv is ok.

Yes, the thing here with the suggested helpers is that you would just 
embed a trans_lockitem struct in the gem object (and defines the gem 
object ops). Otherwise and for passing to dma-buf this is pretty much 
exactly what you are suggesting, but the huge benefit of encapsulating 
the needed members like this is that when we need to change something we 
change it in just one place.

For anything that doesn't have a gem object (dma-buf, vmwgfx or 
whatever) you have the choice of either allocating a struct 
trans_lockitem or embed it wherever you prefer. In particular, this is 
beneficial where you have a single dma-resv class ww-mutex sitting 
somewhere in the way and you don't want to needlessly have a gem object 
that embeds it.

/Thomas


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Linaro-mm-sig] [RFC] Cross-driver ww transaction lock lists
  2021-04-15 14:02       ` [Intel-gfx] " Thomas Hellström (Intel)
@ 2021-04-15 14:40         ` Daniel Vetter
  -1 siblings, 0 replies; 15+ messages in thread
From: Daniel Vetter @ 2021-04-15 14:40 UTC (permalink / raw)
  To: Thomas Hellström (Intel)
  Cc: Thomas Hellström, intel-gfx, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Matthew Auld,
	Christian König

On Thu, Apr 15, 2021 at 4:02 PM Thomas Hellström (Intel)
<thomas_os@shipmail.org> wrote:
>
>
> On 4/15/21 3:37 PM, Daniel Vetter wrote:
> > On Tue, Apr 13, 2021 at 09:57:06AM +0200, Christian König wrote:
> >>
> >> Am 13.04.21 um 09:50 schrieb Thomas Hellström:
> >>> Hi!
> >>>
> >>> During the dma_resv conversion of the i915 driver, we've been using ww
> >>> transaction lock lists to keep track of ww_mutexes that are locked
> >>> during the transaction so that they can be batch unlocked at suitable
> >>> locations. Including also the LMEM/VRAM eviction we've ended up with
> >>> two static lists per transaction context; one typically unlocked at the
> >>> end of transaction and one initialized before and unlocked after each
> >>> buffer object validate. This enables us to do sleeping locking at
> >>> eviction and keep objects locked on the eviction list until we
> >>> eventually succeed allocating memory (modulo minor flaws of course).
> >>>
> >>> It would be beneficial with the i915 TTM conversion to be able to
> >>> introduce a similar functionality that would work in ttm but also
> >>> cross-driver in, for example move_notify. It would also be beneficial
> >>> to be able to put any dma_resv ww mutex on the lists, and not require
> >>> it to be embedded in a particular object type.
> >>>
> >>> I started scetching on some utilities for this. For TTM, for example,
> >>> the idea would be to pass a list head for the ww transaction lock list
> >>> in the ttm_operation_ctx. A function taking a ww_mutex could then
> >>> either attach a grabbed lock to the list for batch unlocking, or be
> >>> responsible for unlocking when the lock's scope is exited. It's also
> >>> possible to create sublists if so desired. I believe the below would be
> >>> sufficient to cover the i915 functionality.
> >>>
> >>> Any comments and suggestions appreciated!
> >> ah yes Daniel and I haven been discussing something like this for years.
> >>
> >> I also came up with rough implementation, but we always ran into lifetime
> >> issues.
> >>
> >> In other words the ww_mutexes which are on the list would need to be kept
> >> alive until unlocked.
> >>
> >> Because of this we kind of backed up and said we would need this on the GEM
> >> level instead of working with dma_resv objects.
> > Yeah there's a few funny concerns here that make this awkward:
> > - For simplicity doing these helpers at the gem level should make things a
> >    bit easier, because then we have a standard way to drop the reference.
> >    It does mean that the only thing you can lock like this are gem objects,
> >    but I think that's fine. At least for a first cut.
> >
> > - This is a bit awkward for vmwgfx, but a) Zack has mentioned he's looking
> >    into adopting gem bo internally to be able to drop a pile of code and
> >    stop making vmwgfx the only special-case we have b) drivers which don't
> >    need this won't need this, so should be fine.
> >
> >    The other awkward thing I guess is that ttm would need to use the
> >    embedded kref from the gem bo, but that should be transparent I think.
> >
> > - Next up is dma-buf: For i915 we'd like to do the same eviction trick
> >    also through p2p dma-buf callbacks, so that this works the same as
> >    eviction/reservation within a gpu. But for these internal bo you might
> >    not have a dma-buf, so we can't just lift the trick to the dma-buf
> >    level. But I think if we pass e.g. a struct list_head and a callback to
> >    unreference/unlock all the buffers in there to the exporter, plus
> >    similar for the slowpath lock, then that should be doable without
> >    glorious layering inversions between dma-buf and gem.
> >
> >    I think for dma-buf it should even be ok if this requires that we
> >    allocate an entire structure with kmalloc or something, allocating
> >    memory while holding dma_resv is ok.
>
> Yes, the thing here with the suggested helpers is that you would just
> embed a trans_lockitem struct in the gem object (and defines the gem
> object ops). Otherwise and for passing to dma-buf this is pretty much
> exactly what you are suggesting, but the huge benefit of encapsulating
> the needed members like this is that when we need to change something we
> change it in just one place.
>
> For anything that doesn't have a gem object (dma-buf, vmwgfx or
> whatever) you have the choice of either allocating a struct
> trans_lockitem or embed it wherever you prefer. In particular, this is
> beneficial where you have a single dma-resv class ww-mutex sitting
> somewhere in the way and you don't want to needlessly have a gem object
> that embeds it.

The thing is, everyone who actually uses dma_resv_lock has a
gem_buffer_object underneath. So it feels a bit like flexibility for
no real need, and I think it would make it slightly more awkard for
gem drivers to neatly integrate into their cs path. The lockitem
struct works, but it is a bit cumbersome.

Also if we add some wrappers to e.g. add a gem_bo to the ctx, then if
we decide to slip the lockitem in there, we still only need to touch
the helper code, and not all drivers.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [Linaro-mm-sig] [RFC] Cross-driver ww transaction lock lists
@ 2021-04-15 14:40         ` Daniel Vetter
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Vetter @ 2021-04-15 14:40 UTC (permalink / raw)
  To: Thomas Hellström (Intel)
  Cc: Thomas Hellström, intel-gfx, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Matthew Auld,
	Christian König

On Thu, Apr 15, 2021 at 4:02 PM Thomas Hellström (Intel)
<thomas_os@shipmail.org> wrote:
>
>
> On 4/15/21 3:37 PM, Daniel Vetter wrote:
> > On Tue, Apr 13, 2021 at 09:57:06AM +0200, Christian König wrote:
> >>
> >> Am 13.04.21 um 09:50 schrieb Thomas Hellström:
> >>> Hi!
> >>>
> >>> During the dma_resv conversion of the i915 driver, we've been using ww
> >>> transaction lock lists to keep track of ww_mutexes that are locked
> >>> during the transaction so that they can be batch unlocked at suitable
> >>> locations. Including also the LMEM/VRAM eviction we've ended up with
> >>> two static lists per transaction context; one typically unlocked at the
> >>> end of transaction and one initialized before and unlocked after each
> >>> buffer object validate. This enables us to do sleeping locking at
> >>> eviction and keep objects locked on the eviction list until we
> >>> eventually succeed allocating memory (modulo minor flaws of course).
> >>>
> >>> It would be beneficial with the i915 TTM conversion to be able to
> >>> introduce a similar functionality that would work in ttm but also
> >>> cross-driver in, for example move_notify. It would also be beneficial
> >>> to be able to put any dma_resv ww mutex on the lists, and not require
> >>> it to be embedded in a particular object type.
> >>>
> >>> I started scetching on some utilities for this. For TTM, for example,
> >>> the idea would be to pass a list head for the ww transaction lock list
> >>> in the ttm_operation_ctx. A function taking a ww_mutex could then
> >>> either attach a grabbed lock to the list for batch unlocking, or be
> >>> responsible for unlocking when the lock's scope is exited. It's also
> >>> possible to create sublists if so desired. I believe the below would be
> >>> sufficient to cover the i915 functionality.
> >>>
> >>> Any comments and suggestions appreciated!
> >> ah yes Daniel and I haven been discussing something like this for years.
> >>
> >> I also came up with rough implementation, but we always ran into lifetime
> >> issues.
> >>
> >> In other words the ww_mutexes which are on the list would need to be kept
> >> alive until unlocked.
> >>
> >> Because of this we kind of backed up and said we would need this on the GEM
> >> level instead of working with dma_resv objects.
> > Yeah there's a few funny concerns here that make this awkward:
> > - For simplicity doing these helpers at the gem level should make things a
> >    bit easier, because then we have a standard way to drop the reference.
> >    It does mean that the only thing you can lock like this are gem objects,
> >    but I think that's fine. At least for a first cut.
> >
> > - This is a bit awkward for vmwgfx, but a) Zack has mentioned he's looking
> >    into adopting gem bo internally to be able to drop a pile of code and
> >    stop making vmwgfx the only special-case we have b) drivers which don't
> >    need this won't need this, so should be fine.
> >
> >    The other awkward thing I guess is that ttm would need to use the
> >    embedded kref from the gem bo, but that should be transparent I think.
> >
> > - Next up is dma-buf: For i915 we'd like to do the same eviction trick
> >    also through p2p dma-buf callbacks, so that this works the same as
> >    eviction/reservation within a gpu. But for these internal bo you might
> >    not have a dma-buf, so we can't just lift the trick to the dma-buf
> >    level. But I think if we pass e.g. a struct list_head and a callback to
> >    unreference/unlock all the buffers in there to the exporter, plus
> >    similar for the slowpath lock, then that should be doable without
> >    glorious layering inversions between dma-buf and gem.
> >
> >    I think for dma-buf it should even be ok if this requires that we
> >    allocate an entire structure with kmalloc or something, allocating
> >    memory while holding dma_resv is ok.
>
> Yes, the thing here with the suggested helpers is that you would just
> embed a trans_lockitem struct in the gem object (and defines the gem
> object ops). Otherwise and for passing to dma-buf this is pretty much
> exactly what you are suggesting, but the huge benefit of encapsulating
> the needed members like this is that when we need to change something we
> change it in just one place.
>
> For anything that doesn't have a gem object (dma-buf, vmwgfx or
> whatever) you have the choice of either allocating a struct
> trans_lockitem or embed it wherever you prefer. In particular, this is
> beneficial where you have a single dma-resv class ww-mutex sitting
> somewhere in the way and you don't want to needlessly have a gem object
> that embeds it.

The thing is, everyone who actually uses dma_resv_lock has a
gem_buffer_object underneath. So it feels a bit like flexibility for
no real need, and I think it would make it slightly more awkard for
gem drivers to neatly integrate into their cs path. The lockitem
struct works, but it is a bit cumbersome.

Also if we add some wrappers to e.g. add a gem_bo to the ctx, then if
we decide to slip the lockitem in there, we still only need to touch
the helper code, and not all drivers.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Linaro-mm-sig] [RFC] Cross-driver ww transaction lock lists
  2021-04-15 14:40         ` [Intel-gfx] " Daniel Vetter
@ 2021-04-15 15:40           ` Thomas Hellström (Intel)
  -1 siblings, 0 replies; 15+ messages in thread
From: Thomas Hellström (Intel) @ 2021-04-15 15:40 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Thomas Hellström, intel-gfx, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Matthew Auld,
	Christian König


On 4/15/21 4:40 PM, Daniel Vetter wrote:
> On Thu, Apr 15, 2021 at 4:02 PM Thomas Hellström (Intel)
> <thomas_os@shipmail.org> wrote:
>>
>> On 4/15/21 3:37 PM, Daniel Vetter wrote:
>>> On Tue, Apr 13, 2021 at 09:57:06AM +0200, Christian König wrote:
>>>> Am 13.04.21 um 09:50 schrieb Thomas Hellström:
>>>>> Hi!
>>>>>
>>>>> During the dma_resv conversion of the i915 driver, we've been using ww
>>>>> transaction lock lists to keep track of ww_mutexes that are locked
>>>>> during the transaction so that they can be batch unlocked at suitable
>>>>> locations. Including also the LMEM/VRAM eviction we've ended up with
>>>>> two static lists per transaction context; one typically unlocked at the
>>>>> end of transaction and one initialized before and unlocked after each
>>>>> buffer object validate. This enables us to do sleeping locking at
>>>>> eviction and keep objects locked on the eviction list until we
>>>>> eventually succeed allocating memory (modulo minor flaws of course).
>>>>>
>>>>> It would be beneficial with the i915 TTM conversion to be able to
>>>>> introduce a similar functionality that would work in ttm but also
>>>>> cross-driver in, for example move_notify. It would also be beneficial
>>>>> to be able to put any dma_resv ww mutex on the lists, and not require
>>>>> it to be embedded in a particular object type.
>>>>>
>>>>> I started scetching on some utilities for this. For TTM, for example,
>>>>> the idea would be to pass a list head for the ww transaction lock list
>>>>> in the ttm_operation_ctx. A function taking a ww_mutex could then
>>>>> either attach a grabbed lock to the list for batch unlocking, or be
>>>>> responsible for unlocking when the lock's scope is exited. It's also
>>>>> possible to create sublists if so desired. I believe the below would be
>>>>> sufficient to cover the i915 functionality.
>>>>>
>>>>> Any comments and suggestions appreciated!
>>>> ah yes Daniel and I haven been discussing something like this for years.
>>>>
>>>> I also came up with rough implementation, but we always ran into lifetime
>>>> issues.
>>>>
>>>> In other words the ww_mutexes which are on the list would need to be kept
>>>> alive until unlocked.
>>>>
>>>> Because of this we kind of backed up and said we would need this on the GEM
>>>> level instead of working with dma_resv objects.
>>> Yeah there's a few funny concerns here that make this awkward:
>>> - For simplicity doing these helpers at the gem level should make things a
>>>     bit easier, because then we have a standard way to drop the reference.
>>>     It does mean that the only thing you can lock like this are gem objects,
>>>     but I think that's fine. At least for a first cut.
>>>
>>> - This is a bit awkward for vmwgfx, but a) Zack has mentioned he's looking
>>>     into adopting gem bo internally to be able to drop a pile of code and
>>>     stop making vmwgfx the only special-case we have b) drivers which don't
>>>     need this won't need this, so should be fine.
>>>
>>>     The other awkward thing I guess is that ttm would need to use the
>>>     embedded kref from the gem bo, but that should be transparent I think.
>>>
>>> - Next up is dma-buf: For i915 we'd like to do the same eviction trick
>>>     also through p2p dma-buf callbacks, so that this works the same as
>>>     eviction/reservation within a gpu. But for these internal bo you might
>>>     not have a dma-buf, so we can't just lift the trick to the dma-buf
>>>     level. But I think if we pass e.g. a struct list_head and a callback to
>>>     unreference/unlock all the buffers in there to the exporter, plus
>>>     similar for the slowpath lock, then that should be doable without
>>>     glorious layering inversions between dma-buf and gem.
>>>
>>>     I think for dma-buf it should even be ok if this requires that we
>>>     allocate an entire structure with kmalloc or something, allocating
>>>     memory while holding dma_resv is ok.
>> Yes, the thing here with the suggested helpers is that you would just
>> embed a trans_lockitem struct in the gem object (and defines the gem
>> object ops). Otherwise and for passing to dma-buf this is pretty much
>> exactly what you are suggesting, but the huge benefit of encapsulating
>> the needed members like this is that when we need to change something we
>> change it in just one place.
>>
>> For anything that doesn't have a gem object (dma-buf, vmwgfx or
>> whatever) you have the choice of either allocating a struct
>> trans_lockitem or embed it wherever you prefer. In particular, this is
>> beneficial where you have a single dma-resv class ww-mutex sitting
>> somewhere in the way and you don't want to needlessly have a gem object
>> that embeds it.
> The thing is, everyone who actually uses dma_resv_lock has a
> gem_buffer_object underneath. So it feels a bit like flexibility for
> no real need, and I think it would make it slightly more awkard for
> gem drivers to neatly integrate into their cs path. The lockitem
> struct works, but it is a bit cumbersome.

Well that's partly because of it's impossible to use a standalone 
ww_mutex in a locking transaction that can only add gem objects to the 
list :/. Already in the i915 driver we have and may want to add various 
places where we have dead gem objects sitting because of this.

Also, more importantly, If we pass a list down the dma-buf 
move_notify(), a trans_lockitem is pretty much exactly what we expect 
back (except of course for the private pointer). It would be odd if we'd 
expect all list items to be gem objects when it's a dma-buf interface?

>
> Also if we add some wrappers to e.g. add a gem_bo to the ctx, then if
> we decide to slip the lockitem in there, we still only need to touch
> the helper code, and not all drivers.

Well, yes assuming we always have an embedding gem object for a dma_resv 
that might be true, but either way I don't really expect the gem helpers 
to look very different. We will need the ops anyway and a specialized 
context so if the only thing we're debating is whether or not to embed a 
struct in the gem object, unless you really insist on using the gem 
object initially, I suggest we try this and if it becomes awkward, just 
s/trans_lockitem/drm_gem_object/

/Thomas


> -Daniel
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [Linaro-mm-sig] [RFC] Cross-driver ww transaction lock lists
@ 2021-04-15 15:40           ` Thomas Hellström (Intel)
  0 siblings, 0 replies; 15+ messages in thread
From: Thomas Hellström (Intel) @ 2021-04-15 15:40 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Thomas Hellström, intel-gfx, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Matthew Auld,
	Christian König


On 4/15/21 4:40 PM, Daniel Vetter wrote:
> On Thu, Apr 15, 2021 at 4:02 PM Thomas Hellström (Intel)
> <thomas_os@shipmail.org> wrote:
>>
>> On 4/15/21 3:37 PM, Daniel Vetter wrote:
>>> On Tue, Apr 13, 2021 at 09:57:06AM +0200, Christian König wrote:
>>>> Am 13.04.21 um 09:50 schrieb Thomas Hellström:
>>>>> Hi!
>>>>>
>>>>> During the dma_resv conversion of the i915 driver, we've been using ww
>>>>> transaction lock lists to keep track of ww_mutexes that are locked
>>>>> during the transaction so that they can be batch unlocked at suitable
>>>>> locations. Including also the LMEM/VRAM eviction we've ended up with
>>>>> two static lists per transaction context; one typically unlocked at the
>>>>> end of transaction and one initialized before and unlocked after each
>>>>> buffer object validate. This enables us to do sleeping locking at
>>>>> eviction and keep objects locked on the eviction list until we
>>>>> eventually succeed allocating memory (modulo minor flaws of course).
>>>>>
>>>>> It would be beneficial with the i915 TTM conversion to be able to
>>>>> introduce a similar functionality that would work in ttm but also
>>>>> cross-driver in, for example move_notify. It would also be beneficial
>>>>> to be able to put any dma_resv ww mutex on the lists, and not require
>>>>> it to be embedded in a particular object type.
>>>>>
>>>>> I started scetching on some utilities for this. For TTM, for example,
>>>>> the idea would be to pass a list head for the ww transaction lock list
>>>>> in the ttm_operation_ctx. A function taking a ww_mutex could then
>>>>> either attach a grabbed lock to the list for batch unlocking, or be
>>>>> responsible for unlocking when the lock's scope is exited. It's also
>>>>> possible to create sublists if so desired. I believe the below would be
>>>>> sufficient to cover the i915 functionality.
>>>>>
>>>>> Any comments and suggestions appreciated!
>>>> ah yes Daniel and I haven been discussing something like this for years.
>>>>
>>>> I also came up with rough implementation, but we always ran into lifetime
>>>> issues.
>>>>
>>>> In other words the ww_mutexes which are on the list would need to be kept
>>>> alive until unlocked.
>>>>
>>>> Because of this we kind of backed up and said we would need this on the GEM
>>>> level instead of working with dma_resv objects.
>>> Yeah there's a few funny concerns here that make this awkward:
>>> - For simplicity doing these helpers at the gem level should make things a
>>>     bit easier, because then we have a standard way to drop the reference.
>>>     It does mean that the only thing you can lock like this are gem objects,
>>>     but I think that's fine. At least for a first cut.
>>>
>>> - This is a bit awkward for vmwgfx, but a) Zack has mentioned he's looking
>>>     into adopting gem bo internally to be able to drop a pile of code and
>>>     stop making vmwgfx the only special-case we have b) drivers which don't
>>>     need this won't need this, so should be fine.
>>>
>>>     The other awkward thing I guess is that ttm would need to use the
>>>     embedded kref from the gem bo, but that should be transparent I think.
>>>
>>> - Next up is dma-buf: For i915 we'd like to do the same eviction trick
>>>     also through p2p dma-buf callbacks, so that this works the same as
>>>     eviction/reservation within a gpu. But for these internal bo you might
>>>     not have a dma-buf, so we can't just lift the trick to the dma-buf
>>>     level. But I think if we pass e.g. a struct list_head and a callback to
>>>     unreference/unlock all the buffers in there to the exporter, plus
>>>     similar for the slowpath lock, then that should be doable without
>>>     glorious layering inversions between dma-buf and gem.
>>>
>>>     I think for dma-buf it should even be ok if this requires that we
>>>     allocate an entire structure with kmalloc or something, allocating
>>>     memory while holding dma_resv is ok.
>> Yes, the thing here with the suggested helpers is that you would just
>> embed a trans_lockitem struct in the gem object (and defines the gem
>> object ops). Otherwise and for passing to dma-buf this is pretty much
>> exactly what you are suggesting, but the huge benefit of encapsulating
>> the needed members like this is that when we need to change something we
>> change it in just one place.
>>
>> For anything that doesn't have a gem object (dma-buf, vmwgfx or
>> whatever) you have the choice of either allocating a struct
>> trans_lockitem or embed it wherever you prefer. In particular, this is
>> beneficial where you have a single dma-resv class ww-mutex sitting
>> somewhere in the way and you don't want to needlessly have a gem object
>> that embeds it.
> The thing is, everyone who actually uses dma_resv_lock has a
> gem_buffer_object underneath. So it feels a bit like flexibility for
> no real need, and I think it would make it slightly more awkard for
> gem drivers to neatly integrate into their cs path. The lockitem
> struct works, but it is a bit cumbersome.

Well that's partly because of it's impossible to use a standalone 
ww_mutex in a locking transaction that can only add gem objects to the 
list :/. Already in the i915 driver we have and may want to add various 
places where we have dead gem objects sitting because of this.

Also, more importantly, If we pass a list down the dma-buf 
move_notify(), a trans_lockitem is pretty much exactly what we expect 
back (except of course for the private pointer). It would be odd if we'd 
expect all list items to be gem objects when it's a dma-buf interface?

>
> Also if we add some wrappers to e.g. add a gem_bo to the ctx, then if
> we decide to slip the lockitem in there, we still only need to touch
> the helper code, and not all drivers.

Well, yes assuming we always have an embedding gem object for a dma_resv 
that might be true, but either way I don't really expect the gem helpers 
to look very different. We will need the ops anyway and a specialized 
context so if the only thing we're debating is whether or not to embed a 
struct in the gem object, unless you really insist on using the gem 
object initially, I suggest we try this and if it becomes awkward, just 
s/trans_lockitem/drm_gem_object/

/Thomas


> -Daniel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Linaro-mm-sig] [RFC] Cross-driver ww transaction lock lists
  2021-04-15 15:40           ` [Intel-gfx] " Thomas Hellström (Intel)
@ 2021-04-15 15:51             ` Daniel Vetter
  -1 siblings, 0 replies; 15+ messages in thread
From: Daniel Vetter @ 2021-04-15 15:51 UTC (permalink / raw)
  To: Thomas Hellström (Intel)
  Cc: Thomas Hellström, intel-gfx, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Matthew Auld,
	Christian König

On Thu, Apr 15, 2021 at 05:40:24PM +0200, Thomas Hellström (Intel) wrote:
> 
> On 4/15/21 4:40 PM, Daniel Vetter wrote:
> > On Thu, Apr 15, 2021 at 4:02 PM Thomas Hellström (Intel)
> > <thomas_os@shipmail.org> wrote:
> > > 
> > > On 4/15/21 3:37 PM, Daniel Vetter wrote:
> > > > On Tue, Apr 13, 2021 at 09:57:06AM +0200, Christian König wrote:
> > > > > Am 13.04.21 um 09:50 schrieb Thomas Hellström:
> > > > > > Hi!
> > > > > > 
> > > > > > During the dma_resv conversion of the i915 driver, we've been using ww
> > > > > > transaction lock lists to keep track of ww_mutexes that are locked
> > > > > > during the transaction so that they can be batch unlocked at suitable
> > > > > > locations. Including also the LMEM/VRAM eviction we've ended up with
> > > > > > two static lists per transaction context; one typically unlocked at the
> > > > > > end of transaction and one initialized before and unlocked after each
> > > > > > buffer object validate. This enables us to do sleeping locking at
> > > > > > eviction and keep objects locked on the eviction list until we
> > > > > > eventually succeed allocating memory (modulo minor flaws of course).
> > > > > > 
> > > > > > It would be beneficial with the i915 TTM conversion to be able to
> > > > > > introduce a similar functionality that would work in ttm but also
> > > > > > cross-driver in, for example move_notify. It would also be beneficial
> > > > > > to be able to put any dma_resv ww mutex on the lists, and not require
> > > > > > it to be embedded in a particular object type.
> > > > > > 
> > > > > > I started scetching on some utilities for this. For TTM, for example,
> > > > > > the idea would be to pass a list head for the ww transaction lock list
> > > > > > in the ttm_operation_ctx. A function taking a ww_mutex could then
> > > > > > either attach a grabbed lock to the list for batch unlocking, or be
> > > > > > responsible for unlocking when the lock's scope is exited. It's also
> > > > > > possible to create sublists if so desired. I believe the below would be
> > > > > > sufficient to cover the i915 functionality.
> > > > > > 
> > > > > > Any comments and suggestions appreciated!
> > > > > ah yes Daniel and I haven been discussing something like this for years.
> > > > > 
> > > > > I also came up with rough implementation, but we always ran into lifetime
> > > > > issues.
> > > > > 
> > > > > In other words the ww_mutexes which are on the list would need to be kept
> > > > > alive until unlocked.
> > > > > 
> > > > > Because of this we kind of backed up and said we would need this on the GEM
> > > > > level instead of working with dma_resv objects.
> > > > Yeah there's a few funny concerns here that make this awkward:
> > > > - For simplicity doing these helpers at the gem level should make things a
> > > >     bit easier, because then we have a standard way to drop the reference.
> > > >     It does mean that the only thing you can lock like this are gem objects,
> > > >     but I think that's fine. At least for a first cut.
> > > > 
> > > > - This is a bit awkward for vmwgfx, but a) Zack has mentioned he's looking
> > > >     into adopting gem bo internally to be able to drop a pile of code and
> > > >     stop making vmwgfx the only special-case we have b) drivers which don't
> > > >     need this won't need this, so should be fine.
> > > > 
> > > >     The other awkward thing I guess is that ttm would need to use the
> > > >     embedded kref from the gem bo, but that should be transparent I think.
> > > > 
> > > > - Next up is dma-buf: For i915 we'd like to do the same eviction trick
> > > >     also through p2p dma-buf callbacks, so that this works the same as
> > > >     eviction/reservation within a gpu. But for these internal bo you might
> > > >     not have a dma-buf, so we can't just lift the trick to the dma-buf
> > > >     level. But I think if we pass e.g. a struct list_head and a callback to
> > > >     unreference/unlock all the buffers in there to the exporter, plus
> > > >     similar for the slowpath lock, then that should be doable without
> > > >     glorious layering inversions between dma-buf and gem.
> > > > 
> > > >     I think for dma-buf it should even be ok if this requires that we
> > > >     allocate an entire structure with kmalloc or something, allocating
> > > >     memory while holding dma_resv is ok.
> > > Yes, the thing here with the suggested helpers is that you would just
> > > embed a trans_lockitem struct in the gem object (and defines the gem
> > > object ops). Otherwise and for passing to dma-buf this is pretty much
> > > exactly what you are suggesting, but the huge benefit of encapsulating
> > > the needed members like this is that when we need to change something we
> > > change it in just one place.
> > > 
> > > For anything that doesn't have a gem object (dma-buf, vmwgfx or
> > > whatever) you have the choice of either allocating a struct
> > > trans_lockitem or embed it wherever you prefer. In particular, this is
> > > beneficial where you have a single dma-resv class ww-mutex sitting
> > > somewhere in the way and you don't want to needlessly have a gem object
> > > that embeds it.
> > The thing is, everyone who actually uses dma_resv_lock has a
> > gem_buffer_object underneath. So it feels a bit like flexibility for
> > no real need, and I think it would make it slightly more awkard for
> > gem drivers to neatly integrate into their cs path. The lockitem
> > struct works, but it is a bit cumbersome.
> 
> Well that's partly because of it's impossible to use a standalone ww_mutex
> in a locking transaction that can only add gem objects to the list :/.
> Already in the i915 driver we have and may want to add various places where
> we have dead gem objects sitting because of this.
> 
> Also, more importantly, If we pass a list down the dma-buf move_notify(), a
> trans_lockitem is pretty much exactly what we expect back (except of course
> for the private pointer). It would be odd if we'd expect all list items to
> be gem objects when it's a dma-buf interface?
> 
> > 
> > Also if we add some wrappers to e.g. add a gem_bo to the ctx, then if
> > we decide to slip the lockitem in there, we still only need to touch
> > the helper code, and not all drivers.
> 
> Well, yes assuming we always have an embedding gem object for a dma_resv
> that might be true, but either way I don't really expect the gem helpers to
> look very different. We will need the ops anyway and a specialized context
> so if the only thing we're debating is whether or not to embed a struct in
> the gem object, unless you really insist on using the gem object initially,
> I suggest we try this and if it becomes awkward, just
> s/trans_lockitem/drm_gem_object/

Hm yeah I think you convinced me this is a bit a bikeshed :-) Maybe call
it dma_resv_lockitem or so if we go with top-level generic solution. And
then embed it wheterever we feel like.

The annoying thing with tha generic thing is that I'd like to avoid the
full callback pointer in all the gem objects, but maybe another ptr really
doesn't matter on this if we add a linked list anyway ...
-Daniel

> 
> /Thomas
> 
> 
> > -Daniel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [Intel-gfx] [Linaro-mm-sig] [RFC] Cross-driver ww transaction lock lists
@ 2021-04-15 15:51             ` Daniel Vetter
  0 siblings, 0 replies; 15+ messages in thread
From: Daniel Vetter @ 2021-04-15 15:51 UTC (permalink / raw)
  To: Thomas Hellström (Intel)
  Cc: Thomas Hellström, intel-gfx, dri-devel,
	moderated list:DMA BUFFER SHARING FRAMEWORK, Matthew Auld,
	Christian König

On Thu, Apr 15, 2021 at 05:40:24PM +0200, Thomas Hellström (Intel) wrote:
> 
> On 4/15/21 4:40 PM, Daniel Vetter wrote:
> > On Thu, Apr 15, 2021 at 4:02 PM Thomas Hellström (Intel)
> > <thomas_os@shipmail.org> wrote:
> > > 
> > > On 4/15/21 3:37 PM, Daniel Vetter wrote:
> > > > On Tue, Apr 13, 2021 at 09:57:06AM +0200, Christian König wrote:
> > > > > Am 13.04.21 um 09:50 schrieb Thomas Hellström:
> > > > > > Hi!
> > > > > > 
> > > > > > During the dma_resv conversion of the i915 driver, we've been using ww
> > > > > > transaction lock lists to keep track of ww_mutexes that are locked
> > > > > > during the transaction so that they can be batch unlocked at suitable
> > > > > > locations. Including also the LMEM/VRAM eviction we've ended up with
> > > > > > two static lists per transaction context; one typically unlocked at the
> > > > > > end of transaction and one initialized before and unlocked after each
> > > > > > buffer object validate. This enables us to do sleeping locking at
> > > > > > eviction and keep objects locked on the eviction list until we
> > > > > > eventually succeed allocating memory (modulo minor flaws of course).
> > > > > > 
> > > > > > It would be beneficial with the i915 TTM conversion to be able to
> > > > > > introduce a similar functionality that would work in ttm but also
> > > > > > cross-driver in, for example move_notify. It would also be beneficial
> > > > > > to be able to put any dma_resv ww mutex on the lists, and not require
> > > > > > it to be embedded in a particular object type.
> > > > > > 
> > > > > > I started scetching on some utilities for this. For TTM, for example,
> > > > > > the idea would be to pass a list head for the ww transaction lock list
> > > > > > in the ttm_operation_ctx. A function taking a ww_mutex could then
> > > > > > either attach a grabbed lock to the list for batch unlocking, or be
> > > > > > responsible for unlocking when the lock's scope is exited. It's also
> > > > > > possible to create sublists if so desired. I believe the below would be
> > > > > > sufficient to cover the i915 functionality.
> > > > > > 
> > > > > > Any comments and suggestions appreciated!
> > > > > ah yes Daniel and I haven been discussing something like this for years.
> > > > > 
> > > > > I also came up with rough implementation, but we always ran into lifetime
> > > > > issues.
> > > > > 
> > > > > In other words the ww_mutexes which are on the list would need to be kept
> > > > > alive until unlocked.
> > > > > 
> > > > > Because of this we kind of backed up and said we would need this on the GEM
> > > > > level instead of working with dma_resv objects.
> > > > Yeah there's a few funny concerns here that make this awkward:
> > > > - For simplicity doing these helpers at the gem level should make things a
> > > >     bit easier, because then we have a standard way to drop the reference.
> > > >     It does mean that the only thing you can lock like this are gem objects,
> > > >     but I think that's fine. At least for a first cut.
> > > > 
> > > > - This is a bit awkward for vmwgfx, but a) Zack has mentioned he's looking
> > > >     into adopting gem bo internally to be able to drop a pile of code and
> > > >     stop making vmwgfx the only special-case we have b) drivers which don't
> > > >     need this won't need this, so should be fine.
> > > > 
> > > >     The other awkward thing I guess is that ttm would need to use the
> > > >     embedded kref from the gem bo, but that should be transparent I think.
> > > > 
> > > > - Next up is dma-buf: For i915 we'd like to do the same eviction trick
> > > >     also through p2p dma-buf callbacks, so that this works the same as
> > > >     eviction/reservation within a gpu. But for these internal bo you might
> > > >     not have a dma-buf, so we can't just lift the trick to the dma-buf
> > > >     level. But I think if we pass e.g. a struct list_head and a callback to
> > > >     unreference/unlock all the buffers in there to the exporter, plus
> > > >     similar for the slowpath lock, then that should be doable without
> > > >     glorious layering inversions between dma-buf and gem.
> > > > 
> > > >     I think for dma-buf it should even be ok if this requires that we
> > > >     allocate an entire structure with kmalloc or something, allocating
> > > >     memory while holding dma_resv is ok.
> > > Yes, the thing here with the suggested helpers is that you would just
> > > embed a trans_lockitem struct in the gem object (and defines the gem
> > > object ops). Otherwise and for passing to dma-buf this is pretty much
> > > exactly what you are suggesting, but the huge benefit of encapsulating
> > > the needed members like this is that when we need to change something we
> > > change it in just one place.
> > > 
> > > For anything that doesn't have a gem object (dma-buf, vmwgfx or
> > > whatever) you have the choice of either allocating a struct
> > > trans_lockitem or embed it wherever you prefer. In particular, this is
> > > beneficial where you have a single dma-resv class ww-mutex sitting
> > > somewhere in the way and you don't want to needlessly have a gem object
> > > that embeds it.
> > The thing is, everyone who actually uses dma_resv_lock has a
> > gem_buffer_object underneath. So it feels a bit like flexibility for
> > no real need, and I think it would make it slightly more awkard for
> > gem drivers to neatly integrate into their cs path. The lockitem
> > struct works, but it is a bit cumbersome.
> 
> Well that's partly because of it's impossible to use a standalone ww_mutex
> in a locking transaction that can only add gem objects to the list :/.
> Already in the i915 driver we have and may want to add various places where
> we have dead gem objects sitting because of this.
> 
> Also, more importantly, If we pass a list down the dma-buf move_notify(), a
> trans_lockitem is pretty much exactly what we expect back (except of course
> for the private pointer). It would be odd if we'd expect all list items to
> be gem objects when it's a dma-buf interface?
> 
> > 
> > Also if we add some wrappers to e.g. add a gem_bo to the ctx, then if
> > we decide to slip the lockitem in there, we still only need to touch
> > the helper code, and not all drivers.
> 
> Well, yes assuming we always have an embedding gem object for a dma_resv
> that might be true, but either way I don't really expect the gem helpers to
> look very different. We will need the ops anyway and a specialized context
> so if the only thing we're debating is whether or not to embed a struct in
> the gem object, unless you really insist on using the gem object initially,
> I suggest we try this and if it becomes awkward, just
> s/trans_lockitem/drm_gem_object/

Hm yeah I think you convinced me this is a bit a bikeshed :-) Maybe call
it dma_resv_lockitem or so if we go with top-level generic solution. And
then embed it wheterever we feel like.

The annoying thing with tha generic thing is that I'd like to avoid the
full callback pointer in all the gem objects, but maybe another ptr really
doesn't matter on this if we add a linked list anyway ...
-Daniel

> 
> /Thomas
> 
> 
> > -Daniel

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-04-15 15:51 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-13  7:50 [RFC] Cross-driver ww transaction lock lists Thomas Hellström
2021-04-13  7:50 ` [Intel-gfx] " Thomas Hellström
2021-04-13  7:57 ` Christian König
2021-04-13  7:57   ` [Intel-gfx] " Christian König
2021-04-13  8:07   ` Thomas Hellström (Intel)
2021-04-15 13:37   ` Daniel Vetter
2021-04-15 13:37     ` [Intel-gfx] " Daniel Vetter
2021-04-15 14:02     ` [Linaro-mm-sig] " Thomas Hellström (Intel)
2021-04-15 14:02       ` [Intel-gfx] " Thomas Hellström (Intel)
2021-04-15 14:40       ` Daniel Vetter
2021-04-15 14:40         ` [Intel-gfx] " Daniel Vetter
2021-04-15 15:40         ` Thomas Hellström (Intel)
2021-04-15 15:40           ` [Intel-gfx] " Thomas Hellström (Intel)
2021-04-15 15:51           ` Daniel Vetter
2021-04-15 15:51             ` [Intel-gfx] " Daniel Vetter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.