* thoughts of looking at android fences
@ 2013-10-02  7:35 Maarten Lankhorst
  2013-10-02 18:13 ` [Linaro-mm-sig] " Erik Gilling
  0 siblings, 1 reply; 18+ messages in thread
From: Maarten Lankhorst @ 2013-10-02  7:35 UTC (permalink / raw)
  To: dri-devel, linaro-mm-sig

Hey,

So I took a look at the sync stuff in android. In a lot of ways I believe the two approaches are similar, yet subtly different.
Most of the stuff I looked at is from the sync.h header in drivers/staging, so maybe my knowledge is incomplete.

The timeline is similar to what I called a fence context. Each command stream on a gpu can have a context. Because
nvidia hardware can have 4095 separate timelines, I didn't want to keep the bookkeeping for each timeline, although
I guess that it's already done. Maybe it could be done in a unified way for each driver, making a transition to
timelines that can be used by android easier.

I did not have an explicit syncpoint addition, but I think that sync points + sync_fence were similar to what I did with
my dma-fence stuff, except slightly different.
In my approach the dma-fence is signaled after all sync_points are done AND the queued commands are executed.
In effect the dma-fence becomes the next syncpoint, depending on all previous dma-fence syncpoints.

An important thing to note is that dma-fence is kernelspace only, so it might be better to rename it to syncpoint,
and use fence for the userspace interface.

A big difference is locking. I assume in my code that most fences emitted are not waited on, so the fastpath
fence_signal is a test_and_set_bit plus test_bit. A single lock is used for the waitqueue and callbacks,
with the waitqueue being implemented internally as an asynchronous callback. The lock is provided by the driver,
which makes adding support for old hardware that has no reliable way of notifying completion of events easier.
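
A minimal userspace sketch of that fast path, using C11 atomics and a pthread mutex in place of the kernel bitops and spinlock. All sk_* names are hypothetical and not taken from the dma-fence patches; the point is only to show that signaling a fence nobody waits on never touches the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bits, for illustration only. */
#define SK_SIGNALED   1u
#define SK_CB_ENABLED 2u

struct sk_cb {
    struct sk_cb *next;
    void (*func)(struct sk_cb *cb);
};

struct sk_fence {
    atomic_uint flags;
    pthread_mutex_t *lock;  /* supplied by the "driver" */
    struct sk_cb *cb_list;  /* protected by *lock */
};

static int sk_fired;        /* bumped by the sample callback below */

static void sk_count_fired(struct sk_cb *cb)
{
    (void)cb;
    sk_fired++;
}

static void sk_fence_init(struct sk_fence *f, pthread_mutex_t *lock)
{
    atomic_init(&f->flags, 0);
    f->lock = lock;
    f->cb_list = NULL;
}

/* Returns false if the fence had already been signaled. */
static bool sk_fence_signal(struct sk_fence *f)
{
    struct sk_cb *cb;

    /* test_and_set_bit analogue: mark signaled, bail out if it already was */
    if (atomic_fetch_or(&f->flags, SK_SIGNALED) & SK_SIGNALED)
        return false;

    /* test_bit analogue: nobody asked for callbacks, so skip the lock */
    if (!(atomic_load(&f->flags) & SK_CB_ENABLED))
        return true;

    pthread_mutex_lock(f->lock);
    cb = f->cb_list;
    f->cb_list = NULL;
    pthread_mutex_unlock(f->lock);

    /* Simplification: callbacks run after the lock is dropped. */
    while (cb) {
        struct sk_cb *next = cb->next;
        cb->func(cb);
        cb = next;
    }
    return true;
}

/* Returns false, without queueing, if the fence is already signaled. */
static bool sk_fence_add_callback(struct sk_fence *f, struct sk_cb *cb,
                                  void (*func)(struct sk_cb *))
{
    bool queued = false;

    pthread_mutex_lock(f->lock);
    if (!(atomic_fetch_or(&f->flags, SK_CB_ENABLED) & SK_SIGNALED)) {
        cb->func = func;
        cb->next = f->cb_list;
        f->cb_list = cb;
        queued = true;
    }
    pthread_mutex_unlock(f->lock);
    return queued;
}
```

Either the signaler sees SK_CB_ENABLED and takes the lock, or the waiter sees SK_SIGNALED and does not queue, so a callback cannot be lost between the two.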

I avoided using global locks, but I think for debugfs support I may end up having to add some.

The dma fence looks similar overall, except that I allow overriding some stuff and keep less track of state.
I do believe that I can create a userspace interface around dma_fence that works similarly to android's, and the
kernel space interface could be done in a similar way too.

One thing though: is it really required to merge fences? It seems to me that if I add a poll callback userspace
could simply do a poll on a list of fences. This would give userspace all the information it needs about each
individual fence.
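
As a rough illustration of what polling a list of fences could look like from userspace: the sketch below assumes each fence is exposed as a file descriptor that reports POLLIN once signaled. The helper names are hypothetical, and a pipe stands in for a real fence fd.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

#define SK_MAX_FENCES 16

/* Stand-in for a fence signaling: makes the read end of a pipe readable. */
static int sk_fake_signal(int write_fd)
{
    return write(write_fd, "x", 1) == 1 ? 0 : -1;
}

/*
 * Polls all fds once. Sets signaled[i] to 1 for every fd that reported
 * POLLIN and to 0 otherwise. Returns the number of signaled fds, or -1
 * on error.
 */
static int sk_poll_fences(const int *fds, int *signaled, size_t n,
                          int timeout_ms)
{
    struct pollfd pfds[SK_MAX_FENCES];
    int count = 0;

    if (n > SK_MAX_FENCES)
        return -1;

    for (size_t i = 0; i < n; i++) {
        pfds[i].fd = fds[i];
        pfds[i].events = POLLIN;
        pfds[i].revents = 0;
    }

    if (poll(pfds, (nfds_t)n, timeout_ms) < 0)
        return -1;

    for (size_t i = 0; i < n; i++) {
        signaled[i] = (pfds[i].revents & POLLIN) != 0;
        count += signaled[i];
    }
    return count;
}
```

The caller gets per-fence status back, at the cost of having to carry a variable number of fds around.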

The thing about wait/wound mutexes can be ignored for this discussion. It's really just a method of adding a
fence to a dma-buf, and building a list of all dma-fences to wait on in the kernel before starting a command
buffer, and setting a new fence to all the dma-bufs to signal completion of the event. Regardless of the sync
mechanism we'll decide on, this stuff wouldn't change.
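
The bookkeeping described here can be sketched roughly as follows. This is a simplified model with hypothetical sk_* names; the wait/wound locking of the buffers is left out entirely, and only the "collect old fences, attach the new one" step is shown.

```c
#include <assert.h>
#include <stddef.h>

struct sk_fence { int id; };

struct sk_buf {
    struct sk_fence *fence;  /* last fence attached, or NULL */
};

/*
 * Collects the distinct fences currently attached to bufs[] into
 * wait_list (which must have room for n entries), then attaches
 * new_fence to every buffer. Returns the number of fences that have
 * to be waited on before the command buffer may start.
 */
static size_t sk_prepare_submit(struct sk_buf **bufs, size_t n,
                                struct sk_fence **wait_list,
                                struct sk_fence *new_fence)
{
    size_t nwait = 0;

    /* Pass 1: gather every fence once. */
    for (size_t i = 0; i < n; i++) {
        struct sk_fence *f = bufs[i]->fence;
        size_t j = 0;

        if (!f)
            continue;
        while (j < nwait && wait_list[j] != f)
            j++;
        if (j == nwait)
            wait_list[nwait++] = f;
    }

    /* Pass 2: the new fence signals completion for all buffers. */
    for (size_t i = 0; i < n; i++)
        bufs[i]->fence = new_fence;

    return nwait;
}
```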

Depending on feedback I'll try reflashing my nexus 7 to stock android, and work on trying to convert android
syncpoints to dma-fence, which I'll probably rename to syncpoints.

~Maarten


* Re: [Linaro-mm-sig] thoughts of looking at android fences
  2013-10-02  7:35 thoughts of looking at android fences Maarten Lankhorst
@ 2013-10-02 18:13 ` Erik Gilling
  2013-10-08 17:37   ` John Stultz
  2013-10-08 18:47   ` Rob Clark
  0 siblings, 2 replies; 18+ messages in thread
From: Erik Gilling @ 2013-10-02 18:13 UTC (permalink / raw)
  To: Maarten Lankhorst; +Cc: linaro-mm-sig, dri-devel

On Wed, Oct 2, 2013 at 12:35 AM, Maarten Lankhorst
<maarten.lankhorst@canonical.com> wrote:
> The timeline is similar to what I called a fence context. Each command stream on a gpu can have a context. Because
> nvidia hardware can have 4095 separate timelines, I didn't want to keep the bookkeeping for each timeline, although
> I guess that it's already done. Maybe it could be done in a unified way for each driver, making a transition to
> timelines that can be used by android easier.
>
> I did not have an explicit syncpoint addition, but I think that sync points + sync_fence were similar to what I did with
> my dma-fence stuff, except slightly different.
> In my approach the dma-fence is signaled after all sync_points are done AND the queued commands are executed.
> In effect the dma-fence becomes the next syncpoint, depending on all previous dma-fence syncpoints.

What makes queued command completion different than any other sync point?

> An important thing to note is that dma-fence is kernelspace only, so it might be better to rename it to syncpoint,
> and use fence for the userspace interface.
>
> A big difference is locking, I assume in my code that most fences emitted are not waited on, so the fastpath
> fence_signal is a test_and_set_bit plus test_bit. A single lock is used for the waitqueue and callbacks,
> with the waitqueue being implemented internally as an asynchronous callback.

I assume very little lock contention so the performance impact is
negligible.  Also, because sync_pts on a timeline are strictly
ordered, it's necessary to check all active pts whenever a timeline signals.
A future optimization could involve keeping active pts in a sorted
list or other data structure so that you only need to iterate over the
pts that are about to signal.  So far we've not seen any bottlenecks
here so I've kept it simple.
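
The sorted-list optimization mentioned above might look something like this. It is a sketch only: the sk_* names are hypothetical, the pts are kept in a plain array sorted by value, and counter wraparound is ignored.

```c
#include <assert.h>
#include <stddef.h>

struct sk_pt {
    unsigned value;   /* timeline value at which this pt signals */
    int signaled;
};

struct sk_timeline {
    struct sk_pt *pts;    /* sorted ascending by value */
    size_t n;
    size_t first_active;  /* every pt before this index has signaled */
    unsigned value;       /* current timeline value */
};

/*
 * Advances the timeline and signals the pts that are now reached.
 * Because pts are sorted, the walk stops at the first pt that is
 * still in the future instead of visiting every active pt.
 * Returns the number of pts signaled by this call.
 */
static size_t sk_timeline_signal(struct sk_timeline *tl, unsigned new_value)
{
    size_t fired = 0;

    tl->value = new_value;
    while (tl->first_active < tl->n &&
           tl->pts[tl->first_active].value <= tl->value) {
        tl->pts[tl->first_active].signaled = 1;
        tl->first_active++;
        fired++;
    }
    return fired;
}
```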

> The lock is provided by the driver, which makes adding support for old hardware that has no reliable way of notifying completion of events easier.

I'm a bit confused here how it's possible to implement sync on
hardware with "no reliable way of notifying completion of events."  That
seems like a non-starter to me.

> I avoided using global locks, but I think for debugfs support I may end up having to add some.

As did I, except for debugfs support.

> One thing though: is it really required to merge fences? It seems to me that if I add a poll callback userspace
> could simply do a poll on a list of fences. This would give userspace all the information it needs about each
> individual fence.

This is very important.  It greatly simplifies the way the userspace
deals with fences.  It means that it only has to track one fd per
buffer and both the kernel API and userspace RPC apis don't have to
take a variable number of fds per buffer.  FWIW the android sync
driver already implements poll.
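
For readers unfamiliar with merging, a rough model of the idea: a fence is a small set of sync points, and merging two fences yields one fence that keeps, per timeline, only the later point. The sk_* names and the fixed-size array are assumptions made for this sketch, not the android sync driver's data layout.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_PTS 8

struct sk_pt {
    int timeline;     /* which timeline the point belongs to */
    unsigned value;   /* position on that timeline */
};

struct sk_fence {
    struct sk_pt pts[SK_MAX_PTS];
    size_t n;
};

/* Adds pt, keeping only the latest point per timeline. */
static int sk_fence_add_pt(struct sk_fence *f, struct sk_pt pt)
{
    for (size_t i = 0; i < f->n; i++) {
        if (f->pts[i].timeline == pt.timeline) {
            if (pt.value > f->pts[i].value)
                f->pts[i].value = pt.value;
            return 0;
        }
    }
    if (f->n == SK_MAX_PTS)
        return -1;
    f->pts[f->n++] = pt;
    return 0;
}

/* out must not alias a or b. Returns 0 on success, -1 if out is full. */
static int sk_fence_merge(const struct sk_fence *a, const struct sk_fence *b,
                          struct sk_fence *out)
{
    out->n = 0;
    for (size_t i = 0; i < a->n; i++)
        if (sk_fence_add_pt(out, a->pts[i]) < 0)
            return -1;
    for (size_t i = 0; i < b->n; i++)
        if (sk_fence_add_pt(out, b->pts[i]) < 0)
            return -1;
    return 0;
}
```

The merged fence is a single object, which is what lets userspace track one fd per buffer.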

> Depending on feedback I'll try reflashing my nexus 7 to stock android, and work on trying to convert android
> syncpoints to dma-fence, which I'll probably rename to syncpoints.

I thought the plan decided at plumbers was to investigate backing
dma_buf with the android sync solution not the other way around.  It
doesn't make sense to me to take a working, tested, end-to-end
solution with a released compositing system built around it, throw it
out, and replace it with new un-tested code to
support a system which is not yet built.

Cheers,
   Erik


* Re: [Linaro-mm-sig] thoughts of looking at android fences
  2013-10-02 18:13 ` [Linaro-mm-sig] " Erik Gilling
@ 2013-10-08 17:37   ` John Stultz
  2013-10-08 18:56     ` Rob Clark
  2013-10-09 14:39     ` Maarten Lankhorst
  2013-10-08 18:47   ` Rob Clark
  1 sibling, 2 replies; 18+ messages in thread
From: John Stultz @ 2013-10-08 17:37 UTC (permalink / raw)
  To: Erik Gilling; +Cc: linaro-mm-sig, Android Kernel Team, dri-devel

On Wed, Oct 2, 2013 at 11:13 AM, Erik Gilling <konkers@android.com> wrote:
> On Wed, Oct 2, 2013 at 12:35 AM, Maarten Lankhorst
> <maarten.lankhorst@canonical.com> wrote:
>> Depending on feedback I'll try reflashing my nexus 7 to stock android, and work on trying to convert android
>> syncpoints to dma-fence, which I'll probably rename to syncpoints.
>
> I thought the plan decided at plumbers was to investigate backing
> dma_buf with the android sync solution not the other way around.  It
> doesn't make sense to me to take a working, tested, end-to-end
> solution with a released compositing system built around it, throw it
> out, and replace it with new un-tested code to
> support a system which is not yet built.

Hey Erik,
  Thanks for the clarifying points in your email, your insights and
feedback are critical, and I think having you and Maarten continue to
work out the details here will make this productive.

My recollection from the discussion was that Rob was ok with trying to
pipe the sync arguments through the various interfaces in order to
support the explicit sync, but I think he did suggest having it backed
by the dma-buf fences underneath.

I know this can be frustrating to watch things be reimplemented when
you have a pre-baked solution, but some compromise will be needed to
get things merged (and Maarten is taking the initiative here), but it's
important to keep discussing this so the *right* compromises are made
that don't hurt performance, etc.

My hope is Maarten's approach of getting the dma-fence core
integrated, and then moving the existing Android sync interface over
to the shared back-end, will allow for proper apples-to-apples
comparisons of the same interface. And if the functionality isn't
sufficient we can hold off on merging the sync interface conversion
until that gets resolved.

Does that sound reasonable?

thanks
-john


* Re: [Linaro-mm-sig] thoughts of looking at android fences
  2013-10-02 18:13 ` [Linaro-mm-sig] " Erik Gilling
  2013-10-08 17:37   ` John Stultz
@ 2013-10-08 18:47   ` Rob Clark
  1 sibling, 0 replies; 18+ messages in thread
From: Rob Clark @ 2013-10-08 18:47 UTC (permalink / raw)
  To: Erik Gilling; +Cc: linaro-mm-sig, dri-devel

On Wed, Oct 2, 2013 at 2:13 PM, Erik Gilling <konkers@android.com> wrote:
>> The lock is provided by the driver, which makes adding support for old hardware that has no reliable way of notifying completion of events easier.
>
> I'm a bit confused here how it's possible to implement sync on
> hardware with "no reliable way of notifying completion of events."  That
> seems like a non-starter to me.

I suspect Maarten meant "no reliable way of notifying completion *to
the cpu*".. which isn't strictly needed for gpu<->gpu sharing.

BR,
-R


* Re: [Linaro-mm-sig] thoughts of looking at android fences
  2013-10-08 17:37   ` John Stultz
@ 2013-10-08 18:56     ` Rob Clark
  2013-10-09 14:39     ` Maarten Lankhorst
  1 sibling, 0 replies; 18+ messages in thread
From: Rob Clark @ 2013-10-08 18:56 UTC (permalink / raw)
  To: John Stultz; +Cc: linaro-mm-sig, Android Kernel Team, dri-devel

On Tue, Oct 8, 2013 at 1:37 PM, John Stultz <john.stultz@linaro.org> wrote:
> On Wed, Oct 2, 2013 at 11:13 AM, Erik Gilling <konkers@android.com> wrote:
>> On Wed, Oct 2, 2013 at 12:35 AM, Maarten Lankhorst
>> <maarten.lankhorst@canonical.com> wrote:
>>> Depending on feedback I'll try reflashing my nexus 7 to stock android, and work on trying to convert android
>>> syncpoints to dma-fence, which I'll probably rename to syncpoints.
>>
>> I thought the plan decided at plumbers was to investigate backing
>> dma_buf with the android sync solution not the other way around.  It
>> doesn't make sense to me to take a working, tested, end-to-end
>> solution with a released compositing system built around it, throw it
>> out, and replace it with new un-tested code to
>> support a system which is not yet built.
>
> Hey Erik,
>   Thanks for the clarifying points in your email, your insights and
> feedback are critical, and I think having you and Maarten continue to
> work out the details here will make this productive.
>
> My recollection from the discussion was that Rob was ok with trying to
> pipe the sync arguments through the various interfaces in order to
> support the explicit sync, but I think he did suggest having it backed
> by the dma-buf fences underneath.

Yeah, my comment was mainly about userspace API for different driver
subsystems.  I'd rather add some extra parameter(s?) to drm and v4l
ioctls, even if they are unused by linux userspace, vs having
different ABI for android kernel vs linux kernel.

We probably do however need the zero value to indicate unused.. at
least for adding new parameters to existing drm ioctls since
drm_ioctl() will be zero'ing stuff out to deal w/ new userspace / old
kernel, or old userspace / new kernel combos.  For new ioctls (like
'atomic') we don't have this constraint.
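
A small model of why zero has to mean "unused": if the kernel-side argument struct is zero-filled before a shorter userspace struct is copied into it, any field the old userspace did not know about reads back as zero. The struct layouts and the `fence` field below are hypothetical, and this is a userspace sketch of the idea, not the drm_ioctl() code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* What old userspace passes in. */
struct sk_args_v1 {
    unsigned handle;
};

/* What a newer kernel expects; fence == 0 means "no fence supplied". */
struct sk_args_v2 {
    unsigned handle;
    unsigned fence;
};

/*
 * Zero-fills the kernel copy, then copies as much as both sides agree
 * on. Fields unknown to the caller stay zero; fields unknown to the
 * kernel are dropped.
 */
static void sk_copy_args(void *kdata, size_t ksize,
                         const void *udata, size_t usize)
{
    size_t n = usize < ksize ? usize : ksize;

    memset(kdata, 0, ksize);
    memcpy(kdata, udata, n);
}
```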

BR,
-R

> I know this can be frustrating to watch things be reimplemented when
> you have a pre-baked solution, but some compromise will be needed to
> get things merged (and Maarten is taking the initiative here), but it's
> important to keep discussing this so the *right* compromises are made
> that don't hurt performance, etc.
>
> My hope is Maarten's approach of getting the dma-fence core
> integrated, and then moving the existing Android sync interface over
> to the shared back-end, will allow for proper apples-to-apples
> comparisons of the same interface. And if the functionality isn't
> sufficient we can hold off on merging the sync interface conversion
> until that gets resolved.
>
> Does that sound reasonable?
>
> thanks
> -john


* Re: [Linaro-mm-sig] thoughts of looking at android fences
  2013-10-08 17:37   ` John Stultz
  2013-10-08 18:56     ` Rob Clark
@ 2013-10-09 14:39     ` Maarten Lankhorst
  2013-10-24 12:13       ` Maarten Lankhorst
  1 sibling, 1 reply; 18+ messages in thread
From: Maarten Lankhorst @ 2013-10-09 14:39 UTC (permalink / raw)
  To: John Stultz, Erik Gilling; +Cc: linaro-mm-sig, Android Kernel Team, dri-devel

Hey,

 On 08-10-13 19:37, John Stultz wrote:
> On Wed, Oct 2, 2013 at 11:13 AM, Erik Gilling <konkers@android.com> wrote:
>> On Wed, Oct 2, 2013 at 12:35 AM, Maarten Lankhorst
>> <maarten.lankhorst@canonical.com> wrote:
>>> Depending on feedback I'll try reflashing my nexus 7 to stock android, and work on trying to convert android
>>> syncpoints to dma-fence, which I'll probably rename to syncpoints.
>> I thought the plan decided at plumbers was to investigate backing
>> dma_buf with the android sync solution not the other way around.  It
>> doesn't make sense to me to take a working, tested, end-to-end
>> solution with a released compositing system built around it, throw it
>> out, and replace it with new un-tested code to
>> support a system which is not yet built.
> Hey Erik,
>   Thanks for the clarifying points in your email, your insights and
> feedback are critical, and I think having you and Maarten continue to
> work out the details here will make this productive.
>
> My recollection from the discussion was that Rob was ok with trying to
> pipe the sync arguments through the various interfaces in order to
> support the explicit sync, but I think he did suggest having it backed
> by the dma-buf fences underneath.
>
> I know this can be frustrating to watch things be reimplemented when
> you have a pre-baked solution, but some compromise will be needed to
> get things merged (and Maarten is taking the initiative here), but it's
> important to keep discussing this so the *right* compromises are made
> that don't hurt performance, etc.
>
> My hope is Maarten's approach of getting the dma-fence core
> integrated, and then moving the existing Android sync interface over
> to the shared back-end, will allow for proper apples-to-apples
> comparisons of the same interface. And if the functionality isn't
> sufficient we can hold off on merging the sync interface conversion
> until that gets resolved.
>
Yeah, I'm trying to understand the android side too. I think a unified interface would benefit both. I'm
toying a bit with the sw_sync driver in staging because it's the easiest to try out on my desktop.

The timeline stuff looks like it could be simplified. The main difference seems to be that
I didn't want to create a separate timeline struct for synchronization, but let the drivers handle it instead.

If you use rcu for reference lifetime management of timeline, the kref can be dropped. Signalling all
syncpts on timeline destroy to a new destroyed state would kill the need for a destroyed member.
The active list is unneeded and can be killed if only a linear progression of child_list is allowed.

Which probably leaves this nice structure:

struct sync_timeline {
    const struct sync_timeline_ops    *ops;
    char            name[32];

    struct list_head    child_list_head;
    spinlock_t        child_list_lock;

    struct list_head    sync_timeline_list;
};

Here name and sync_timeline_list are nice for debugging, but I guess not strictly required, so they
could be split out into a separate debugfs thing if needed. I've moved the pointer to ops to the fence
for dma-fence, which leaves this:

struct sync_timeline {
    struct list_head    child_list_head;
    spinlock_t        child_list_lock;

    /* debugging only, could be split out for debugfs */
    struct sync_timeline_debug {
        struct list_head    sync_timeline_list;
        char name[32];
    } debug;
};

Hm, this looks familiar, the drm drivers had some structure for protecting the active fence list that has
an identical definition, but with a slightly different list offset..

struct __wait_queue_head {
    spinlock_t lock;
    struct list_head task_list;
};

typedef struct __wait_queue_head wait_queue_head_t;

This is nicer to convert the existing drm drivers, which already implement synchronous wait with a waitqueue.
The default wait op is in fact

Ok enough of this little exercise. I just wanted to see how different the 2 are. I think even if the
fence interface will end up being incompatible it wouldn't be too hard to convert android..

Main difference is the ops, android has a lot more than what I used for dma-fence:

struct fence_ops {
	/* Required. Called with fence->lock held; fence->lock is a pointer passed to
	 * __fence_init. The callback should make sure the fence will be signaled asap. */
	bool (*enable_signaling)(struct fence *fence);

	/* Optional. If set to NULL, fence_is_signaled is not required to ever return
	 * true unless enable_signaling is called. Similar to has_signaled. */
	bool (*signaled)(struct fence *fence);

	/* Required, but it can be set to the default fence_default_wait implementation,
	 * which is recommended. It calls enable_signaling and appends itself to the
	 * async callback list. Identical semantics to wait_event_interruptible_timeout. */
	long (*wait)(struct fence *fence, bool intr, signed long timeout);

	/* Equivalent of free_pt. */
	void (*release)(struct fence *fence);
};

Because every fence has a stamp, there is no need for a compare op.
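
To make that concrete, a sketch under the assumption that the stamp is a per-context sequence number (the struct and field names here are hypothetical): ordering two fences from the same context then needs no driver callback, only a wraparound-safe integer comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct sk_fence {
    uint32_t context;  /* which timeline / command stream */
    uint32_t seqno;    /* the "stamp": increases along the context */
};

/*
 * Returns true if a comes later than b on their shared context.
 * The difference is taken in unsigned arithmetic and then viewed as
 * signed, so the result stays correct across counter wraparound as
 * long as the two stamps are less than 2^31 apart.
 */
static bool sk_fence_is_later(const struct sk_fence *a,
                              const struct sk_fence *b)
{
    assert(a->context == b->context);
    return (int32_t)(a->seqno - b->seqno) > 0;
}
```

Fences from different contexts are not ordered relative to each other in this model, which is why the helper refuses to compare them.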

struct sync_timeline_ops {
	const char *driver_name;

	/* required */
	struct sync_pt *(*dup)(struct sync_pt *pt);

	/* required */
	int (*has_signaled)(struct sync_pt *pt);

	/* required */
	int (*compare)(struct sync_pt *a, struct sync_pt *b);

	/* optional */
	void (*free_pt)(struct sync_pt *sync_pt);

	/* optional */
	void (*release_obj)(struct sync_timeline *sync_timeline);

	/* deprecated */
	void (*print_obj)(struct seq_file *s,
			  struct sync_timeline *sync_timeline);

	/* deprecated */
	void (*print_pt)(struct seq_file *s, struct sync_pt *sync_pt);

	/* optional */
	int (*fill_driver_data)(struct sync_pt *syncpt, void *data, int size);

	/* optional */
	void (*timeline_value_str)(struct sync_timeline *timeline, char *str,
				   int size);

	/* optional */
	void (*pt_value_str)(struct sync_pt *pt, char *str, int size);
};

The dup is weird, I have nothing like that. I do allow multiple callbacks to be added to the same
dma-fence, and allow callbacks to be aborted, provided you still hold a refcount.
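
A single-threaded sketch of "several callbacks per fence, each of which can be aborted". The sk_* names are hypothetical and locking is omitted for brevity; a callback struct must be zero-initialized before its first use so that removal can tell whether it is queued.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sk_cb {
    struct sk_cb *prev, *next;  /* both NULL while not queued */
    void (*func)(struct sk_cb *cb);
};

struct sk_fence {
    struct sk_cb head;          /* circular list sentinel */
    bool signaled;
};

static int sk_fired;            /* bumped by the sample callback below */

static void sk_count_fired(struct sk_cb *cb)
{
    (void)cb;
    sk_fired++;
}

static void sk_fence_init(struct sk_fence *f)
{
    f->head.prev = f->head.next = &f->head;
    f->head.func = NULL;
    f->signaled = false;
}

static void sk_cb_unlink(struct sk_cb *cb)
{
    cb->prev->next = cb->next;
    cb->next->prev = cb->prev;
    cb->prev = cb->next = NULL;
}

/* Returns false if the fence already signaled; cb is then not queued. */
static bool sk_fence_add_callback(struct sk_fence *f, struct sk_cb *cb,
                                  void (*func)(struct sk_cb *))
{
    if (f->signaled)
        return false;
    cb->func = func;
    cb->prev = f->head.prev;
    cb->next = &f->head;
    f->head.prev->next = cb;
    f->head.prev = cb;
    return true;
}

/* Aborts a pending callback. Returns false if it was not queued. */
static bool sk_fence_remove_callback(struct sk_cb *cb)
{
    if (!cb->next)
        return false;
    sk_cb_unlink(cb);
    return true;
}

static void sk_fence_signal(struct sk_fence *f)
{
    f->signaled = true;
    while (f->head.next != &f->head) {
        struct sk_cb *cb = f->head.next;

        sk_cb_unlink(cb);
        cb->func(cb);
    }
}
```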

So from the ops it looks like what's mostly missing is print_pt, fill_driver_data,
timeline_value_str and pt_value_str.

I have no idea how much of this is inaccurate. This is just an assessment based on me looking at
the stuff in drivers/staging/android/sync for an afternoon and the earlier discussions. :)

~Maarten


* Re: [Linaro-mm-sig] thoughts of looking at android fences
  2013-10-09 14:39     ` Maarten Lankhorst
@ 2013-10-24 12:13       ` Maarten Lankhorst
  2013-10-30 12:17         ` Maarten Lankhorst
  0 siblings, 1 reply; 18+ messages in thread
From: Maarten Lankhorst @ 2013-10-24 12:13 UTC (permalink / raw)
  To: John Stultz, Erik Gilling; +Cc: linaro-mm-sig, Android Kernel Team, dri-devel

On 09-10-13 16:39, Maarten Lankhorst wrote:
> Hey,
>
>  On 08-10-13 19:37, John Stultz wrote:
>> On Wed, Oct 2, 2013 at 11:13 AM, Erik Gilling <konkers@android.com> wrote:
>>> On Wed, Oct 2, 2013 at 12:35 AM, Maarten Lankhorst
>>> <maarten.lankhorst@canonical.com> wrote:
>>>> Depending on feedback I'll try reflashing my nexus 7 to stock android, and work on trying to convert android
>>>> syncpoints to dma-fence, which I'll probably rename to syncpoints.
>>> I thought the plan decided at plumbers was to investigate backing
>>> dma_buf with the android sync solution not the other way around.  It
>>> doesn't make sense to me to take a working, tested, end-to-end
>>> solution with a released compositing system built around it, throw it
>>> out, and replace it with new un-tested code to
>>> support a system which is not yet built.
>> Hey Erik,
>>   Thanks for the clarifying points in your email, your insights and
>> feedback are critical, and I think having you and Maarten continue to
>> work out the details here will make this productive.
>>
>> My recollection from the discussion was that Rob was ok with trying to
>> pipe the sync arguments through the various interfaces in order to
>> support the explicit sync, but I think he did suggest having it backed
>> by the dma-buf fences underneath.
>>
>> I know this can be frustrating to watch things be reimplemented when
>> you have a pre-baked solution, but some compromise will be needed to
>> get things merged (and Maarten is taking the initiative here), but it's
>> important to keep discussing this so the *right* compromises are made
>> that don't hurt performance, etc.
>>
>> My hope is Maarten's approach of getting the dma-fence core
>> integrated, and then moving the existing Android sync interface over
>> to the shared back-end, will allow for proper apples-to-apples
>> comparisons of the same interface. And if the functionality isn't
>> sufficient we can hold off on merging the sync interface conversion
>> until that gets resolved.
>>
> Yeah, I'm trying to understand the android side too. I think a unified interface would benefit both. I'm
> toying a bit with the sw_sync driver in staging because it's the easiest to try out on my desktop.
>
> The timeline stuff looks like it could be simplified. The main difference that there seems to be is that
> I didn't want to create a separate timeline struct for synchronization but let the drivers handle it.
>
> If you use rcu for reference lifetime management of timeline, the kref can be dropped. Signalling all
> syncpts on timeline destroy to a new destroyed state would kill the need for a destroyed member.
> The active list is unneeded and can be killed if only a linear progression of child_list is allowed.
>
> Which probably leaves this nice structure:
> struct sync_timeline {
>     const struct sync_timeline_ops    *ops;
>     char            name[32];
>
>     struct list_head    child_list_head;
>     spinlock_t        child_list_lock;
>
>     struct list_head    sync_timeline_list;
> };
>
> Where name, and sync_timeline_list are nice for debugging, but I guess not necessarily required. so that
> could be split out into a separate debugfs thing if required. I've moved the pointer to ops to the fence
> for dma-fence, which leaves this..
>
> struct sync_timeline {
>     struct list_head    child_list_head;
>     spinlock_t        child_list_lock;
>
>     struct  sync_timeline_debug {
>         struct list_head    sync_timeline_list;
>         char name[32];
>     };
> };
>
> Hm, this looks familiar, the drm drivers had some structure for protecting the active fence list that has
> an identical definition, but with a slightly different list offset..
>
> struct __wait_queue_head {
>     spinlock_t lock;
>     struct list_head task_list;
> };
>
> typedef struct __wait_queue_head wait_queue_head_t;
>
> This is nicer to convert the existing drm drivers, which already implement synchronous wait with a waitqueue.
> The default wait op is in fact
>
> Ok enough of this little exercise. I just wanted to see how different the 2 are. I think even if the
> fence interface will end up being incompatible it wouldn't be too hard to convert android..
>
> Main difference is the ops, android has a lot more than what I used for dma-fence:
>
> struct fence_ops {
> 	bool (*enable_signaling)(struct fence *fence); // required, callback called with fence->lock held,
> 	// fence->lock is a pointer passed to __fence_init. Callback should make sure that the fence will
> 	// be signaled asap.
> 	bool (*signaled)(struct fence *fence); // optional, but if set to NULL fence_is_signaled is not
> 	// required to ever return true, unless enable_signaling is called, similar to has_signaled
> 	long (*wait)(struct fence *fence, bool intr, signed long timeout); // required, but it can be set
> 	// to the default fence_default_wait implementation which is recommended. It calls enable_signaling
> 	// and appends itself to async callback list. Identical semantics to wait_event_interruptible_timeout.
> 	void (*release)(struct fence *fence); // free_pt
> };
>
> Because every fence has a stamp, there is no need for a compare op.
>
> struct sync_timeline_ops {
> 	const char *driver_name;
>
> 	/* required */
> 	struct sync_pt *(*dup)(struct sync_pt *pt);
>
> 	/* required */
> 	int (*has_signaled)(struct sync_pt *pt);
>
> 	/* required */
> 	int (*compare)(struct sync_pt *a, struct sync_pt *b);
>
> 	/* optional */
> 	void (*free_pt)(struct sync_pt *sync_pt);
>
> 	/* optional */
> 	void (*release_obj)(struct sync_timeline *sync_timeline);
>
> 	/* deprecated */
> 	void (*print_obj)(struct seq_file *s,
> 			  struct sync_timeline *sync_timeline);
>
> 	/* deprecated */
> 	void (*print_pt)(struct seq_file *s, struct sync_pt *sync_pt);
>
> 	/* optional */
> 	int (*fill_driver_data)(struct sync_pt *syncpt, void *data, int size);
>
> 	/* optional */
> 	void (*timeline_value_str)(struct sync_timeline *timeline, char *str,
> 				   int size);
>
> 	/* optional */
> 	void (*pt_value_str)(struct sync_pt *pt, char *str, int size);
> };
>
> The dup is weird, I have nothing like that. I do allow multiple callbacks to be added to the same
> dma-fence, and allow callbacks to be aborted, provided you still hold a refcount.
>
> So from the ops it looks like what's mostly missing is print_pt, fill_driver_data,
> timeline_value_str and pt_value_str.
>
> I have no idea how much of this is inaccurate. This is just an assessment based on me looking at
> the stuff in drivers/staging/android/sync for an afternoon and the earlier discussions. :)
>
So I actually tried to implement it now. I killed all the deprecated members and assumed a linear timeline.
This means that syncpoints can only be added at the end, not in between. In particular it means sw_sync
might be slightly broken.

I only tested it with a simple program I wrote called ufence.c; it's in drivers/staging/android/ufence.c in the following tree:

http://cgit.freedesktop.org/~mlankhorst/linux

The "rfc: convert android to fence api" has all the changes from my dma-fence proposal to what android would need;
it also converts the userspace fence api to use the dma-fence api.

sync_pt is implemented as fence too. This meant not having to convert all of android right away, though I did make some changes.
I killed the deprecated members and made all the fence calls forward to the sync_timeline_ops. dup and compare are no longer used.

I haven't given this a spin on a full android kernel, only with the components that are in mainline kernel under staging and my dumb test program.

~Maarten

PS: The nomenclature is very confusing. I want to rename dma-fence to syncpoint, but I want some feedback from the android devs first. :)


* Re: [Linaro-mm-sig] thoughts of looking at android fences
  2013-10-24 12:13       ` Maarten Lankhorst
@ 2013-10-30 12:17         ` Maarten Lankhorst
  2013-11-01 21:03           ` Rom Lemarchand
  2013-11-02 21:36           ` Colin Cross
  0 siblings, 2 replies; 18+ messages in thread
From: Maarten Lankhorst @ 2013-10-30 12:17 UTC (permalink / raw)
  To: John Stultz, Erik Gilling; +Cc: linaro-mm-sig, Android Kernel Team, dri-devel

On 24-10-13 14:13, Maarten Lankhorst wrote:
> On 09-10-13 16:39, Maarten Lankhorst wrote:
>> Hey,
>>
>>  On 08-10-13 19:37, John Stultz wrote:
>>> On Wed, Oct 2, 2013 at 11:13 AM, Erik Gilling <konkers@android.com> wrote:
>>>> On Wed, Oct 2, 2013 at 12:35 AM, Maarten Lankhorst
>>>> <maarten.lankhorst@canonical.com> wrote:
>>>>> Depending on feedback I'll try reflashing my nexus 7 to stock android, and work on trying to convert android
>>>>> syncpoints to dma-fence, which I'll probably rename to syncpoints.
>>>> I thought the plan decided at plumbers was to investigate backing
>>>> dma_buf with the android sync solution not the other way around.  It
>>>> doesn't make sense to me to take a working, tested, end-to-end
>>>> solution with a released compositing system built around it, throw it
>>>> out, and replace it with new un-tested code to
>>>> support a system which is not yet built.
>>> Hey Erik,
>>>   Thanks for the clarifying points in your email, your insights and
>>> feedback are critical, and I think having you and Maarten continue to
>>> work out the details here will make this productive.
>>>
>>> My recollection from the discussion was that Rob was ok with trying to
>>> pipe the sync arguments through the various interfaces in order to
>>> support the explicit sync, but I think he did suggest having it backed
>>> by the dma-buf fences underneath.
>>>
>>> I know this can be frustrating to watch things be reimplemented when
>>> you have a pre-baked solution, but some compromise will be needed to
>>> get things merged (and Maarten is taking the initiative here), but it's
>>> important to keep discussing this so the *right* compromises are made
>>> that don't hurt performance, etc.
>>>
>>> My hope is Maarten's approach of getting the dma-fence core
>>> integrated, and then moving the existing Android sync interface over
>>> to the shared back-end, will allow for proper apples-to-apples
>>> comparisons of the same interface. And if the functionality isn't
>>> sufficient we can hold off on merging the sync interface conversion
>>> until that gets resolved.
>>>
>> Yeah, I'm trying to understand the android side too. I think a unified interface would benefit both. I'm
>> toying a bit with the sw_sync driver in staging because it's the easiest to try out on my desktop.
>>
>> The timeline stuff looks like it could be simplified. The main difference that there seems to be is that
>> I didn't want to create a separate timeline struct for synchronization but let the drivers handle it.
>>
>> If you use rcu for reference lifetime management of timeline, the kref can be dropped. Signalling all
>> syncpts on timeline destroy to a new destroyed state would kill the need for a destroyed member.
>> The active list is unneeded and can be killed if only a linear progression of child_list is allowed.
>>
>> Which probably leaves this nice structure:
>> struct sync_timeline {
>>     const struct sync_timeline_ops    *ops;
>>     char            name[32];
>>
>>     struct list_head    child_list_head;
>>     spinlock_t        child_list_lock;
>>
>>     struct list_head    sync_timeline_list;
>> };
>>
>> Where name and sync_timeline_list are nice for debugging, but I guess not necessarily required, so that
>> could be split out into a separate debugfs thing if required. I've moved the pointer to ops to the fence
>> for dma-fence, which leaves this..
>>
>> struct sync_timeline {
>>     struct list_head    child_list_head;
>>     spinlock_t        child_list_lock;
>>
>>     struct  sync_timeline_debug {
>>         struct list_head    sync_timeline_list;
>>         char name[32];
>>     };
>> };
>>
>> Hm, this looks familiar, the drm drivers had some structure for protecting the active fence list that has
>> an identical definition, but with a slightly different list offset..
>>
>> struct __wait_queue_head {
>>     spinlock_t lock;
>>     struct list_head task_list;
>> };
>>
>> typedef struct __wait_queue_head wait_queue_head_t;
>>
>> This is nicer to convert the existing drm drivers, which already implement synchronous wait with a waitqueue.
>> The default wait op is in fact
>>
>> Ok enough of this little exercise. I just wanted to see how different the 2 are. I think even if the
>> fence interface will end up being incompatible it wouldn't be too hard to convert android..
>>
>> Main difference is the ops, android has a lot more than what I used for dma-fence:
>>
>> struct fence_ops {
>> 	bool (*enable_signaling)(struct fence *fence); // required, callback called with fence->lock held,
>> 	// fence->lock is a pointer passed to __fence_init. Callback should make sure that the fence will
>> 	// be signaled asap.
>> 	bool (*signaled)(struct fence *fence); // optional, but if set to NULL fence_is_signaled is not
>> 	// required to ever return true, unless enable_signaling is called, similar to has_signaled
>> 	long (*wait)(struct fence *fence, bool intr, signed long timeout); // required, but it can be set
>> 	// to the default fence_default_wait implementation which is recommended. It calls enable_signaling
>> 	// and appends itself to async callback list. Identical semantics to wait_event_interruptible_timeout.
>> 	void (*release)(struct fence *fence); // free_pt
>> };
>>
>> Because every fence has a stamp, there is no need for a compare op.
>>
>> struct sync_timeline_ops {
>> 	const char *driver_name;
>>
>> 	/* required */
>> 	struct sync_pt *(*dup)(struct sync_pt *pt);
>>
>> 	/* required */
>> 	int (*has_signaled)(struct sync_pt *pt);
>>
>> 	/* required */
>> 	int (*compare)(struct sync_pt *a, struct sync_pt *b);
>>
>> 	/* optional */
>> 	void (*free_pt)(struct sync_pt *sync_pt);
>>
>> 	/* optional */
>> 	void (*release_obj)(struct sync_timeline *sync_timeline);
>>
>> 	/* deprecated */
>> 	void (*print_obj)(struct seq_file *s,
>> 			  struct sync_timeline *sync_timeline);
>>
>> 	/* deprecated */
>> 	void (*print_pt)(struct seq_file *s, struct sync_pt *sync_pt);
>>
>> 	/* optional */
>> 	int (*fill_driver_data)(struct sync_pt *syncpt, void *data, int size);
>>
>> 	/* optional */
>> 	void (*timeline_value_str)(struct sync_timeline *timeline, char *str,
>> 				   int size);
>>
>> 	/* optional */
>> 	void (*pt_value_str)(struct sync_pt *pt, char *str, int size);
>> };
>>
>> The dup is weird, I have nothing like that. I do allow multiple callbacks to be added to the same
>> dma-fence, and allow callbacks to be aborted, provided you still hold a refcount.
>>
>> So from the ops it looks like what's mostly missing is print_pt, fill_driver_data,
>> timeline_value_str and pt_value_str.
>>
>> I have no idea how much of this is inaccurate. This is just an assessment based on me looking at
>> the stuff in drivers/staging/android/sync for an afternoon and the earlier discussions. :)
>>
> So I actually tried to implement it now. I killed all the deprecated members and assumed a linear timeline.
> This means that syncpoints can only be added at the end, not in between. In particular it means sw_sync
> might be slightly broken.
>
> I only tested it with a simple program I wrote called ufence.c, it's in drivers/staging/android/ufence.c in the following tree:
>
> http://cgit.freedesktop.org/~mlankhorst/linux
>
> the "rfc: convert android to fence api" has all the changes from my dma-fence proposal to what android would need,
> it also converts the userspace fence api to use the dma-fence api.
>
> sync_pt is implemented as fence too. This meant not having to convert all of android right away, though I did make some changes.
> I killed the deprecated members and made all the fence calls forward to the sync_timeline_ops. dup and compare are no longer used.
>
> I haven't given this a spin on a full android kernel, only with the components that are in mainline kernel under staging and my dumb test program.
>
> ~Maarten
>
> PS: The nomenclature is very confusing. I want to rename dma-fence to syncpoint, but I want some feedback from the android devs first. :)
>
Come on, any feedback? I want to move the discussion forward.

~Maarten

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Linaro-mm-sig] thoughts of looking at android fences
  2013-10-30 12:17         ` Maarten Lankhorst
@ 2013-11-01 21:03           ` Rom Lemarchand
  2013-11-02 21:36           ` Colin Cross
  1 sibling, 0 replies; 18+ messages in thread
From: Rom Lemarchand @ 2013-11-01 21:03 UTC (permalink / raw)
  To: Maarten Lankhorst; +Cc: linaro-mm-sig, Android Kernel Team, dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 9506 bytes --]

Sorry about the delay.
Hopefully other people from Android will also chime in.
We need the ability to merge sync fences and keep the sync_pt ordered: the
idea behind sync timelines is that we promise an ordering of operations.

Our reference device is Nexus 10: we need to make sure that any new
implementation satisfies the same requirements.

You can find sample use-cases here, we also use it in our hardware composer
libraries:
https://android.googlesource.com/platform/system/core/+/refs/heads/master/libsync/
https://android.googlesource.com/platform/frameworks/native/+/master/libs/ui/Fence.cpp




[-- Attachment #1.2: Type: text/html, Size: 12257 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Linaro-mm-sig] thoughts of looking at android fences
  2013-10-30 12:17         ` Maarten Lankhorst
  2013-11-01 21:03           ` Rom Lemarchand
@ 2013-11-02 21:36           ` Colin Cross
  2013-11-03  6:31             ` Maarten Lankhorst
                               ` (2 more replies)
  1 sibling, 3 replies; 18+ messages in thread
From: Colin Cross @ 2013-11-02 21:36 UTC (permalink / raw)
  To: Maarten Lankhorst; +Cc: linaro-mm-sig, Android Kernel Team, dri-devel

[-- Attachment #1: Type: text/plain, Size: 3055 bytes --]

On Wed, Oct 30, 2013 at 5:17 AM, Maarten Lankhorst
<maarten.lankhorst@canonical.com> wrote:
> op 24-10-13 14:13, Maarten Lankhorst schreef:
>> So I actually tried to implement it now. I killed all the deprecated members and assumed a linear timeline.
>> This means that syncpoints can only be added at the end, not in between. In particular it means sw_sync
>> might be slightly broken.
>>
>> I only tested it with a simple program I wrote called ufence.c, it's in drivers/staging/android/ufence.c in the following tree:
>>
>> http://cgit.freedesktop.org/~mlankhorst/linux
>>
>> the "rfc: convert android to fence api" has all the changes from my dma-fence proposal to what android would need,
>> it also converts the userspace fence api to use the dma-fence api.
>>
>> sync_pt is implemented as fence too. This meant not having to convert all of android right away, though I did make some changes.
>> I killed the deprecated members and made all the fence calls forward to the sync_timeline_ops. dup and compare are no longer used.
>>
>> I haven't given this a spin on a full android kernel, only with the components that are in mainline kernel under staging and my dumb test program.
>>
>> ~Maarten
>>
>> PS: The nomenclature is very confusing. I want to rename dma-fence to syncpoint, but I want some feedback from the android devs first. :)
>>
> Come on, any feedback? I want to move the discussion forward.
>
> ~Maarten

I experimented with it a little on a device that uses sync and came
across a few bugs:
1.  sync_timeline_signal needs to call __fence_signal on all signaled
points on the timeline, not just the first
2.  fence_add_callback doesn't always initialize cb.node
3.  sync_fence_wait should take ms
4.  sync_print_pt status printing was incorrect
5.  there is a deadlock:
   sync_print_obj takes obj->child_list_lock
   sync_print_pt
   fence_is_signaled
   fence_signal takes fence->lock == obj->child_list_lock
6.  freeing a timeline before all the fences holding points on that
timeline have timed out results in a crash

With the attached patch to fix these issues, our libsync and sync_test
give the same results as with our sync code.  I haven't tested against
the full Android framework yet.
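For bug 1 in the list above, here is a minimal userspace sketch of the idea, not the kernel code. The toy_pt struct and toy_timeline_signal() are hypothetical names made up for illustration, and the array stands in for the active list. The point is only that the walk has to visit every active point and must not stop at the first one that is not yet signaled, since the active list is not guaranteed to be sorted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a sync_pt; not the staging driver's struct. */
struct toy_pt {
	unsigned value;   /* timeline value at which this point completes */
	bool active;      /* still on the (toy) active list */
	bool signaled;
};

/*
 * Signal every active point whose value has been reached.
 * Returns how many points were newly signaled.
 * Wraparound of the timeline value is ignored in this sketch.
 */
static size_t toy_timeline_signal(struct toy_pt *pts, size_t n,
				  unsigned timeline_value)
{
	size_t i, fired = 0;

	for (i = 0; i < n; i++) {
		if (!pts[i].active)
			continue;
		if (pts[i].value > timeline_value)
			continue;	/* not reached yet: keep scanning, no break */
		pts[i].signaled = true;
		pts[i].active = false;	/* plays the role of list_del() */
		fired++;
	}
	return fired;
}
```

With points at values 5, 2 and 3 (in that order) and the timeline at 3, a loop that breaks on the first unsignaled entry would signal nothing, while the full walk signals the last two.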

The compare op and timeline ordering are critical to the efficiency of
sync points on Android.  The compare op is used when merging fences to
drop all but the latest point on the same timeline.  This is necessary
for example when the same buffer is submitted to the display on
multiple frames, like when there is a live wallpaper in the background
updating at 60 fps and a static screen of widgets on top of it.  The
static widget buffer is submitted on every frame, returning a new
fence each time.  The compositor merges the new fence with the fence
for the previous buffer, and because they are on the same timeline it
merges down to a single point.  I experimented with disabling the
merge optimization on a Nexus 10, and found that leaving the screen on
running a live wallpaper eventually resulted in 100k outstanding sync
points.

[-- Attachment #2: 0001-dma-fence-fixes.patch --]
[-- Type: text/x-patch, Size: 4963 bytes --]

From e8f89ae535e227490d3116a03f2be5dc780e5be9 Mon Sep 17 00:00:00 2001
From: Colin Cross <ccross@android.com>
Date: Fri, 1 Nov 2013 19:26:54 -0700
Subject: [PATCH] dma fence fixes

Change-Id: I9308b51c23e762fde95106db301cf6aedd8845f5
---
 drivers/staging/android/sync.c       | 27 +++++++++++++++++++++------
 drivers/staging/android/sync_debug.c | 10 ++++------
 include/linux/fence.h                |  4 ++--
 3 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
index 110a9e99cb71..672020d5f566 100644
--- a/drivers/staging/android/sync.c
+++ b/drivers/staging/android/sync.c
@@ -74,6 +74,16 @@ static void sync_timeline_free(struct kref *kref)
 	kfree(obj);
 }
 
+static void sync_timeline_get(struct sync_timeline *obj)
+{
+	kref_get(&obj->kref);
+}
+
+static void sync_timeline_put(struct sync_timeline *obj)
+{
+	kref_put(&obj->kref, sync_timeline_free);
+}
+
 void sync_timeline_destroy(struct sync_timeline *obj)
 {
 	obj->destroyed = true;
@@ -83,8 +93,8 @@ void sync_timeline_destroy(struct sync_timeline *obj)
 	 * that their parent is going away.
 	 */
 
-	if (!kref_put(&obj->kref, sync_timeline_free))
-		sync_timeline_signal(obj);
+	sync_timeline_signal(obj);
+	sync_timeline_put(obj);
 }
 EXPORT_SYMBOL(sync_timeline_destroy);
 
@@ -98,9 +108,7 @@ void sync_timeline_signal(struct sync_timeline *obj)
 
 	spin_lock_irqsave(&obj->child_list_lock, flags);
 	list_for_each_entry_safe(pt, next, &obj->active_list_head, active_list) {
-		if (!pt->base.ops->signaled(&pt->base))
-			break;
-		else {
+		if (pt->base.ops->signaled(&pt->base)) {
 			__fence_signal(&pt->base);
 			list_del(&pt->active_list);
 		}
@@ -122,6 +130,7 @@ struct sync_pt *sync_pt_create(struct sync_timeline *obj, int size)
 		return NULL;
 
 	spin_lock_irqsave(&obj->child_list_lock, flags);
+	sync_timeline_get(obj);
 	__fence_init(&pt->base, &android_fence_ops, &obj->child_list_lock, obj->context, ++obj->value);
 	list_add_tail(&pt->child_list, &obj->child_list_head);
 	INIT_LIST_HEAD(&pt->active_list);
@@ -186,6 +195,7 @@ struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt)
 	fence_get(&pt->base);
 	fence->cbs[0].sync_pt = &pt->base;
 	fence->cbs[0].fence = fence;
+	INIT_LIST_HEAD(&fence->cbs[0].cb.node);
 	if (fence_add_callback(&pt->base, &fence->cbs[0].cb, fence_check_cb_func))
 		atomic_dec(&fence->status);
 
@@ -245,6 +255,7 @@ struct sync_fence *sync_fence_merge(const char *name,
 		fence_get(pt);
 		fence->cbs[i].sync_pt = pt;
 		fence->cbs[i].fence = fence;
+		INIT_LIST_HEAD(&fence->cbs[i].cb.node);
 		if (fence_add_callback(pt, &fence->cbs[i].cb, fence_check_cb_func))
 			atomic_dec(&fence->status);
 	}
@@ -255,7 +266,8 @@ struct sync_fence *sync_fence_merge(const char *name,
 		fence_get(pt);
 		fence->cbs[a->num_fences + i].sync_pt = pt;
 		fence->cbs[a->num_fences + i].fence = fence;
-		if (fence_add_callback(pt, &fence->cbs[i].cb, fence_check_cb_func))
+		INIT_LIST_HEAD(&fence->cbs[a->num_fences + i].cb.node);
+		if (fence_add_callback(pt, &fence->cbs[a->num_fences + i].cb, fence_check_cb_func))
 			atomic_dec(&fence->status);
 	}
 
@@ -325,6 +337,8 @@ int sync_fence_wait(struct sync_fence *fence, long timeout)
 
 	if (timeout < 0)
 		timeout = MAX_SCHEDULE_TIMEOUT;
+	else
+		timeout = msecs_to_jiffies(timeout);
 
 	trace_sync_wait(fence, 1);
 	for (i = 0; i < fence->num_fences; ++i)
@@ -383,6 +397,7 @@ static void android_fence_release(struct fence *fence)
 	if (parent->ops->free_pt)
 		parent->ops->free_pt(pt);
 
+	sync_timeline_put(parent);
 	kfree(pt);
 }
 
diff --git a/drivers/staging/android/sync_debug.c b/drivers/staging/android/sync_debug.c
index 55ad34085f2f..8671fbb7d143 100644
--- a/drivers/staging/android/sync_debug.c
+++ b/drivers/staging/android/sync_debug.c
@@ -82,18 +82,16 @@ static const char *sync_status_str(int status)
 
 static void sync_print_pt(struct seq_file *s, struct sync_pt *pt, bool fence)
 {
-	int status = 0;
+	int status = 1;
 	struct sync_timeline *parent = sync_pt_parent(pt);
-	if (fence_is_signaled(&pt->base)) {
+	if (fence_is_signaled(&pt->base))
 		status = pt->base.status;
-		if (!status)
-			status = 1;
-	}
+
 	seq_printf(s, "  %s%spt %s",
 		   fence ? parent->name : "",
 		   fence ? "_" : "",
 		   sync_status_str(status));
-	if (status) {
+	if (!status) {
 		struct timeval tv = ktime_to_timeval(pt->base.timestamp);
 		seq_printf(s, "@%ld.%06ld", tv.tv_sec, tv.tv_usec);
 	}
diff --git a/include/linux/fence.h b/include/linux/fence.h
index 2beb3b0ff2a3..e1b2fc6e8007 100644
--- a/include/linux/fence.h
+++ b/include/linux/fence.h
@@ -257,10 +257,10 @@ fence_is_signaled(struct fence *fence)
 	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
 		return true;
 
-	if (fence->ops->signaled && fence->ops->signaled(fence)) {
+	/*if (fence->ops->signaled && fence->ops->signaled(fence)) {
 		fence_signal(fence);
 		return true;
-	}
+	}*/
 
 	return false;
 }
-- 
1.8.4.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [Linaro-mm-sig] thoughts of looking at android fences
  2013-11-02 21:36           ` Colin Cross
@ 2013-11-03  6:31             ` Maarten Lankhorst
  2013-11-04  9:36             ` Maarten Lankhorst
  2013-11-04 10:31             ` Maarten Lankhorst
  2 siblings, 0 replies; 18+ messages in thread
From: Maarten Lankhorst @ 2013-11-03  6:31 UTC (permalink / raw)
  To: Colin Cross; +Cc: linaro-mm-sig, Android Kernel Team, dri-devel

op 02-11-13 22:36, Colin Cross schreef:
> On Wed, Oct 30, 2013 at 5:17 AM, Maarten Lankhorst
> <maarten.lankhorst@canonical.com> wrote:
>> op 24-10-13 14:13, Maarten Lankhorst schreef:
>>> So I actually tried to implement it now. I killed all the deprecated members and assumed a linear timeline.
>>> This means that syncpoints can only be added at the end, not in between. In particular it means sw_sync
>>> might be slightly broken.
>>>
>>> I only tested it with a simple program I wrote called ufence.c, it's in drivers/staging/android/ufence.c in the following tree:
>>>
>>> http://cgit.freedesktop.org/~mlankhorst/linux
>>>
>>> the "rfc: convert android to fence api" has all the changes from my dma-fence proposal to what android would need,
>>> it also converts the userspace fence api to use the dma-fence api.
>>>
>>> sync_pt is implemented as fence too. This meant not having to convert all of android right away, though I did make some changes.
>>> I killed the deprecated members and made all the fence calls forward to the sync_timeline_ops. dup and compare are no longer used.
>>>
>>> I haven't given this a spin on a full android kernel, only with the components that are in mainline kernel under staging and my dumb test program.
>>>
>>> ~Maarten
>>>
>>> PS: The nomenclature is very confusing. I want to rename dma-fence to syncpoint, but I want some feedback from the android devs first. :)
>>>
>> Come on, any feedback? I want to move the discussion forward.
>>
>> ~Maarten
> I experimented with it a little on a device that uses sync and came
> across a few bugs:
> 1.  sync_timeline_signal needs to call __fence_signal on all signaled
> points on the timeline, not just the first
> 2.  fence_add_callback doesn't always initialize cb.node
> 3.  sync_fence_wait should take ms
> 4.  sync_print_pt status printing was incorrect
Well, in the normal case status may not be set, but the fence may still be signaled. Any status should be set before signaling, in cases where non-android dma-fences are used.
I deliberately tried not to depend on any android stuff in the android fences, this way it could be used outside android too.

I'm trying to only set status on error. If there is a race with 2 threads calling fence_signal, one with error, one without, then the error will always be visible in fence->status that way.
But I guess this might be a corner case we wouldn't worry about normally...
> 5.  there is a deadlock:
>    sync_print_obj takes obj->child_list_lock
>    sync_print_pt
>    fence_is_signaled
>    fence_signal takes fence->lock == obj->child_list_lock
> 6.  freeing a timeline before all the fences holding points on that
> timeline have timed out results in a crash
To be honest, I was surprised this code mostly worked for the sw_fence test code I wrote originally. :)
I think I'll have to add a __fence_is_signaled for 5, which can be called with fence->lock held.
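The locked/unlocked split mentioned for 5 can be sketched in plain userspace C roughly as follows. This is an illustration only, under stated assumptions: toy_lock, toy_fence and every function below are hypothetical, the lock is a flag that asserts on double acquisition to model the self-deadlock, and the _locked suffix stands in for the kernel's leading double underscore.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy non-recursive lock: acquiring it twice trips the assert,
 * which models the self-deadlock on a real spinlock. */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

struct toy_fence {
	struct toy_lock *lock;	/* shared with the timeline, as in bug 5 */
	bool hw_done;		/* what a driver's signaled op would report */
	bool signaled;
};

/* Variant for callers that already hold f->lock. */
static bool toy_fence_is_signaled_locked(struct toy_fence *f)
{
	assert(f->lock->held);
	if (!f->signaled && f->hw_done)
		f->signaled = true;
	return f->signaled;
}

/* Variant for callers that do not hold f->lock. */
static bool toy_fence_is_signaled(struct toy_fence *f)
{
	bool ret;

	toy_lock_acquire(f->lock);
	ret = toy_fence_is_signaled_locked(f);
	toy_lock_release(f->lock);
	return ret;
}

/* A debug printer that walks the points with the timeline lock held
 * has to use the _locked variant. */
static int count_signaled(struct toy_lock *timeline_lock,
			  struct toy_fence *pts, int n)
{
	int i, count = 0;

	toy_lock_acquire(timeline_lock);
	for (i = 0; i < n; i++)
		if (toy_fence_is_signaled_locked(&pts[i]))
			count++;
	toy_lock_release(timeline_lock);
	return count;
}
```

Calling toy_fence_is_signaled() from inside count_signaled() would hit the assert in toy_lock_acquire(), which is the toy equivalent of the sync_print_obj deadlock.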


>
> With the attached patch to fix these issues, our libsync and sync_test
> give the same results as with our sync code.  I haven't tested against
> the full Android framework yet.
>
> The compare op and timeline ordering is critical to the efficiency of
> sync points on Android.  The compare op is used when merging fences to
> drop all but the latest point on the same timeline.  This is necessary
> for example when the same buffer is submitted to the display on
> multiple frames, like when there is a live wallpaper in the background
> updating at 60 fps and a static screen of widgets on top of it.  The
> static widget buffer is submitted on every frame, returning a new
> fence each time.  The compositor merges the new fence with the fence
> for the previous buffer, and because they are on the same timeline it
> merges down to a single point.  I experimented with disabling the
> merge optimization on a Nexus 10, and found that leaving the screen on
> running a live wallpaper eventually resulted in 100k outstanding sync
> points.
Yeah I've been looking into how to do that. It's very easy to optimize actually. The dma-fence code requires a context,
which is a number (even if it may become a pointer later, it could be seen as a number). If we order all fences based on number,
fences could have an order based on context number. Merging fences would simply become adding 2 ordered lists, and dropping
any duplicates and signaled points.

I left it out of the RFC because I wanted to keep things as readable as possible. :)
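A rough userspace sketch of the ordered merge described above, assuming each fence keeps its points sorted by context number with at most one point per context. struct pt and merge_pts() are hypothetical names for illustration, not part of the dma-fence proposal or the android sync code, and sequence number wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a sync point; not the kernel's struct fence. */
struct pt {
	unsigned context;	/* timeline / fence context number */
	unsigned seqno;		/* position on that timeline */
	bool signaled;		/* already completed, nothing left to wait for */
};

/*
 * Merge two arrays sorted by ascending context, each holding at most
 * one point per context. For a context present in both, only the later
 * point is kept; signaled points are dropped. Returns the number of
 * points written to out, which must have room for na + nb entries.
 */
static size_t merge_pts(const struct pt *a, size_t na,
			const struct pt *b, size_t nb, struct pt *out)
{
	size_t i = 0, j = 0, n = 0;

	while (i < na || j < nb) {
		struct pt next;

		if (j == nb || (i < na && a[i].context < b[j].context)) {
			next = a[i++];
		} else if (i == na || b[j].context < a[i].context) {
			next = b[j++];
		} else {
			/* Same timeline: only the later point matters. */
			next = a[i].seqno >= b[j].seqno ? a[i] : b[j];
			i++;
			j++;
		}

		if (!next.signaled)
			out[n++] = next;
	}
	return n;
}
```

Because both inputs are already ordered, the merge is a single linear pass, and the result is again ordered by context so it can be merged again later without sorting.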

~Maarten

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [Linaro-mm-sig] thoughts of looking at android fences
  2013-11-02 21:36           ` Colin Cross
  2013-11-03  6:31             ` Maarten Lankhorst
@ 2013-11-04  9:36             ` Maarten Lankhorst
  2013-11-04 10:31             ` Maarten Lankhorst
  2 siblings, 0 replies; 18+ messages in thread
From: Maarten Lankhorst @ 2013-11-04  9:36 UTC (permalink / raw)
  To: Colin Cross; +Cc: linaro-mm-sig, Android Kernel Team, dri-devel

op 02-11-13 22:36, Colin Cross schreef:
> On Wed, Oct 30, 2013 at 5:17 AM, Maarten Lankhorst
> <maarten.lankhorst@canonical.com> wrote:
>> op 24-10-13 14:13, Maarten Lankhorst schreef:
>>> So I actually tried to implement it now. I killed all the deprecated members and assumed a linear timeline.
>>> This means that syncpoints can only be added at the end, not in between. In particular it means sw_sync
>>> might be slightly broken.
>>>
>>> I only tested it with a simple program I wrote called ufence.c, it's in drivers/staging/android/ufence.c in the following tree:
>>>
>>> http://cgit.freedesktop.org/~mlankhorst/linux
>>>
>>> the "rfc: convert android to fence api" has all the changes from my dma-fence proposal to what android would need,
>>> it also converts the userspace fence api to use the dma-fence api.
>>>
>>> sync_pt is implemented as fence too. This meant not having to convert all of android right away, though I did make some changes.
>>> I killed the deprecated members and made all the fence calls forward to the sync_timeline_ops. dup and compare are no longer used.
>>>
>>> I haven't given this a spin on a full android kernel, only with the components that are in mainline kernel under staging and my dumb test program.
>>>
>>> ~Maarten
>>>
>>> PS: The nomenclature is very confusing. I want to rename dma-fence to syncpoint, but I want some feedback from the android devs first. :)
>>>
>> Come on, any feedback? I want to move the discussion forward.
>>
>> ~Maarten
> I experimented with it a little on a device that uses sync and came
> across a few bugs:
> 1.  sync_timeline_signal needs to call __fence_signal on all signaled
> points on the timeline, not just the first
> 2.  fence_add_callback doesn't always initialize cb.node
> 3.  sync_fence_wait should take ms
> 4.  sync_print_pt status printing was incorrect
> 5.  there is a deadlock:
>    sync_print_obj takes obj->child_list_lock
>    sync_print_pt
>    fence_is_signaled
>    fence_signal takes fence->lock == obj->child_list_lock
> 6.  freeing a timeline before all the fences holding points on that
> timeline have timed out results in a crash
>
> With the attached patch to fix these issues, our libsync and sync_test
> give the same results as with our sync code.  I haven't tested against
> the full Android framework yet.
>
> The compare op and timeline ordering is critical to the efficiency of
> sync points on Android.  The compare op is used when merging fences to
> drop all but the latest point on the same timeline.  This is necessary
> for example when the same buffer is submitted to the display on
> multiple frames, like when there is a live wallpaper in the background
> updating at 60 fps and a static screen of widgets on top of it.  The
> static widget buffer is submitted on every frame, returning a new
> fence each time.  The compositor merges the new fence with the fence
> for the previous buffer, and because they are on the same timeline it
> merges down to a single point.  I experimented with disabling the
> merge optimization on a Nexus 10, and found that leaving the screen on
> running a live wallpaper eventually resulted in 100k outstanding sync
> points.

Hey,


fence_add_callback will now always initialize cb->node, even on failure.
I added __fence_is_signaled, to be used with the lock held.
sync_print_pt didn't work when the fence was signaled with an error; I fixed that.
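
The split between a locked and an unlocked signaled check can be modeled in userspace. This is only a sketch under stated assumptions: every toy_* name is hypothetical, the lock is a plain flag that asserts on double acquisition (standing in for a non-recursive spinlock), and hw_done stands in for whatever ops->signaled would report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical non-recursive lock: acquiring it twice trips an assert,
 * which is how the deadlock from the bug list would show up here. */
struct toy_lock {
	bool held;
};

static void toy_lock_acquire(struct toy_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void toy_lock_release(struct toy_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Hypothetical fence: shares its lock with the owning timeline. */
struct toy_fence {
	struct toy_lock *lock;
	bool signaled_bit;	/* models FENCE_FLAG_SIGNALED_BIT */
	bool hw_done;		/* models the result of ops->signaled */
	int callbacks_run;	/* counts how often callbacks fired */
};

/* Caller must already hold the lock. */
static void toy_fence_signal_locked(struct toy_fence *f)
{
	assert(f->lock->held);
	if (!f->signaled_bit) {
		f->signaled_bit = true;
		f->callbacks_run++;
	}
}

/* Locked variant: safe to call from code that holds the lock. */
static bool toy_fence_is_signaled_locked(struct toy_fence *f)
{
	if (f->signaled_bit)
		return true;
	if (f->hw_done) {
		toy_fence_signal_locked(f);
		return true;
	}
	return false;
}

/* Unlocked variant: takes the lock itself, so it must not be
 * called by anyone who already holds it. */
static bool toy_fence_is_signaled(struct toy_fence *f)
{
	bool ret;

	if (f->signaled_bit)
		return true;
	toy_lock_acquire(f->lock);
	ret = toy_fence_is_signaled_locked(f);
	toy_lock_release(f->lock);
	return ret;
}
```

A debug printer that walks the list with the lock held would call the _locked variant; calling the unlocked one from there would hit the assert in toy_lock_acquire.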

So I reworked the patch below; no merge optimization yet. It will be done as a separate patch. :)

---
diff --git a/drivers/base/fence.c b/drivers/base/fence.c
index 89c89ae19f58..9e7a63c4b07f 100644
--- a/drivers/base/fence.c
+++ b/drivers/base/fence.c
@@ -185,8 +185,10 @@ int fence_add_callback(struct fence *fence, struct fence_cb *cb,
 	if (WARN_ON(!fence || !func))
 		return -EINVAL;
 
-	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
+	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
+		INIT_LIST_HEAD(&cb->node);
 		return -ENOENT;
+	}
 
 	spin_lock_irqsave(fence->lock, flags);
 
@@ -202,7 +204,8 @@ int fence_add_callback(struct fence *fence, struct fence_cb *cb,
 	if (!ret) {
 		cb->func = func;
 		list_add_tail(&cb->node, &fence->cb_list);
-	}
+	} else
+		INIT_LIST_HEAD(&cb->node);
 	spin_unlock_irqrestore(fence->lock, flags);
 
 	return ret;
diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
index 110a9e99cb71..2c7fd3f2ab23 100644
--- a/drivers/staging/android/sync.c
+++ b/drivers/staging/android/sync.c
@@ -74,6 +74,16 @@ static void sync_timeline_free(struct kref *kref)
 	kfree(obj);
 }
 
+static void sync_timeline_get(struct sync_timeline *obj)
+{
+	kref_get(&obj->kref);
+}
+
+static void sync_timeline_put(struct sync_timeline *obj)
+{
+	kref_put(&obj->kref, sync_timeline_free);
+}
+
 void sync_timeline_destroy(struct sync_timeline *obj)
 {
 	obj->destroyed = true;
@@ -83,8 +93,8 @@ void sync_timeline_destroy(struct sync_timeline *obj)
 	 * that their parent is going away.
 	 */
 
-	if (!kref_put(&obj->kref, sync_timeline_free))
-		sync_timeline_signal(obj);
+	sync_timeline_signal(obj);
+	sync_timeline_put(obj);
 }
 EXPORT_SYMBOL(sync_timeline_destroy);
 
@@ -98,12 +108,8 @@ void sync_timeline_signal(struct sync_timeline *obj)
 
 	spin_lock_irqsave(&obj->child_list_lock, flags);
 	list_for_each_entry_safe(pt, next, &obj->active_list_head, active_list) {
-		if (!pt->base.ops->signaled(&pt->base))
-			break;
-		else {
-			__fence_signal(&pt->base);
+		if (__fence_is_signaled(&pt->base))
 			list_del(&pt->active_list);
-		}
 	}
 	spin_unlock_irqrestore(&obj->child_list_lock, flags);
 }
@@ -122,6 +128,7 @@ struct sync_pt *sync_pt_create(struct sync_timeline *obj, int size)
 		return NULL;
 
 	spin_lock_irqsave(&obj->child_list_lock, flags);
+	sync_timeline_get(obj);
 	__fence_init(&pt->base, &android_fence_ops, &obj->child_list_lock, obj->context, ++obj->value);
 	list_add_tail(&pt->child_list, &obj->child_list_head);
 	INIT_LIST_HEAD(&pt->active_list);
@@ -255,7 +262,7 @@ struct sync_fence *sync_fence_merge(const char *name,
 		fence_get(pt);
 		fence->cbs[a->num_fences + i].sync_pt = pt;
 		fence->cbs[a->num_fences + i].fence = fence;
-		if (fence_add_callback(pt, &fence->cbs[i].cb, fence_check_cb_func))
+		if (fence_add_callback(pt, &fence->cbs[a->num_fences + i].cb, fence_check_cb_func))
 			atomic_dec(&fence->status);
 	}
 
@@ -325,6 +332,8 @@ int sync_fence_wait(struct sync_fence *fence, long timeout)
 
 	if (timeout < 0)
 		timeout = MAX_SCHEDULE_TIMEOUT;
+	else
+		timeout = msecs_to_jiffies(timeout);
 
 	trace_sync_wait(fence, 1);
 	for (i = 0; i < fence->num_fences; ++i)
@@ -383,6 +392,7 @@ static void android_fence_release(struct fence *fence)
 	if (parent->ops->free_pt)
 		parent->ops->free_pt(pt);
 
+	sync_timeline_put(parent);
 	kfree(pt);
 }
 
diff --git a/drivers/staging/android/sync_debug.c b/drivers/staging/android/sync_debug.c
index 55ad34085f2f..2ef6496c7cd0 100644
--- a/drivers/staging/android/sync_debug.c
+++ b/drivers/staging/android/sync_debug.c
@@ -82,18 +82,18 @@ static const char *sync_status_str(int status)
 
 static void sync_print_pt(struct seq_file *s, struct sync_pt *pt, bool fence)
 {
-	int status = 0;
+	int status = 1;
 	struct sync_timeline *parent = sync_pt_parent(pt);
-	if (fence_is_signaled(&pt->base)) {
+
+	if (__fence_is_signaled(&pt->base))
 		status = pt->base.status;
-		if (!status)
-			status = 1;
-	}
+
 	seq_printf(s, "  %s%spt %s",
 		   fence ? parent->name : "",
 		   fence ? "_" : "",
 		   sync_status_str(status));
-	if (status) {
+
+	if (status <= 0) {
 		struct timeval tv = ktime_to_timeval(pt->base.timestamp);
 		seq_printf(s, "@%ld.%06ld", tv.tv_sec, tv.tv_usec);
 	}
diff --git a/include/linux/fence.h b/include/linux/fence.h
index 2beb3b0ff2a3..dd1639ff96c7 100644
--- a/include/linux/fence.h
+++ b/include/linux/fence.h
@@ -237,6 +237,20 @@ int fence_add_callback(struct fence *fence, struct fence_cb *cb,
 bool fence_remove_callback(struct fence *fence, struct fence_cb *cb);
 void fence_enable_sw_signaling(struct fence *fence);
 
+static inline bool
+__fence_is_signaled(struct fence *fence)
+{
+	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
+		return true;
+
+	if (fence->ops->signaled && fence->ops->signaled(fence)) {
+		__fence_signal(fence);
+		return true;
+	}
+
+	return false;
+}
+
 /**
  * fence_is_signaled - Return an indication if the fence is signaled yet.
  * @fence:	[in]	the fence to check
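
The lifetime change in sync.c above (each sync_pt takes a reference on its timeline at creation and drops it on release) can be illustrated with a small userspace model. This is a sketch, not the kernel code: the toy_* names are hypothetical, the counter is a plain int rather than a kref, and freed_flag exists only so the caller can observe when the timeline is really freed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a timeline with an embedded refcount. */
struct toy_timeline {
	int refcount;		/* models the kref; not atomic here */
	bool destroyed;
	int *freed_flag;	/* set to 1 when the timeline is freed */
};

/* Hypothetical stand-in for a sync point holding a parent reference. */
struct toy_pt {
	struct toy_timeline *parent;
};

static struct toy_timeline *toy_timeline_create(int *freed_flag)
{
	struct toy_timeline *tl = malloc(sizeof(*tl));

	if (!tl)
		return NULL;
	tl->refcount = 1;	/* the creator's reference */
	tl->destroyed = false;
	tl->freed_flag = freed_flag;
	return tl;
}

static void toy_timeline_get(struct toy_timeline *tl)
{
	assert(tl->refcount > 0);
	tl->refcount++;
}

static void toy_timeline_put(struct toy_timeline *tl)
{
	assert(tl->refcount > 0);
	if (--tl->refcount == 0) {
		*tl->freed_flag = 1;
		free(tl);
	}
}

/* Each point pins its timeline for as long as the point exists. */
static struct toy_pt *toy_pt_create(struct toy_timeline *tl)
{
	struct toy_pt *pt = malloc(sizeof(*pt));

	if (!pt)
		return NULL;
	toy_timeline_get(tl);
	pt->parent = tl;
	return pt;
}

static void toy_pt_release(struct toy_pt *pt)
{
	struct toy_timeline *parent = pt->parent;

	free(pt);
	toy_timeline_put(parent);
}

/* Destroy marks the timeline and drops only the creator's reference. */
static void toy_timeline_destroy(struct toy_timeline *tl)
{
	tl->destroyed = true;
	toy_timeline_put(tl);
}
```

With this arrangement, destroying a timeline while points are still outstanding leaves the memory valid until the last point is released, which is the situation described in item 6 of the bug list.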


* Re: [Linaro-mm-sig] thoughts of looking at android fences
  2013-11-02 21:36           ` Colin Cross
  2013-11-03  6:31             ` Maarten Lankhorst
  2013-11-04  9:36             ` Maarten Lankhorst
@ 2013-11-04 10:31             ` Maarten Lankhorst
  2013-11-07 21:11               ` Rom Lemarchand
  2 siblings, 1 reply; 18+ messages in thread
From: Maarten Lankhorst @ 2013-11-04 10:31 UTC (permalink / raw)
  To: Colin Cross; +Cc: linaro-mm-sig, Android Kernel Team, dri-devel

On 02-11-13 22:36, Colin Cross wrote:
> On Wed, Oct 30, 2013 at 5:17 AM, Maarten Lankhorst
> <maarten.lankhorst@canonical.com> wrote:
>> On 24-10-13 14:13, Maarten Lankhorst wrote:
>>> So I actually tried to implement it now. I killed all the deprecated members and assumed a linear timeline.
>>> This means that syncpoints can only be added at the end, not in between. In particular it means sw_sync
>>> might be slightly broken.
>>>
>>> I only tested it with a simple program I wrote called ufence.c, it's in drivers/staging/android/ufence.c in the following tree:
>>>
>>> http://cgit.freedesktop.org/~mlankhorst/linux
>>>
>>> the "rfc: convert android to fence api" has all the changes from my dma-fence proposal to what android would need,
>>> it also converts the userspace fence api to use the dma-fence api.
>>>
>>> sync_pt is implemented as fence too. This meant not having to convert all of android right away, though I did make some changes.
>>> I killed the deprecated members and made all the fence calls forward to the sync_timeline_ops. dup and compare are no longer used.
>>>
>>> I haven't given this a spin on a full android kernel, only with the components that are in mainline kernel under staging and my dumb test program.
>>>
>>> ~Maarten
>>>
>>> PS: The nomenclature is very confusing. I want to rename dma-fence to syncpoint, but I want some feedback from the android devs first. :)
>>>
>> Come on, any feedback? I want to move the discussion forward.
>>
>> ~Maarten
> I experimented with it a little on a device that uses sync and came
> across a few bugs:
> 1.  sync_timeline_signal needs to call __fence_signal on all signaled
> points on the timeline, not just the first
> 2.  fence_add_callback doesn't always initialize cb.node
> 3.  sync_fence_wait should take ms
> 4.  sync_print_pt status printing was incorrect
> 5.  there is a deadlock:
>    sync_print_obj takes obj->child_list_lock
>    sync_print_pt
>    fence_is_signaled
>    fence_signal takes fence->lock == obj->child_list_lock
> 6.  freeing a timeline before all the fences holding points on that
> timeline have timed out results in a crash
>
> With the attached patch to fix these issues, our libsync and sync_test
> give the same results as with our sync code.  I haven't tested against
> the full Android framework yet.
>
> The compare op and timeline ordering is critical to the efficiency of
> sync points on Android.  The compare op is used when merging fences to
> drop all but the latest point on the same timeline.  This is necessary
> for example when the same buffer is submitted to the display on
> multiple frames, like when there is a live wallpaper in the background
> updating at 60 fps and a static screen of widgets on top of it.  The
> static widget buffer is submitted on every frame, returning a new
> fence each time.  The compositor merges the new fence with the fence
> for the previous buffer, and because they are on the same timeline it
> merges down to a single point.  I experimented with disabling the
> merge optimization on a Nexus 10, and found that leaving the screen on
> running a live wallpaper eventually resulted in 100k outstanding sync
> points.

Well, here I implemented the same merge optimization for dma-fence. Can you take a look?

---

diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
index 2c7fd3f2ab23..d1d89f1f8553 100644
--- a/drivers/staging/android/sync.c
+++ b/drivers/staging/android/sync.c
@@ -232,39 +232,62 @@ void sync_fence_install(struct sync_fence *fence, int fd)
 }
 EXPORT_SYMBOL(sync_fence_install);
 
+static void sync_fence_add_pt(struct sync_fence *fence, int *i, struct fence *pt) {
+	fence->cbs[*i].sync_pt = pt;
+	fence->cbs[*i].fence = fence;
+
+	if (!fence_add_callback(pt, &fence->cbs[*i].cb, fence_check_cb_func)) {
+		fence_get(pt);
+		(*i)++;
+	}
+}
+
 struct sync_fence *sync_fence_merge(const char *name,
 				    struct sync_fence *a, struct sync_fence *b)
 {
 	int num_fences = a->num_fences + b->num_fences;
 	struct sync_fence *fence;
-	int i;
+	int i, i_a, i_b;
 
 	fence = sync_fence_alloc(offsetof(struct sync_fence, cbs[num_fences]), name);
 	if (fence == NULL)
 		return NULL;
 
-	fence->num_fences = num_fences;
 	atomic_set(&fence->status, num_fences);
 
-	for (i = 0; i < a->num_fences; ++i) {
-		struct fence *pt = a->cbs[i].sync_pt;
-
-		fence_get(pt);
-		fence->cbs[i].sync_pt = pt;
-		fence->cbs[i].fence = fence;
-		if (fence_add_callback(pt, &fence->cbs[i].cb, fence_check_cb_func))
-			atomic_dec(&fence->status);
+	/*
+	 * Assume sync_fence a and b are both ordered and have no
+	 * duplicates with the same context.
+	 *
+	 * If a sync_fence can only be created with sync_fence_merge
+	 * and sync_fence_create, this is a reasonable assumption.
+	 */
+	for (i = i_a = i_b = 0; i_a < a->num_fences || i_b < b->num_fences; ) {
+		struct fence *pt_a = i_a < a->num_fences ? a->cbs[i_a].sync_pt : NULL;
+		struct fence *pt_b = i_b < b->num_fences ? b->cbs[i_b].sync_pt : NULL;
+
+		if (!pt_b || pt_a->context < pt_b->context) {
+			sync_fence_add_pt(fence, &i, pt_a);
+
+			i_a++;
+		} else if (!pt_a || pt_a->context > pt_b->context) {
+			sync_fence_add_pt(fence, &i, pt_b);
+
+			i_b++;
+		} else {
+			if (pt_a->seqno - pt_b->seqno <= INT_MAX)
+				sync_fence_add_pt(fence, &i, pt_a);
+			else
+				sync_fence_add_pt(fence, &i, pt_b);
+
+			i_a++;
+			i_b++;
+		}
 	}
 
-	for (i = 0; i < b->num_fences; ++i) {
-		struct fence *pt = b->cbs[i].sync_pt;
-
-		fence_get(pt);
-		fence->cbs[a->num_fences + i].sync_pt = pt;
-		fence->cbs[a->num_fences + i].fence = fence;
-		if (fence_add_callback(pt, &fence->cbs[a->num_fences + i].cb, fence_check_cb_func))
-			atomic_dec(&fence->status);
-	}
+	if (num_fences > i)
+		atomic_sub(num_fences - i, &fence->status);
+	fence->num_fences = i;
 
 	sync_fence_debug_add(fence);
 	return fence;
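
The merge that sync_fence_merge is aiming for can be modeled in userspace on plain arrays. This is a hedged sketch, not the kernel implementation: toy_pt and toy_merge are hypothetical names, both inputs are assumed to be sorted by context with at most one point per context (the same assumption the comment in the patch states), and the leftover tail of either input is copied in separate loops after the main loop.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical stand-in for a sync point: a timeline id plus a
 * position on that timeline. */
struct toy_pt {
	unsigned int context;
	unsigned int seqno;
};

/*
 * Merge a[0..na) and b[0..nb) into out, which must have room for
 * na + nb entries. When both inputs carry a point on the same
 * context, only the later one is kept. Returns the number of
 * entries written.
 */
static size_t toy_merge(const struct toy_pt *a, size_t na,
			const struct toy_pt *b, size_t nb,
			struct toy_pt *out)
{
	size_t i = 0, i_a = 0, i_b = 0;

	/* Both indices are in range here, so both reads are valid. */
	while (i_a < na && i_b < nb) {
		if (a[i_a].context < b[i_b].context) {
			out[i++] = a[i_a++];
		} else if (a[i_a].context > b[i_b].context) {
			out[i++] = b[i_b++];
		} else {
			/* Same timeline: unsigned difference picks the
			 * later seqno even across wraparound. */
			if (a[i_a].seqno - b[i_b].seqno <= (unsigned int)INT_MAX)
				out[i++] = a[i_a];
			else
				out[i++] = b[i_b];
			i_a++;
			i_b++;
		}
	}

	/* At most one of these loops does any work. */
	while (i_a < na)
		out[i++] = a[i_a++];
	while (i_b < nb)
		out[i++] = b[i_b++];

	return i;
}
```

Merging {1:5, 3:10} with {1:7, 2:4, 4:1} (written as context:seqno) gives four points rather than five, because the two points on context 1 collapse into the later one.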


* Re: [Linaro-mm-sig] thoughts of looking at android fences
  2013-11-04 10:31             ` Maarten Lankhorst
@ 2013-11-07 21:11               ` Rom Lemarchand
  2013-11-08 10:43                 ` Maarten Lankhorst
  2013-11-08 11:43                 ` Maarten Lankhorst
  0 siblings, 2 replies; 18+ messages in thread
From: Rom Lemarchand @ 2013-11-07 21:11 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: linaro-mm-sig, Android Kernel Team, dri-devel, Colin Cross


[-- Attachment #1.1: Type: text/plain, Size: 7454 bytes --]

Hi Maarten, I tested your changes and needed the attached patch: behavior
now seems equivalent to android sync. I haven't tested performance.

The issue resolved by this patch happens when i_b < b->num_fences and
i_a >= a->num_fences (or vice versa). Then pt_a is invalid, so
dereferencing pt_a->context causes a crash.


[-- Attachment #2: 0001-fix.patch --]
[-- Type: text/x-patch, Size: 1201 bytes --]

From a440530a29682c595ad69b8cbb35c568228a8777 Mon Sep 17 00:00:00 2001
From: Rom Lemarchand <romlem@google.com>
Date: Thu, 7 Nov 2013 11:36:08 -0800
Subject: [PATCH] fix

Change-Id: Ie8a10dc466462835456d12962b378158a917a33e
---
 drivers/staging/android/sync.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
index 1025857..04af0fe 100644
--- a/drivers/staging/android/sync.c
+++ b/drivers/staging/android/sync.c
@@ -266,11 +266,19 @@ struct sync_fence *sync_fence_merge(const char *name,
 		struct fence *pt_a = a->cbs[i_a].sync_pt;
 		struct fence *pt_b = b->cbs[i_b].sync_pt;
 
-		if (i_b >= b->num_fences || pt_a->context < pt_b->context) {
+		if (i_b >= b->num_fences) {
 			sync_fence_add_pt(fence, &i, pt_a);
 
 			i_a++;
-		} else if (i_a >= a->num_fences || pt_a->context > pt_b->context) {
+		} else if (i_a >= a->num_fences) {
+			sync_fence_add_pt(fence, &i, pt_b);
+
+			i_b++;
+		} else if (pt_a->context < pt_b->context) {
+			sync_fence_add_pt(fence, &i, pt_a);
+
+			i_a++;
+		} else if (pt_a->context > pt_b->context) {
 			sync_fence_add_pt(fence, &i, pt_b);
 
 			i_b++;
-- 
1.8.4.1


[-- Attachment #3: Type: text/plain, Size: 159 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/dri-devel


* Re: [Linaro-mm-sig] thoughts of looking at android fences
  2013-11-07 21:11               ` Rom Lemarchand
@ 2013-11-08 10:43                 ` Maarten Lankhorst
  2013-11-08 11:43                 ` Maarten Lankhorst
  1 sibling, 0 replies; 18+ messages in thread
From: Maarten Lankhorst @ 2013-11-08 10:43 UTC (permalink / raw)
  To: Rom Lemarchand; +Cc: linaro-mm-sig, Android Kernel Team, dri-devel, Colin Cross

On 07-11-13 22:11, Rom Lemarchand wrote:
> Hi Maarten, I tested your changes and needed the attached patch: behavior
> now seems equivalent as android sync. I haven't tested performance.
>
> The issue resolved by this patch happens when i_b < b->num_fences and
> i_a >= a->num_fences (or vice versa). Then, pt_a is invalid and so
> dereferencing pt_a->context causes a crash.

Oops, thinko. :) Originally I had it correct by doing this:

+       /*
+        * Assume sync_fence a and b are both ordered and have no
+        * duplicates with the same context.
+        *
+        * If a sync_fence can only be created with sync_fence_merge
+        * and sync_fence_create, this is a reasonable assumption.
+        */
+       for (i = i_a = i_b = 0; i_a < a->num_fences && i_b < b->num_fences; ) {
+               struct fence *pt_a = a->cbs[i_a].sync_pt;
+               struct fence *pt_b = b->cbs[i_b].sync_pt;
+
+               if (pt_a->context < pt_b->context) {
+                       sync_fence_add_pt(fence, &i, pt_a);
+
+                       i_a++;
+               } else if (pt_a->context > pt_b->context) {
+                       sync_fence_add_pt(fence, &i, pt_b);
+
+                       i_b++;
+               } else {
+                       if (pt_a->seqno - pt_b->seqno <= INT_MAX)
+                               sync_fence_add_pt(fence, &i, pt_a);
+                       else
+                               sync_fence_add_pt(fence, &i, pt_b);
+
+                       i_a++;
+                       i_b++;
+               }
+       }
+
+       /* Add remaining fences from a or b */
+       for (; i_a < a->num_fences; i_a++)
+               sync_fence_add_pt(fence, &i, a->cbs[i_a].sync_pt);
+
+       for (; i_b < b->num_fences; i_b++)
+               sync_fence_add_pt(fence, &i, b->cbs[i_b].sync_pt);

Then I thought I could clean it up by merging it, but that ended up being
more unreadable and crashing... so I guess I'll revert back to this version. :)
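
The seqno test in the loop above relies on unsigned wraparound. A minimal sketch of just that comparison, assuming seqno is a 32-bit unsigned int and that two points on the same timeline are never more than INT_MAX apart; the function name is hypothetical:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Returns true when a is the same as or later than b on a timeline
 * whose counter may wrap. The subtraction is done modulo 2^32, so a
 * small positive distance means "a is ahead", while a huge one means
 * a is actually behind b.
 */
static bool toy_seqno_is_later_or_equal(unsigned int a, unsigned int b)
{
	return a - b <= (unsigned int)INT_MAX;
}
```

For example, 2 counts as later than UINT_MAX - 1 because the counter is assumed to have wrapped in between, whereas a plain a >= b comparison would pick the wrong point.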


* Re: [Linaro-mm-sig] thoughts of looking at android fences
  2013-11-07 21:11               ` Rom Lemarchand
  2013-11-08 10:43                 ` Maarten Lankhorst
@ 2013-11-08 11:43                 ` Maarten Lankhorst
  2013-11-08 14:35                   ` Rom Lemarchand
  2013-11-12  1:53                   ` Rom Lemarchand
  1 sibling, 2 replies; 18+ messages in thread
From: Maarten Lankhorst @ 2013-11-08 11:43 UTC (permalink / raw)
  To: Rom Lemarchand; +Cc: linaro-mm-sig, Android Kernel Team, dri-devel, Colin Cross

On 07-11-13 22:11, Rom Lemarchand wrote:
> Hi Maarten, I tested your changes and needed the attached patch: behavior
> now seems equivalent as android sync. I haven't tested performance.
>
> The issue resolved by this patch happens when i_b < b->num_fences and
> i_a >= a->num_fences (or vice versa). Then, pt_a is invalid and so
> dereferencing pt_a->context causes a crash.
>
Yeah, I pushed my original fix. I intended to keep the android userspace behavior the same, and I tried to keep the kernelspace API the same as much as I could. If performance is the same, or not noticeably worse, would there be any objections on the android side to renaming dma-fence to syncpoint and getting it into mainline?

~Maarten


* Re: [Linaro-mm-sig] thoughts of looking at android fences
  2013-11-08 11:43                 ` Maarten Lankhorst
@ 2013-11-08 14:35                   ` Rom Lemarchand
  2013-11-12  1:53                   ` Rom Lemarchand
  1 sibling, 0 replies; 18+ messages in thread
From: Rom Lemarchand @ 2013-11-08 14:35 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: linaro-mm-sig, Android Kernel Team, Colin Cross, dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 917 bytes --]

Let me run some benchmarks today, talk to people internally, and I'll let
you know.

* Re: [Linaro-mm-sig] thoughts of looking at android fences
  2013-11-08 11:43                 ` Maarten Lankhorst
  2013-11-08 14:35                   ` Rom Lemarchand
@ 2013-11-12  1:53                   ` Rom Lemarchand
  1 sibling, 0 replies; 18+ messages in thread
From: Rom Lemarchand @ 2013-11-12  1:53 UTC (permalink / raw)
  To: Maarten Lankhorst
  Cc: linaro-mm-sig, Android Kernel Team, dri-devel, Colin Cross


[-- Attachment #1.1: Type: text/plain, Size: 1136 bytes --]

I ran some benchmarks and things seem to be running about the same.
No one on our graphics team seemed concerned about the change.

The only concern I heard was about the increased complexity of the new sync
code as opposed to the old sync framework which tried to keep things
straightforward.



end of thread, other threads:[~2013-11-12  1:53 UTC | newest]

Thread overview: 18+ messages
2013-10-02  7:35 thoughts of looking at android fences Maarten Lankhorst
2013-10-02 18:13 ` [Linaro-mm-sig] " Erik Gilling
2013-10-08 17:37   ` John Stultz
2013-10-08 18:56     ` Rob Clark
2013-10-09 14:39     ` Maarten Lankhorst
2013-10-24 12:13       ` Maarten Lankhorst
2013-10-30 12:17         ` Maarten Lankhorst
2013-11-01 21:03           ` Rom Lemarchand
2013-11-02 21:36           ` Colin Cross
2013-11-03  6:31             ` Maarten Lankhorst
2013-11-04  9:36             ` Maarten Lankhorst
2013-11-04 10:31             ` Maarten Lankhorst
2013-11-07 21:11               ` Rom Lemarchand
2013-11-08 10:43                 ` Maarten Lankhorst
2013-11-08 11:43                 ` Maarten Lankhorst
2013-11-08 14:35                   ` Rom Lemarchand
2013-11-12  1:53                   ` Rom Lemarchand
2013-10-08 18:47   ` Rob Clark
