* Shared semaphores for amdgpu
From: Andres Rodriguez @ 2017-01-05  4:09 UTC (permalink / raw)
  To: zhoucm1, Mao, David, Christian.Koenig-5C7GfCeVMHo
  Cc: Dave Airlie, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Pierre-Loup Griffais



Hey guys,

Just curious if there are any updates on the topic of shared semaphores for amdgpu discussed here:
https://lists.freedesktop.org/archives/amd-gfx/2016-December/003777.html

I wasn't subscribed to amd-gfx yet when the topic started, so replying to it directly is cumbersome.

Regards,
Andres


* RE: Shared semaphores for amdgpu
From: Mao, David @ 2017-01-05  4:13 UTC (permalink / raw)
  To: Andres Rodriguez, Zhou, David(ChunMing), Koenig, Christian
  Cc: Dave Airlie, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Pierre-Loup Griffais



Hi Andres,
We made a local change yesterday which eliminates the need to get an unused fd at creation time.
If everything goes well, I expect the change can be sent out for review next week.

Best Regards,
David

From: Andres Rodriguez [mailto:andresr-38hxoXRICFZx67MzidHQgQC/G2K4zDHf@public.gmane.org]
Sent: Thursday, January 5, 2017 12:10 PM
To: Zhou, David(ChunMing) <David1.Zhou-5C7GfCeVMHo@public.gmane.org>; Mao, David <David.Mao@amd.com>; Koenig, Christian <Christian.Koenig-5C7GfCeVMHo@public.gmane.org>
Cc: Pierre-Loup Griffais <pgriffais-38hxoXRICFZx67MzidHQgQC/G2K4zDHf@public.gmane.org>; Dave Airlie <airlied@redhat.com>; amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
Subject: Shared semaphores for amdgpu

Hey guys,

Just curious if there are any updates on the topic of shared semaphores for amdgpu discussed here:
https://lists.freedesktop.org/archives/amd-gfx/2016-December/003777.html

I wasn't subscribed to amd-gfx yet when the topic started, so replying to it directly is cumbersome.

Regards,
Andres


* Re: Shared semaphores for amdgpu
From: Andres Rodriguez @ 2017-01-05 17:48 UTC (permalink / raw)
  To: Mao, David, Andres Rodriguez, Zhou, David(ChunMing), Koenig, Christian
  Cc: Dave Airlie, Pierre-Loup Griffais,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW



Cool, thanks for the heads up, David.


Regards,

Andres


On 1/4/2017 11:13 PM, Mao, David wrote:
>
> Hi Andres,
>
> We made a local change yesterday which eliminates the need to get an
> unused fd at creation time.
>
> If everything goes well, I expect the change can be sent out for
> review next week.
>
> Best Regards,
>
> David
>
> *From:*Andres Rodriguez [mailto:andresr-38hxoXRICFZx67MzidHQgQC/G2K4zDHf@public.gmane.org]
> *Sent:* Thursday, January 5, 2017 12:10 PM
> *To:* Zhou, David(ChunMing) <David1.Zhou-5C7GfCeVMHo@public.gmane.org>; Mao, David 
> <David.Mao-5C7GfCeVMHo@public.gmane.org>; Koenig, Christian <Christian.Koenig-5C7GfCeVMHo@public.gmane.org>
> *Cc:* Pierre-Loup Griffais <pgriffais-38hxoXRICFZx67MzidHQgQC/G2K4zDHf@public.gmane.org>; Dave Airlie 
> <airlied-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>; amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
> *Subject:* Shared semaphores for amdgpu
>
> Hey guys,
>
> Just curious if there are any updates on the topic of shared 
> semaphores for amdgpu discussed here:
> https://lists.freedesktop.org/archives/amd-gfx/2016-December/003777.html
>
> I wasn't subscribed to amd-gfx yet when the topic started, so replying 
> to it directly is cumbersome.
>
> Regards,
> Andres

* Re: Shared semaphores for amdgpu
From: Dave Airlie @ 2017-02-27 19:36 UTC (permalink / raw)
  To: Andres Rodriguez
  Cc: Zhou, David(ChunMing),
	Mao, David, Andres Rodriguez,
	amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Dave Airlie, Koenig,
	Christian, Pierre-Loup Griffais

Hi,

Any further news on these?

Dave.

On 6 January 2017 at 03:48, Andres Rodriguez <andresx7@gmail.com> wrote:
> Cool, thanks for the heads up, David.
>
>
> Regards,
>
> Andres
>
>
> On 1/4/2017 11:13 PM, Mao, David wrote:
>
> Hi Andres,
>
> We made a local change yesterday which eliminates the need to get an unused
> fd at creation time.
>
> If everything goes well, I expect the change can be sent out for review
> next week.
>
>
>
> Best Regards,
>
> David
>
>
>
> From: Andres Rodriguez [mailto:andresr@valvesoftware.com]
> Sent: Thursday, January 5, 2017 12:10 PM
> To: Zhou, David(ChunMing) <David1.Zhou@amd.com>; Mao, David
> <David.Mao@amd.com>; Koenig, Christian <Christian.Koenig@amd.com>
> Cc: Pierre-Loup Griffais <pgriffais@valvesoftware.com>; Dave Airlie
> <airlied@redhat.com>; amd-gfx@lists.freedesktop.org
> Subject: Shared semaphores for amdgpu
>
>
>
> Hey guys,
>
> Just curious if there are any updates on the topic of shared semaphores for
> amdgpu discussed here:
> https://lists.freedesktop.org/archives/amd-gfx/2016-December/003777.html
>
> I wasn't subscribed to amd-gfx yet when the topic started, so replying to it
> directly is cumbersome.
>
> Regards,
> Andres

* Re: Shared semaphores for amdgpu
From: zhoucm1 @ 2017-02-28  1:46 UTC (permalink / raw)
  To: Dave Airlie, Andres Rodriguez
  Cc: Mao, David, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Andres Rodriguez, Dave Airlie, Cui, Flora, Koenig, Christian,
	Pierre-Loup Griffais


Hi Dave,

Attached is our semaphore implementation; amdgpu_cs.c is the libdrm file,
the others are kernel files.
Any suggestions?
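
For reference, intended usage from the libdrm side would be roughly as
below. This is only a sketch: device/context/IB-request setup and the
matching amdgpu.h prototypes are assumed, the gfx ring is just an
example, the helper names are made up, and error handling is shortened.

#include <amdgpu.h>
#include <amdgpu_drm.h>

/* Producer: signal the semaphore from the last submission on gfx ring 0
 * and export it as an fd that can be passed to another process. */
static int sem_produce(amdgpu_device_handle dev, amdgpu_context_handle ctx,
		       struct amdgpu_cs_request *req, int *shared_fd)
{
	amdgpu_sem_handle sem;
	int r;

	r = amdgpu_cs_create_sem(dev, &sem);
	if (r)
		return r;
	r = amdgpu_cs_submit(ctx, 0, req, 1);
	if (r)
		return r;
	r = amdgpu_cs_signal_sem(dev, ctx, AMDGPU_HW_IP_GFX, 0, 0, sem);
	if (r)
		return r;
	return amdgpu_cs_export_sem(dev, sem, shared_fd);
}

/* Consumer (possibly another process): import the fd and make the next
 * submission on gfx ring 0 wait for the signal. */
static int sem_consume(amdgpu_device_handle dev, amdgpu_context_handle ctx,
		       struct amdgpu_cs_request *req, int shared_fd)
{
	amdgpu_sem_handle sem;
	int r;

	r = amdgpu_cs_import_sem(dev, shared_fd, &sem);
	if (r)
		return r;
	r = amdgpu_cs_wait_sem(dev, ctx, AMDGPU_HW_IP_GFX, 0, 0, sem);
	if (r)
		return r;
	r = amdgpu_cs_submit(ctx, 0, req, 1);
	amdgpu_cs_destroy_sem(dev, sem);
	return r;
}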

Regards,
David Zhou

On 2017-02-28 03:36, Dave Airlie wrote:
> Hi,
>
> Any further news on these?
>
> Dave.
>
> On 6 January 2017 at 03:48, Andres Rodriguez <andresx7-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> Cool, thanks for the heads up, David.
>>
>>
>> Regards,
>>
>> Andres
>>
>>
>> On 1/4/2017 11:13 PM, Mao, David wrote:
>>
>> Hi Andres,
>>
>> We made a local change yesterday which eliminates the need to get an unused
>> fd at creation time.
>>
>> If everything goes well, I expect the change can be sent out for review
>> next week.
>>
>>
>>
>> Best Regards,
>>
>> David
>>
>>
>>
>> From: Andres Rodriguez [mailto:andresr-38hxoXRICFZx67MzidHQgQC/G2K4zDHf@public.gmane.org]
>> Sent: Thursday, January 5, 2017 12:10 PM
>> To: Zhou, David(ChunMing) <David1.Zhou-5C7GfCeVMHo@public.gmane.org>; Mao, David
>> <David.Mao-5C7GfCeVMHo@public.gmane.org>; Koenig, Christian <Christian.Koenig-5C7GfCeVMHo@public.gmane.org>
>> Cc: Pierre-Loup Griffais <pgriffais-38hxoXRICFZx67MzidHQgQC/G2K4zDHf@public.gmane.org>; Dave Airlie
>> <airlied-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>; amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org
>> Subject: Shared semaphores for amdgpu
>>
>>
>>
>> Hey guys,
>>
>> Just curious if there are any updates on the topic of shared semaphores for
>> amdgpu discussed here:
>> https://lists.freedesktop.org/archives/amd-gfx/2016-December/003777.html
>>
>> I wasn't subscribed to amd-gfx yet when the topic started, so replying to it
>> directly is cumbersome.
>>
>> Regards,
>> Andres


[-- Attachment #2: amdgpu_sem.c --]
[-- Type: text/x-csrc, Size: 10014 bytes --]

/*
 * Copyright 2016 Advanced Micro Devices, Inc.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
 * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 *
 * Authors:
 *    Chunming Zhou <david1.zhou-5C7GfCeVMHo@public.gmane.org>
 */
#include <linux/file.h>
#include <linux/fs.h>
#include <linux/kernel.h>
#include <linux/poll.h>
#include <linux/seq_file.h>
#include <linux/export.h>
#include <linux/sched.h>
#include <linux/slab.h>
#include <linux/uaccess.h>
#include <linux/anon_inodes.h>
#include "amdgpu_sem.h"
#include "amdgpu.h"
#include <drm/drmP.h>

static int amdgpu_sem_cring_add(struct amdgpu_fpriv *fpriv,
				struct drm_amdgpu_sem_in *in,
				struct amdgpu_sem *sem);

static void amdgpu_sem_core_free(struct kref *kref)
{
	struct amdgpu_sem_core *core = container_of(
		kref, struct amdgpu_sem_core, kref);

	if (core->file)
		fput(core->file);

	fence_put(core->fence);
	mutex_destroy(&core->lock);
	kfree(core);
}

static void amdgpu_sem_free(struct kref *kref)
{
	struct amdgpu_sem *sem = container_of(
		kref, struct amdgpu_sem, kref);

	list_del(&sem->list);
	kref_put(&sem->base->kref, amdgpu_sem_core_free);
	kfree(sem);
}

static inline void amdgpu_sem_get(struct amdgpu_sem *sem)
{
	if (sem)
		kref_get(&sem->kref);
}

static inline void amdgpu_sem_put(struct amdgpu_sem *sem)
{
	if (sem)
		kref_put(&sem->kref, amdgpu_sem_free);
}

static int amdgpu_sem_release(struct inode *inode, struct file *file)
{
	struct amdgpu_sem_core *core = file->private_data;

	kref_put(&core->kref, amdgpu_sem_core_free);
	return 0;
}

static unsigned int amdgpu_sem_poll(struct file *file, poll_table *wait)
{
	return 0;
}

static long amdgpu_sem_file_ioctl(struct file *file, unsigned int cmd,
				   unsigned long arg)
{
	return 0;
}

static const struct file_operations amdgpu_sem_fops = {
	.release = amdgpu_sem_release,
	.poll = amdgpu_sem_poll,
	.unlocked_ioctl = amdgpu_sem_file_ioctl,
	.compat_ioctl = amdgpu_sem_file_ioctl,
};


static inline struct amdgpu_sem *amdgpu_sem_lookup(struct amdgpu_fpriv *fpriv, u32 handle)
{
	struct amdgpu_sem *sem;

	spin_lock(&fpriv->sem_handles_lock);

	/* Check if we currently have a reference on the object */
	sem = idr_find(&fpriv->sem_handles, handle);
	amdgpu_sem_get(sem);

	spin_unlock(&fpriv->sem_handles_lock);

	return sem;
}

static struct amdgpu_sem_core *amdgpu_sem_core_alloc(void)
{
	struct amdgpu_sem_core *core;

	core = kzalloc(sizeof(*core), GFP_KERNEL);
	if (!core)
		return NULL;

	kref_init(&core->kref);
	mutex_init(&core->lock);
	return core;
}

static struct amdgpu_sem *amdgpu_sem_alloc(void)
{
	struct amdgpu_sem *sem;

	sem = kzalloc(sizeof(*sem), GFP_KERNEL);
	if (!sem)
		return NULL;

	kref_init(&sem->kref);
	INIT_LIST_HEAD(&sem->list);

	return sem;
}

static int amdgpu_sem_create(struct amdgpu_fpriv *fpriv, u32 *handle)
{
	struct amdgpu_sem *sem;
	struct amdgpu_sem_core *core;
	int ret;

	sem = amdgpu_sem_alloc();
	core = amdgpu_sem_core_alloc();
	if (!sem || !core) {
		kfree(sem);
		kfree(core);
		return -ENOMEM;
	}

	sem->base = core;

	idr_preload(GFP_KERNEL);
	spin_lock(&fpriv->sem_handles_lock);

	ret = idr_alloc(&fpriv->sem_handles, sem, 1, 0, GFP_NOWAIT);

	spin_unlock(&fpriv->sem_handles_lock);
	idr_preload_end();

	if (ret < 0)
		return ret;

	*handle = ret;
	return 0;
}

static int amdgpu_sem_signal(struct amdgpu_fpriv *fpriv,
				u32 handle, struct fence *fence)
{
	struct amdgpu_sem *sem;
	struct amdgpu_sem_core *core;

	sem = amdgpu_sem_lookup(fpriv, handle);
	if (!sem)
		return -EINVAL;

	core = sem->base;
	mutex_lock(&core->lock);
	fence_put(core->fence);
	core->fence = fence_get(fence);
	mutex_unlock(&core->lock);

	amdgpu_sem_put(sem);
	return 0;
}

static int amdgpu_sem_wait(struct amdgpu_fpriv *fpriv,
			  struct drm_amdgpu_sem_in *in)
{
	struct amdgpu_sem *sem;
	int ret;

	sem = amdgpu_sem_lookup(fpriv, in->handle);
	if (!sem)
		return -EINVAL;

	ret = amdgpu_sem_cring_add(fpriv, in, sem);
	amdgpu_sem_put(sem);

	return ret;
}

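/*
 * Import a semaphore from an fd created by amdgpu_sem_export: take a
 * reference on the shared core and wrap it in a new per-fpriv handle.
 */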
static int amdgpu_sem_import(struct amdgpu_fpriv *fpriv,
				       int fd, u32 *handle)
{
	struct file *file = fget(fd);
	struct amdgpu_sem *sem;
	struct amdgpu_sem_core *core;
	int ret;

	if (!file)
		return -EINVAL;

	core = file->private_data;
	if (!core) {
		fput(file);
		return -EINVAL;
	}

	mutex_lock(&core->lock);
	kref_get(&core->kref);
	mutex_unlock(&core->lock);
	sem = amdgpu_sem_alloc();
	if (!sem) {
		ret = -ENOMEM;
		goto err_sem;
	}

	sem->base = core;

	idr_preload(GFP_KERNEL);
	spin_lock(&fpriv->sem_handles_lock);

	ret = idr_alloc(&fpriv->sem_handles, sem, 1, 0, GFP_NOWAIT);

	spin_unlock(&fpriv->sem_handles_lock);
	idr_preload_end();

	if (ret < 0)
		goto err_out;

	*handle = ret;
	fput(file);
	return 0;
err_sem:
	kref_put(&core->kref, amdgpu_sem_core_free);
err_out:
	amdgpu_sem_put(sem);
	fput(file);
	return ret;

}

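/*
 * Export a semaphore: lazily create the anon-inode file backing the
 * core and install it into an fd the caller can pass to another process.
 */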
static int amdgpu_sem_export(struct amdgpu_fpriv *fpriv,
				       u32 handle, int *fd)
{
	struct amdgpu_sem *sem;
	struct amdgpu_sem_core *core;
	int ret;

	sem = amdgpu_sem_lookup(fpriv, handle);
	if (!sem)
		return -EINVAL;

	core = sem->base;
	mutex_lock(&core->lock);
	if (!core->file) {
		core->file = anon_inode_getfile("sem_file",
					       &amdgpu_sem_fops,
					       core, 0);
		if (IS_ERR(core->file)) {
			mutex_unlock(&core->lock);
			ret = -ENOMEM;
			goto err_put_sem;
		}
	}
	kref_get(&core->kref);
	mutex_unlock(&core->lock);

	ret = get_unused_fd_flags(O_CLOEXEC);
	if (ret < 0)
		goto err_put_file;

	fd_install(ret, core->file);

	*fd = ret;
	amdgpu_sem_put(sem);
	return 0;

err_put_file:
	kref_put(&core->kref, amdgpu_sem_core_free);
	fput(core->file);
err_put_sem:
	amdgpu_sem_put(sem);
	return ret;
}

void amdgpu_sem_destroy(struct amdgpu_fpriv *fpriv, u32 handle)
{
	struct amdgpu_sem *sem = amdgpu_sem_lookup(fpriv, handle);
	if (!sem)
		return;

	spin_lock(&fpriv->sem_handles_lock);
	idr_remove(&fpriv->sem_handles, handle);
	spin_unlock(&fpriv->sem_handles_lock);

	kref_sub(&sem->kref, 2, amdgpu_sem_free);
}

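/*
 * Resolve the fence that will be signalled into the semaphore: the fence
 * of the given ctx/ring sequence number, or of the most recent submission
 * on that ring when in->seq is zero.
 */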
static struct fence *amdgpu_sem_get_fence(struct amdgpu_fpriv *fpriv,
					 struct drm_amdgpu_sem_in *in)
{
	struct amdgpu_ring *out_ring;
	struct amdgpu_ctx *ctx;
	struct fence *fence;
	uint32_t ctx_id, ip_type, ip_instance, ring;
	int r;

	ctx_id = in->ctx_id;
	ip_type = in->ip_type;
	ip_instance = in->ip_instance;
	ring = in->ring;
	ctx = amdgpu_ctx_get(fpriv, ctx_id);
	if (!ctx)
		return NULL;
	r = amdgpu_cs_get_ring(ctx->adev, ip_type, ip_instance, ring,
			       &out_ring);
	if (r) {
		amdgpu_ctx_put(ctx);
		return NULL;
	}
	/* get the last fence of this entity */
	fence = amdgpu_ctx_get_fence(ctx, out_ring,
				     in->seq ? in->seq :
				     ctx->rings[out_ring->idx].sequence - 1);
	amdgpu_ctx_put(ctx);

	return fence;
}

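/*
 * Queue a wait: link the semaphore into the target ctx/ring's sem_list so
 * that the next command submission on that ring syncs on its fence.
 */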
static int amdgpu_sem_cring_add(struct amdgpu_fpriv *fpriv,
				struct drm_amdgpu_sem_in *in,
				struct amdgpu_sem *sem)
{
	struct amdgpu_ring *out_ring;
	struct amdgpu_ctx *ctx;
	uint32_t ctx_id, ip_type, ip_instance, ring;
	int r;

	ctx_id = in->ctx_id;
	ip_type = in->ip_type;
	ip_instance = in->ip_instance;
	ring = in->ring;
	ctx = amdgpu_ctx_get(fpriv, ctx_id);
	if (!ctx)
		return -EINVAL;
	r = amdgpu_cs_get_ring(ctx->adev, ip_type, ip_instance, ring,
			       &out_ring);
	if (r)
		goto err;
	mutex_lock(&ctx->rings[out_ring->idx].sem_lock);
	list_add(&sem->list, &ctx->rings[out_ring->idx].sem_list);
	mutex_unlock(&ctx->rings[out_ring->idx].sem_lock);

err:
	amdgpu_ctx_put(ctx);
	return r;
}

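/*
 * Called at command submission time: sync the job on the fence of every
 * semaphore queued on this ring, then clear the fence and drop the
 * semaphore from the list (each wait consumes one signal).
 */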
int amdgpu_sem_add_cs(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
		     struct amdgpu_sync *sync)
{
	struct amdgpu_sem *sem, *tmp;
	int r = 0;

	if (list_empty(&ctx->rings[ring->idx].sem_list))
		return 0;

	mutex_lock(&ctx->rings[ring->idx].sem_lock);
	list_for_each_entry_safe(sem, tmp, &ctx->rings[ring->idx].sem_list,
				 list) {
		r = amdgpu_sync_fence(ctx->adev, sync, sem->base->fence);
		if (r)
			goto err;
		mutex_lock(&sem->base->lock);
		fence_put(sem->base->fence);
		sem->base->fence = NULL;
		mutex_unlock(&sem->base->lock);
		list_del_init(&sem->list);
	}
err:
	mutex_unlock(&ctx->rings[ring->idx].sem_lock);
	return r;
}

int amdgpu_sem_ioctl(struct drm_device *dev, void *data,
		     struct drm_file *filp)
{
	union drm_amdgpu_sem *args = data;
	struct amdgpu_fpriv *fpriv = filp->driver_priv;
	struct fence *fence;
	int r = 0;

	switch (args->in.op) {
	case AMDGPU_SEM_OP_CREATE_SEM:
		r = amdgpu_sem_create(fpriv, &args->out.handle);
		break;
	case AMDGPU_SEM_OP_WAIT_SEM:
		r = amdgpu_sem_wait(fpriv, &args->in);
		break;
	case AMDGPU_SEM_OP_SIGNAL_SEM:
		fence = amdgpu_sem_get_fence(fpriv, &args->in);
		if (IS_ERR(fence)) {
			r = PTR_ERR(fence);
			return r;
		}
		r = amdgpu_sem_signal(fpriv, args->in.handle, fence);
		fence_put(fence);
		break;
	case AMDGPU_SEM_OP_IMPORT_SEM:
		r = amdgpu_sem_import(fpriv, args->in.handle, &args->out.handle);
		break;
	case AMDGPU_SEM_OP_EXPORT_SEM:
		r = amdgpu_sem_export(fpriv, args->in.handle, &args->out.fd);
		break;
	case AMDGPU_SEM_OP_DESTROY_SEM:
		amdgpu_sem_destroy(fpriv, args->in.handle);
		break;
	default:
		r = -EINVAL;
		break;
	}

	return r;
}

[-- Attachment #3: amdgpu_sem.h --]
[-- Type: text/x-chdr, Size: 1682 bytes --]

/*
 * Copyright 2016 Advanced Micro Devices, Inc.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
 * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 *
 * Authors: Chunming Zhou <david1.zhou-5C7GfCeVMHo@public.gmane.org>
 *
 */


#ifndef _LINUX_AMDGPU_SEM_H
#define _LINUX_AMDGPU_SEM_H

#include <linux/types.h>
#include <linux/kref.h>
#include <linux/ktime.h>
#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/fence.h>

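/*
 * Shared part of a semaphore: the fence last signalled into it and the
 * anon-inode file used to hand it to other processes as an fd.
 */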
struct amdgpu_sem_core {
	struct file		*file;
	struct kref		kref;
	struct fence            *fence;
	struct mutex	lock;
};

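/*
 * Per-handle wrapper around a semaphore core: each fpriv that creates or
 * imports the semaphore gets its own amdgpu_sem in its idr; the list node
 * links a pending wait into a context ring's sem_list.
 */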
struct amdgpu_sem {
	struct amdgpu_sem_core	*base;
	struct kref		kref;
	struct list_head        list;
};

#endif /* _LINUX_AMDGPU_SEM_H */

[-- Attachment #4: amdgpu_cs.c --]
[-- Type: text/x-csrc, Size: 19282 bytes --]

/*
 * Copyright 2014 Advanced Micro Devices, Inc.
 *
 * Permission is hereby granted, free of charge, to any person obtaining a
 * copy of this software and associated documentation files (the "Software"),
 * to deal in the Software without restriction, including without limitation
 * the rights to use, copy, modify, merge, publish, distribute, sublicense,
 * and/or sell copies of the Software, and to permit persons to whom the
 * Software is furnished to do so, subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
 * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
 * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
 * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
 * OTHER DEALINGS IN THE SOFTWARE.
 *
 */

#ifdef HAVE_CONFIG_H
#include "config.h"
#endif

#include <sys/stat.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <sys/ioctl.h>
#ifdef HAVE_ALLOCA_H
# include <alloca.h>
#endif

#include "xf86drm.h"
#include "amdgpu_drm.h"
#include "amdgpu_internal.h"

static int amdgpu_cs_unreference_sem(amdgpu_semaphore_handle sem);
static int amdgpu_cs_reset_sem(amdgpu_semaphore_handle sem);

/**
 * Create command submission context
 *
 * \param   dev - \c [in] amdgpu device handle
 * \param   context - \c [out] amdgpu context handle
 *
 * \return  0 on success otherwise POSIX Error code
*/
int amdgpu_cs_ctx_create(amdgpu_device_handle dev,
			 amdgpu_context_handle *context)
{
	struct amdgpu_context *gpu_context;
	union drm_amdgpu_ctx args;
	int i, j, k;
	int r;

	if (NULL == dev)
		return -EINVAL;
	if (NULL == context)
		return -EINVAL;

	gpu_context = calloc(1, sizeof(struct amdgpu_context));
	if (NULL == gpu_context)
		return -ENOMEM;

	gpu_context->dev = dev;

	r = pthread_mutex_init(&gpu_context->sequence_mutex, NULL);
	if (r)
		goto error;

	/* Create the context */
	memset(&args, 0, sizeof(args));
	args.in.op = AMDGPU_CTX_OP_ALLOC_CTX;
	r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_CTX, &args, sizeof(args));
	if (r)
		goto error;

	gpu_context->id = args.out.alloc.ctx_id;
	for (i = 0; i < AMDGPU_HW_IP_NUM; i++)
		for (j = 0; j < AMDGPU_HW_IP_INSTANCE_MAX_COUNT; j++)
			for (k = 0; k < AMDGPU_CS_MAX_RINGS; k++)
				list_inithead(&gpu_context->sem_list[i][j][k]);
	*context = (amdgpu_context_handle)gpu_context;

	return 0;

error:
	pthread_mutex_destroy(&gpu_context->sequence_mutex);
	free(gpu_context);
	return r;
}

/**
 * Release command submission context
 *
 * \param   dev - \c [in] amdgpu device handle
 * \param   context - \c [in] amdgpu context handle
 *
 * \return  0 on success otherwise POSIX Error code
*/
int amdgpu_cs_ctx_free(amdgpu_context_handle context)
{
	union drm_amdgpu_ctx args;
	int i, j, k;
	int r;

	if (NULL == context)
		return -EINVAL;

	pthread_mutex_destroy(&context->sequence_mutex);

	/* now deal with kernel side */
	memset(&args, 0, sizeof(args));
	args.in.op = AMDGPU_CTX_OP_FREE_CTX;
	args.in.ctx_id = context->id;
	r = drmCommandWriteRead(context->dev->fd, DRM_AMDGPU_CTX,
				&args, sizeof(args));
	for (i = 0; i < AMDGPU_HW_IP_NUM; i++) {
		for (j = 0; j < AMDGPU_HW_IP_INSTANCE_MAX_COUNT; j++) {
			for (k = 0; k < AMDGPU_CS_MAX_RINGS; k++) {
				amdgpu_semaphore_handle sem;
				LIST_FOR_EACH_ENTRY(sem, &context->sem_list[i][j][k], list) {
					list_del(&sem->list);
					amdgpu_cs_reset_sem(sem);
					amdgpu_cs_unreference_sem(sem);
				}
			}
		}
	}
	free(context);

	return r;
}

int amdgpu_cs_query_reset_state(amdgpu_context_handle context,
				uint32_t *state, uint32_t *hangs)
{
	union drm_amdgpu_ctx args;
	int r;

	if (!context)
		return -EINVAL;

	memset(&args, 0, sizeof(args));
	args.in.op = AMDGPU_CTX_OP_QUERY_STATE;
	args.in.ctx_id = context->id;
	r = drmCommandWriteRead(context->dev->fd, DRM_AMDGPU_CTX,
				&args, sizeof(args));
	if (!r) {
		*state = args.out.state.reset_status;
		*hangs = args.out.state.hangs;
	}
	return r;
}

/**
 * Submit command to kernel DRM
 * \param   dev - \c [in]  Device handle
 * \param   context - \c [in]  GPU Context
 * \param   ibs_request - \c [in]  Pointer to submission requests
 * \param   fence - \c [out] return fence for this submission
 *
 * \return  0 on success otherwise POSIX Error code
 * \sa amdgpu_cs_submit()
*/
static int amdgpu_cs_submit_one(amdgpu_context_handle context,
				struct amdgpu_cs_request *ibs_request)
{
	union drm_amdgpu_cs cs;
	uint64_t *chunk_array;
	struct drm_amdgpu_cs_chunk *chunks;
	struct drm_amdgpu_cs_chunk_data *chunk_data;
	struct drm_amdgpu_cs_chunk_dep *dependencies = NULL;
	struct drm_amdgpu_cs_chunk_dep *sem_dependencies = NULL;
	struct list_head *sem_list;
	amdgpu_semaphore_handle sem, tmp;
	uint32_t i, size, sem_count = 0;
	bool user_fence;
	int r = 0;

	if (ibs_request->ip_type >= AMDGPU_HW_IP_NUM)
		return -EINVAL;
	if (ibs_request->ring >= AMDGPU_CS_MAX_RINGS)
		return -EINVAL;
	if (ibs_request->number_of_ibs > AMDGPU_CS_MAX_IBS_PER_SUBMIT)
		return -EINVAL;
	if (ibs_request->number_of_ibs == 0) {
		ibs_request->seq_no = AMDGPU_NULL_SUBMIT_SEQ;
		return 0;
	}
	user_fence = (ibs_request->fence_info.handle != NULL);

	size = ibs_request->number_of_ibs + (user_fence ? 2 : 1) + 1;

	chunk_array = alloca(sizeof(uint64_t) * size);
	chunks = alloca(sizeof(struct drm_amdgpu_cs_chunk) * size);

	size = ibs_request->number_of_ibs + (user_fence ? 1 : 0);

	chunk_data = alloca(sizeof(struct drm_amdgpu_cs_chunk_data) * size);

	memset(&cs, 0, sizeof(cs));
	cs.in.chunks = (uint64_t)(uintptr_t)chunk_array;
	cs.in.ctx_id = context->id;
	if (ibs_request->resources)
		cs.in.bo_list_handle = ibs_request->resources->handle;
	cs.in.num_chunks = ibs_request->number_of_ibs;
	/* IB chunks */
	for (i = 0; i < ibs_request->number_of_ibs; i++) {
		struct amdgpu_cs_ib_info *ib;
		chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i];
		chunks[i].chunk_id = AMDGPU_CHUNK_ID_IB;
		chunks[i].length_dw = sizeof(struct drm_amdgpu_cs_chunk_ib) / 4;
		chunks[i].chunk_data = (uint64_t)(uintptr_t)&chunk_data[i];

		ib = &ibs_request->ibs[i];

		chunk_data[i].ib_data._pad = 0;
		chunk_data[i].ib_data.va_start = ib->ib_mc_address;
		chunk_data[i].ib_data.ib_bytes = ib->size * 4;
		chunk_data[i].ib_data.ip_type = ibs_request->ip_type;
		chunk_data[i].ib_data.ip_instance = ibs_request->ip_instance;
		chunk_data[i].ib_data.ring = ibs_request->ring;
		chunk_data[i].ib_data.flags = ib->flags;
	}

	pthread_mutex_lock(&context->sequence_mutex);

	if (user_fence) {
		i = cs.in.num_chunks++;

		/* fence chunk */
		chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i];
		chunks[i].chunk_id = AMDGPU_CHUNK_ID_FENCE;
		chunks[i].length_dw = sizeof(struct drm_amdgpu_cs_chunk_fence) / 4;
		chunks[i].chunk_data = (uint64_t)(uintptr_t)&chunk_data[i];

		/* fence bo handle */
		chunk_data[i].fence_data.handle = ibs_request->fence_info.handle->handle;
		/* offset */
		chunk_data[i].fence_data.offset = 
			ibs_request->fence_info.offset * sizeof(uint64_t);
	}

	if (ibs_request->number_of_dependencies) {
		dependencies = malloc(sizeof(struct drm_amdgpu_cs_chunk_dep) *
			ibs_request->number_of_dependencies);
		if (!dependencies) {
			r = -ENOMEM;
			goto error_unlock;
		}

		for (i = 0; i < ibs_request->number_of_dependencies; ++i) {
			struct amdgpu_cs_fence *info = &ibs_request->dependencies[i];
			struct drm_amdgpu_cs_chunk_dep *dep = &dependencies[i];
			dep->ip_type = info->ip_type;
			dep->ip_instance = info->ip_instance;
			dep->ring = info->ring;
			dep->ctx_id = info->context->id;
			dep->handle = info->fence;
		}

		i = cs.in.num_chunks++;

		/* dependencies chunk */
		chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i];
		chunks[i].chunk_id = AMDGPU_CHUNK_ID_DEPENDENCIES;
		chunks[i].length_dw = sizeof(struct drm_amdgpu_cs_chunk_dep) / 4
			* ibs_request->number_of_dependencies;
		chunks[i].chunk_data = (uint64_t)(uintptr_t)dependencies;
	}

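	/* Convert any semaphores queued on this ring by
	 * amdgpu_cs_wait_semaphore() into one more DEPENDENCIES chunk,
	 * then reset and unreference them: each wait consumes a signal. */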
	sem_list = &context->sem_list[ibs_request->ip_type][ibs_request->ip_instance][ibs_request->ring];
	LIST_FOR_EACH_ENTRY(sem, sem_list, list)
		sem_count++;
	if (sem_count) {
		sem_dependencies = malloc(sizeof(struct drm_amdgpu_cs_chunk_dep) * sem_count);
		if (!sem_dependencies) {
			r = -ENOMEM;
			goto error_unlock;
		}
		sem_count = 0;
		LIST_FOR_EACH_ENTRY_SAFE(sem, tmp, sem_list, list) {
			struct amdgpu_cs_fence *info = &sem->signal_fence;
			struct drm_amdgpu_cs_chunk_dep *dep = &sem_dependencies[sem_count++];
			dep->ip_type = info->ip_type;
			dep->ip_instance = info->ip_instance;
			dep->ring = info->ring;
			dep->ctx_id = info->context->id;
			dep->handle = info->fence;

			list_del(&sem->list);
			amdgpu_cs_reset_sem(sem);
			amdgpu_cs_unreference_sem(sem);
		}
		i = cs.in.num_chunks++;

		/* dependencies chunk */
		chunk_array[i] = (uint64_t)(uintptr_t)&chunks[i];
		chunks[i].chunk_id = AMDGPU_CHUNK_ID_DEPENDENCIES;
		chunks[i].length_dw = sizeof(struct drm_amdgpu_cs_chunk_dep) / 4 * sem_count;
		chunks[i].chunk_data = (uint64_t)(uintptr_t)sem_dependencies;
	}

	r = drmCommandWriteRead(context->dev->fd, DRM_AMDGPU_CS,
				&cs, sizeof(cs));
	if (r)
		goto error_unlock;

	ibs_request->seq_no = cs.out.handle;
	context->last_seq[ibs_request->ip_type][ibs_request->ip_instance][ibs_request->ring] = ibs_request->seq_no;
error_unlock:
	pthread_mutex_unlock(&context->sequence_mutex);
	free(dependencies);
	free(sem_dependencies);
	return r;
}

int amdgpu_cs_submit(amdgpu_context_handle context,
		     uint64_t flags,
		     struct amdgpu_cs_request *ibs_request,
		     uint32_t number_of_requests)
{
	uint32_t i;
	int r;

	if (NULL == context)
		return -EINVAL;
	if (NULL == ibs_request)
		return -EINVAL;

	r = 0;
	for (i = 0; i < number_of_requests; i++) {
		r = amdgpu_cs_submit_one(context, ibs_request);
		if (r)
			break;
		ibs_request++;
	}

	return r;
}

/**
 * Calculate absolute timeout.
 *
 * \param   timeout - \c [in] timeout in nanoseconds.
 *
 * \return  absolute timeout in nanoseconds
*/
drm_private uint64_t amdgpu_cs_calculate_timeout(uint64_t timeout)
{
	int r;

	if (timeout != AMDGPU_TIMEOUT_INFINITE) {
		struct timespec current;
		uint64_t current_ns;
		r = clock_gettime(CLOCK_MONOTONIC, &current);
		if (r) {
			fprintf(stderr, "clock_gettime() returned error (%d)!", errno);
			return AMDGPU_TIMEOUT_INFINITE;
		}

		current_ns = ((uint64_t)current.tv_sec) * 1000000000ull;
		current_ns += current.tv_nsec;
		timeout += current_ns;
		if (timeout < current_ns)
			timeout = AMDGPU_TIMEOUT_INFINITE;
	}
	return timeout;
}

static int amdgpu_ioctl_wait_cs(amdgpu_context_handle context,
				unsigned ip,
				unsigned ip_instance,
				uint32_t ring,
				uint64_t handle,
				uint64_t timeout_ns,
				uint64_t flags,
				bool *busy)
{
	amdgpu_device_handle dev = context->dev;
	union drm_amdgpu_wait_cs args;
	int r;

	memset(&args, 0, sizeof(args));
	args.in.handle = handle;
	args.in.ip_type = ip;
	args.in.ip_instance = ip_instance;
	args.in.ring = ring;
	args.in.ctx_id = context->id;

	if (flags & AMDGPU_QUERY_FENCE_TIMEOUT_IS_ABSOLUTE)
		args.in.timeout = timeout_ns;
	else
		args.in.timeout = amdgpu_cs_calculate_timeout(timeout_ns);

	r = drmIoctl(dev->fd, DRM_IOCTL_AMDGPU_WAIT_CS, &args);
	if (r)
		return -errno;

	*busy = args.out.status;
	return 0;
}

int amdgpu_cs_query_fence_status(struct amdgpu_cs_fence *fence,
				 uint64_t timeout_ns,
				 uint64_t flags,
				 uint32_t *expired)
{
	bool busy = true;
	int r;

	if (NULL == fence)
		return -EINVAL;
	if (NULL == expired)
		return -EINVAL;
	if (NULL == fence->context)
		return -EINVAL;
	if (fence->ip_type >= AMDGPU_HW_IP_NUM)
		return -EINVAL;
	if (fence->ring >= AMDGPU_CS_MAX_RINGS)
		return -EINVAL;
	if (fence->fence == AMDGPU_NULL_SUBMIT_SEQ) {
		*expired = true;
		return 0;
	}

	*expired = false;

	r = amdgpu_ioctl_wait_cs(fence->context, fence->ip_type,
				fence->ip_instance, fence->ring,
			       	fence->fence, timeout_ns, flags, &busy);

	if (!r && !busy)
		*expired = true;

	return r;
}

static int amdgpu_ioctl_wait_fences(struct amdgpu_cs_fence *fences,
				    uint32_t fence_count,
				    bool wait_all,
				    uint64_t timeout_ns,
				    uint32_t *status,
				    uint32_t *first)
{
	struct drm_amdgpu_fence *drm_fences;
	amdgpu_device_handle dev = fences[0].context->dev;
	union drm_amdgpu_wait_fences args;
	int r;
	uint32_t i;

	drm_fences = alloca(sizeof(struct drm_amdgpu_fence) * fence_count);
	for (i = 0; i < fence_count; i++) {
		drm_fences[i].ctx_id = fences[i].context->id;
		drm_fences[i].ip_type = fences[i].ip_type;
		drm_fences[i].ip_instance = fences[i].ip_instance;
		drm_fences[i].ring = fences[i].ring;
		drm_fences[i].seq_no = fences[i].fence;
	}

	memset(&args, 0, sizeof(args));
	args.in.fences = (uint64_t)(uintptr_t)drm_fences;
	args.in.fence_count = fence_count;
	args.in.wait_all = wait_all;
	args.in.timeout_ns = amdgpu_cs_calculate_timeout(timeout_ns);

	r = drmIoctl(dev->fd, DRM_IOCTL_AMDGPU_WAIT_FENCES, &args);
	if (r)
		return -errno;

	*status = args.out.status;

	if (first)
		*first = args.out.first_signaled;

	return 0;
}

int amdgpu_cs_wait_fences(struct amdgpu_cs_fence *fences,
			  uint32_t fence_count,
			  bool wait_all,
			  uint64_t timeout_ns,
			  uint32_t *status,
			  uint32_t *first)
{
	uint32_t ioctl_status = 0;
	uint32_t i;
	int r;

	/* Sanity check */
	if (NULL == fences)
		return -EINVAL;
	if (NULL == status)
		return -EINVAL;
	if (fence_count <= 0)
		return -EINVAL;
	for (i = 0; i < fence_count; i++) {
		if (NULL == fences[i].context)
			return -EINVAL;
		if (fences[i].ip_type >= AMDGPU_HW_IP_NUM)
			return -EINVAL;
		if (fences[i].ring >= AMDGPU_CS_MAX_RINGS)
			return -EINVAL;
	}

	*status = 0;

	r = amdgpu_ioctl_wait_fences(fences, fence_count, wait_all, timeout_ns,
					&ioctl_status, first);

	if (!r)
		*status = ioctl_status;

	return r;
}

int amdgpu_cs_create_semaphore(amdgpu_semaphore_handle *sem)
{
	struct amdgpu_semaphore *gpu_semaphore;

	if (NULL == sem)
		return -EINVAL;

	gpu_semaphore = calloc(1, sizeof(struct amdgpu_semaphore));
	if (NULL == gpu_semaphore)
		return -ENOMEM;

	atomic_set(&gpu_semaphore->refcount, 1);
	*sem = gpu_semaphore;

	return 0;
}

int amdgpu_cs_signal_semaphore(amdgpu_context_handle ctx,
			       uint32_t ip_type,
			       uint32_t ip_instance,
			       uint32_t ring,
			       amdgpu_semaphore_handle sem)
{
	if (NULL == ctx)
		return -EINVAL;
	if (ip_type >= AMDGPU_HW_IP_NUM)
		return -EINVAL;
	if (ring >= AMDGPU_CS_MAX_RINGS)
		return -EINVAL;
	if (NULL == sem)
		return -EINVAL;
	/* sem has been signaled */
	if (sem->signal_fence.context)
		return -EINVAL;
	pthread_mutex_lock(&ctx->sequence_mutex);
	sem->signal_fence.context = ctx;
	sem->signal_fence.ip_type = ip_type;
	sem->signal_fence.ip_instance = ip_instance;
	sem->signal_fence.ring = ring;
	sem->signal_fence.fence = ctx->last_seq[ip_type][ip_instance][ring];
	update_references(NULL, &sem->refcount);
	pthread_mutex_unlock(&ctx->sequence_mutex);
	return 0;
}

int amdgpu_cs_wait_semaphore(amdgpu_context_handle ctx,
			     uint32_t ip_type,
			     uint32_t ip_instance,
			     uint32_t ring,
			     amdgpu_semaphore_handle sem)
{
	if (NULL == ctx)
		return -EINVAL;
	if (ip_type >= AMDGPU_HW_IP_NUM)
		return -EINVAL;
	if (ring >= AMDGPU_CS_MAX_RINGS)
		return -EINVAL;
	if (NULL == sem)
		return -EINVAL;
	/* must signal first */
	if (NULL == sem->signal_fence.context)
		return -EINVAL;

	pthread_mutex_lock(&ctx->sequence_mutex);
	list_add(&sem->list, &ctx->sem_list[ip_type][ip_instance][ring]);
	pthread_mutex_unlock(&ctx->sequence_mutex);
	return 0;
}

static int amdgpu_cs_reset_sem(amdgpu_semaphore_handle sem)
{
	if (NULL == sem)
		return -EINVAL;
	if (NULL == sem->signal_fence.context)
		return -EINVAL;

	sem->signal_fence.context = NULL;
	sem->signal_fence.ip_type = 0;
	sem->signal_fence.ip_instance = 0;
	sem->signal_fence.ring = 0;
	sem->signal_fence.fence = 0;

	return 0;
}

static int amdgpu_cs_unreference_sem(amdgpu_semaphore_handle sem)
{
	if (NULL == sem)
		return -EINVAL;

	if (update_references(&sem->refcount, NULL))
		free(sem);
	return 0;
}

int amdgpu_cs_destroy_semaphore(amdgpu_semaphore_handle sem)
{
	return amdgpu_cs_unreference_sem(sem);
}

int amdgpu_cs_create_sem(amdgpu_device_handle dev,
			 amdgpu_sem_handle *sem)
{
	union drm_amdgpu_sem args;
	int r;

	if (NULL == dev)
		return -EINVAL;

	/* Create the semaphore */
	memset(&args, 0, sizeof(args));
	args.in.op = AMDGPU_SEM_OP_CREATE_SEM;
	r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_SEM, &args, sizeof(args));
	if (r)
		return r;

	*sem = args.out.handle;

	return 0;
}

int amdgpu_cs_signal_sem(amdgpu_device_handle dev,
			 amdgpu_context_handle ctx,
			 uint32_t ip_type,
			 uint32_t ip_instance,
			 uint32_t ring,
			 amdgpu_sem_handle sem)
{
	union drm_amdgpu_sem args;

	if (NULL == dev)
		return -EINVAL;

	/* Signal the semaphore with the last fence on the given ctx/ring */
	memset(&args, 0, sizeof(args));
	args.in.op = AMDGPU_SEM_OP_SIGNAL_SEM;
	args.in.ctx_id = ctx->id;
	args.in.ip_type = ip_type;
	args.in.ip_instance = ip_instance;
	args.in.ring = ring;
	args.in.handle = sem;
	return drmCommandWriteRead(dev->fd, DRM_AMDGPU_SEM, &args, sizeof(args));
}

int amdgpu_cs_wait_sem(amdgpu_device_handle dev,
		       amdgpu_context_handle ctx,
		       uint32_t ip_type,
		       uint32_t ip_instance,
		       uint32_t ring,
		       amdgpu_sem_handle sem)
{
	union drm_amdgpu_sem args;

	if (NULL == dev)
		return -EINVAL;

	/* Queue a wait on the semaphore for the given ctx/ring */
	memset(&args, 0, sizeof(args));
	args.in.op = AMDGPU_SEM_OP_WAIT_SEM;
	args.in.ctx_id = ctx->id;
	args.in.ip_type = ip_type;
	args.in.ip_instance = ip_instance;
	args.in.ring = ring;
	args.in.handle = sem;
	args.in.seq = 0;
	return drmCommandWriteRead(dev->fd, DRM_AMDGPU_SEM, &args, sizeof(args));
}

int amdgpu_cs_export_sem(amdgpu_device_handle dev,
			  amdgpu_sem_handle sem,
			  int *shared_handle)
{
	union drm_amdgpu_sem args;
	int r;

	if (NULL == dev)
		return -EINVAL;

	/* Export the semaphore as a shareable fd */
	memset(&args, 0, sizeof(args));
	args.in.op = AMDGPU_SEM_OP_EXPORT_SEM;
	args.in.handle = sem;
	r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_SEM, &args, sizeof(args));
	if (r)
		return r;
	*shared_handle = args.out.fd;
	return 0;
}

int amdgpu_cs_import_sem(amdgpu_device_handle dev,
			  int shared_handle,
			  amdgpu_sem_handle *sem)
{
	union drm_amdgpu_sem args;
	int r;

	if (NULL == dev)
		return -EINVAL;

	/* Import a semaphore from a shared fd */
	memset(&args, 0, sizeof(args));
	args.in.op = AMDGPU_SEM_OP_IMPORT_SEM;
	args.in.handle = shared_handle;
	r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_SEM, &args, sizeof(args));
	if (r)
		return r;
	*sem = args.out.handle;
	return 0;
}


int amdgpu_cs_destroy_sem(amdgpu_device_handle dev,
			  amdgpu_sem_handle sem)
{
	union drm_amdgpu_sem args;
	int r;

	if (NULL == dev)
		return -EINVAL;

	/* Destroy the semaphore handle */
	memset(&args, 0, sizeof(args));
	args.in.op = AMDGPU_SEM_OP_DESTROY_SEM;
	args.in.handle = sem;
	r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_SEM, &args, sizeof(args));
	if (r)
		return r;

	return 0;
}


* Re: Shared semaphores for amdgpu
From: Dave Airlie @ 2017-03-09  3:52 UTC (permalink / raw)
  To: zhoucm1
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Mao, David,
	Andres Rodriguez, Andres Rodriguez, Dave Airlie, Cui, Flora,
	Koenig, Christian, Pierre-Loup Griffais

On 28 February 2017 at 11:46, zhoucm1 <david1.zhou@amd.com> wrote:
> Hi Dave,
>
> Attached is our semaphore implementation; amdgpu_cs.c is the libdrm file, the
> others are kernel files.
> Any suggestions?
Thanks,

I've built a tree with all these in it, and started looking into the interface.

I do wonder if we need the separate sem signal/wait interface; I think
we should just add semaphore chunks to the CS interface.

I'm just playing around with this now.

Dave.

* Re: Shared semaphores for amdgpu
From: Dave Airlie @ 2017-03-09  4:24 UTC (permalink / raw)
  To: zhoucm1
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Mao, David,
	Andres Rodriguez, Andres Rodriguez, Dave Airlie, Cui, Flora,
	Koenig, Christian, Pierre-Loup Griffais


I've attached two patches as an RFC for the moment. I haven't finished
the userspace for these yet, but I just wanted to get some
ideas/feedback.
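
To illustrate the direction, a submission on the userspace side would
then carry semaphores as extra CS chunks rather than extra ioctls. A
rough sketch of building such a chunk, assuming the chunk ids and the
drm_amdgpu_cs_chunk_dep struct from patch 2 below (only the handle
field is consumed for semaphore chunks; the helper name is made up):

#include <stdint.h>
#include <string.h>
#include <amdgpu_drm.h>

/* Describe one semaphore wait or signal as an extra chunk next to the
 * IB/fence/dependency chunks of a CS submission. */
static void fill_sem_chunk(struct drm_amdgpu_cs_chunk *chunk,
			   struct drm_amdgpu_cs_chunk_dep *dep,
			   uint32_t chunk_id,	/* AMDGPU_CHUNK_ID_SEM_WAIT or _SEM_SIGNAL */
			   uint32_t sem_handle)
{
	memset(dep, 0, sizeof(*dep));
	dep->handle = sem_handle;	/* handle from the AMDGPU_SEM create/import ioctl */

	chunk->chunk_id = chunk_id;
	chunk->length_dw = sizeof(*dep) / 4;
	chunk->chunk_data = (uint64_t)(uintptr_t)dep;
}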

Dave.

On 9 March 2017 at 13:52, Dave Airlie <airlied-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On 28 February 2017 at 11:46, zhoucm1 <david1.zhou-5C7GfCeVMHo@public.gmane.org> wrote:
>> Hi Dave,
>>
>> Attached is our semaphore implementation; amdgpu_cs.c is the libdrm file, the
>> others are kernel files.
>> Any suggestions?
> Thanks,
>
> I've built a tree with all these in it, and started looking into the interface.
>
> I do wonder if we need the separate sem signal/wait interface; I think
> we should just add semaphore chunks to the CS interface.
>
> I'm just playing around with this now.
>
> Dave.

[-- Attachment #2: 0001-amdgpu-cs-split-out-fence-dependency-checking.patch --]
[-- Type: text/x-patch, Size: 3309 bytes --]

From 66852d3e1dc42421eb1cfd9640c043bba70931af Mon Sep 17 00:00:00 2001
From: Dave Airlie <airlied@redhat.com>
Date: Thu, 9 Mar 2017 03:45:52 +0000
Subject: [PATCH 1/2] amdgpu/cs: split out fence dependency checking

---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 86 +++++++++++++++++++---------------
 1 file changed, 48 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d2d0f60..d72b6e8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -961,56 +961,66 @@ static int amdgpu_cs_ib_fill(struct amdgpu_device *adev,
 	return 0;
 }
 
-static int amdgpu_cs_dependencies(struct amdgpu_device *adev,
-				  struct amdgpu_cs_parser *p)
+static int amdgpu_process_fence_dep(struct amdgpu_device *adev,
+				    struct amdgpu_cs_parser *p,
+				    struct amdgpu_cs_chunk *chunk)
 {
 	struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
-	int i, j, r;
-
-	for (i = 0; i < p->nchunks; ++i) {
-		struct drm_amdgpu_cs_chunk_dep *deps;
-		struct amdgpu_cs_chunk *chunk;
-		unsigned num_deps;
+	unsigned num_deps;
+	int i, r;
+	struct drm_amdgpu_cs_chunk_dep *deps;
 
-		chunk = &p->chunks[i];
+	deps = (struct drm_amdgpu_cs_chunk_dep *)chunk->kdata;
+	num_deps = chunk->length_dw * 4 /
+		sizeof(struct drm_amdgpu_cs_chunk_dep);
 
-		if (chunk->chunk_id != AMDGPU_CHUNK_ID_DEPENDENCIES)
-			continue;
+	for (i = 0; i < num_deps; ++i) {
+		struct amdgpu_ring *ring;
+		struct amdgpu_ctx *ctx;
+		struct dma_fence *fence;
 
-		deps = (struct drm_amdgpu_cs_chunk_dep *)chunk->kdata;
-		num_deps = chunk->length_dw * 4 /
-			sizeof(struct drm_amdgpu_cs_chunk_dep);
+		r = amdgpu_cs_get_ring(adev, deps[i].ip_type,
+				       deps[i].ip_instance,
+				       deps[i].ring, &ring);
+		if (r)
+			return r;
 
-		for (j = 0; j < num_deps; ++j) {
-			struct amdgpu_ring *ring;
-			struct amdgpu_ctx *ctx;
-			struct dma_fence *fence;
+		ctx = amdgpu_ctx_get(fpriv, deps[i].ctx_id);
+		if (ctx == NULL)
+			return -EINVAL;
 
-			r = amdgpu_cs_get_ring(adev, deps[j].ip_type,
-					       deps[j].ip_instance,
-					       deps[j].ring, &ring);
+		fence = amdgpu_ctx_get_fence(ctx, ring,
+					     deps[i].handle);
+		if (IS_ERR(fence)) {
+			r = PTR_ERR(fence);
+			amdgpu_ctx_put(ctx);
+			return r;
+		} else if (fence) {
+			r = amdgpu_sync_fence(adev, &p->job->sync,
+					      fence);
+			dma_fence_put(fence);
+			amdgpu_ctx_put(ctx);
 			if (r)
 				return r;
+		}
+	}
+	return 0;
+}
 
-			ctx = amdgpu_ctx_get(fpriv, deps[j].ctx_id);
-			if (ctx == NULL)
-				return -EINVAL;
+static int amdgpu_cs_dependencies(struct amdgpu_device *adev,
+				  struct amdgpu_cs_parser *p)
+{
+	int i, r;
 
-			fence = amdgpu_ctx_get_fence(ctx, ring,
-						     deps[j].handle);
-			if (IS_ERR(fence)) {
-				r = PTR_ERR(fence);
-				amdgpu_ctx_put(ctx);
-				return r;
+	for (i = 0; i < p->nchunks; ++i) {
+		struct amdgpu_cs_chunk *chunk;
 
-			} else if (fence) {
-				r = amdgpu_sync_fence(adev, &p->job->sync,
-						      fence);
-				dma_fence_put(fence);
-				amdgpu_ctx_put(ctx);
-				if (r)
-					return r;
-			}
+		chunk = &p->chunks[i];
+
+		if (chunk->chunk_id == AMDGPU_CHUNK_ID_DEPENDENCIES) {
+			r = amdgpu_process_fence_dep(adev, p, chunk);
+			if (r)
+				return r;
 		}
 	}
 
-- 
2.7.4


[-- Attachment #3: 0002-RFC-drm-amdgpu-add-shared-semaphores-support.patch --]
[-- Type: text/x-patch, Size: 20535 bytes --]

From a41814e96ccbfc7929be05d30f2061e668c5b4af Mon Sep 17 00:00:00 2001
From: Dave Airlie <airlied@redhat.com>
Date: Wed, 8 Mar 2017 03:42:45 +0000
Subject: [PATCH 2/2] [RFC] drm/amdgpu: add shared semaphores support.

This is based on code provided by AMD (Chunming Zhou).

I've changed the code so the semaphore waits/signals are
passed in cs chunks rather than via separate ioctls.

This code isn't finished; I just wanted to ship it out early
in case anyone spots a problem. I'll finish off the libdrm
and radv bits and send more later.
---
 drivers/gpu/drm/amd/amdgpu/Makefile     |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  13 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  70 ++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |   8 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_sem.c | 360 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_sem.h |  49 +++++
 include/uapi/drm/amdgpu_drm.h           |  31 +++
 8 files changed, 532 insertions(+), 2 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sem.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sem.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 2814aad..404bcba 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -24,7 +24,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
 	atombios_encoders.o amdgpu_sa.o atombios_i2c.o \
 	amdgpu_prime.o amdgpu_vm.o amdgpu_ib.o amdgpu_pll.o \
 	amdgpu_ucode.o amdgpu_bo_list.o amdgpu_ctx.o amdgpu_sync.o \
-	amdgpu_gtt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o
+	amdgpu_gtt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o amdgpu_sem.o
 
 # add asic specific block
 amdgpu-$(CONFIG_DRM_AMDGPU_CIK)+= cik.o cik_ih.o kv_smc.o kv_dpm.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index c1b9135..8a0f42f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -53,6 +53,7 @@
 #include "amdgpu_ucode.h"
 #include "amdgpu_ttm.h"
 #include "amdgpu_gds.h"
+#include "amdgpu_sem.h"
 #include "amdgpu_sync.h"
 #include "amdgpu_ring.h"
 #include "amdgpu_vm.h"
@@ -702,6 +703,8 @@ struct amdgpu_fpriv {
 	struct mutex		bo_list_lock;
 	struct idr		bo_list_handles;
 	struct amdgpu_ctx_mgr	ctx_mgr;
+	spinlock_t		sem_handles_lock;
+	struct idr		sem_handles;
 };
 
 /*
@@ -1814,5 +1817,15 @@ amdgpu_cs_find_mapping(struct amdgpu_cs_parser *parser,
 		       uint64_t addr, struct amdgpu_bo **bo);
 int amdgpu_cs_sysvm_access_required(struct amdgpu_cs_parser *parser);
 
+int amdgpu_sem_ioctl(struct drm_device *dev, void *data,
+		     struct drm_file *filp);
+
+int amdgpu_sem_lookup_and_signal(struct amdgpu_fpriv *fpriv,
+				 uint32_t handle,
+				 struct dma_fence *fence);
+int amdgpu_sem_lookup_and_sync(struct amdgpu_device *adev,
+			       struct amdgpu_fpriv *fpriv,
+			       struct amdgpu_sync *sync,
+			       uint32_t handle);
 #include "amdgpu_object.h"
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index d72b6e8..b561c61 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -217,6 +217,8 @@ int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, void *data)
 			break;
 
 		case AMDGPU_CHUNK_ID_DEPENDENCIES:
+		case AMDGPU_CHUNK_ID_SEM_WAIT:
+		case AMDGPU_CHUNK_ID_SEM_SIGNAL:
 			break;
 
 		default:
@@ -1007,6 +1009,28 @@ static int amdgpu_process_fence_dep(struct amdgpu_device *adev,
 	return 0;
 }
 
+static int amdgpu_process_sem_wait_dep(struct amdgpu_device *adev,
+				       struct amdgpu_cs_parser *p,
+				       struct amdgpu_cs_chunk *chunk)
+{
+	struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
+	unsigned num_deps;
+	int i, r;
+	struct drm_amdgpu_cs_chunk_dep *deps;
+
+	deps = (struct drm_amdgpu_cs_chunk_dep *)chunk->kdata;
+	num_deps = chunk->length_dw * 4 /
+		sizeof(struct drm_amdgpu_cs_chunk_dep);
+
+	for (i = 0; i < num_deps; ++i) {
+		r = amdgpu_sem_lookup_and_sync(adev, fpriv, &p->job->sync,
+					       deps[i].handle);
+		if (r)
+			return r;
+	}
+	return 0;
+}
+
 static int amdgpu_cs_dependencies(struct amdgpu_device *adev,
 				  struct amdgpu_cs_parser *p)
 {
@@ -1021,12 +1045,56 @@ static int amdgpu_cs_dependencies(struct amdgpu_device *adev,
 			r = amdgpu_process_fence_dep(adev, p, chunk);
 			if (r)
 				return r;
+		} else if (chunk->chunk_id == AMDGPU_CHUNK_ID_SEM_WAIT) {
+			r = amdgpu_process_sem_wait_dep(adev, p, chunk);
+			if (r)
+				return r;
 		}
 	}
 
 	return 0;
 }
 
+static int amdgpu_process_sem_signal_dep(struct amdgpu_cs_parser *p,
+					 struct amdgpu_cs_chunk *chunk,
+					 struct dma_fence *fence)
+{
+	struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
+	unsigned num_deps;
+	int i, r;
+	struct drm_amdgpu_cs_chunk_dep *deps;
+
+	deps = (struct drm_amdgpu_cs_chunk_dep *)chunk->kdata;
+	num_deps = chunk->length_dw * 4 /
+		sizeof(struct drm_amdgpu_cs_chunk_dep);
+
+	for (i = 0; i < num_deps; ++i) {
+		r = amdgpu_sem_lookup_and_signal(fpriv, deps[i].handle,
+						 fence);
+		if (r)
+			return r;
+	}
+	return 0;
+}
+
+static int amdgpu_cs_post_dependencies(struct amdgpu_cs_parser *p)
+{
+	int i, r;
+
+	for (i = 0; i < p->nchunks; ++i) {
+		struct amdgpu_cs_chunk *chunk;
+
+		chunk = &p->chunks[i];
+
+		if (chunk->chunk_id == AMDGPU_CHUNK_ID_SEM_SIGNAL) {
+			r = amdgpu_process_sem_signal_dep(p, chunk, p->fence);
+			if (r)
+				return r;
+		}
+	}
+	return 0;
+}
+
 static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 			    union drm_amdgpu_cs *cs)
 {
@@ -1054,7 +1122,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	trace_amdgpu_cs_ioctl(job);
 	amd_sched_entity_push_job(&job->base);
 
-	return 0;
+	return amdgpu_cs_post_dependencies(p);
 }
 
 int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index cf05006..67a4157 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -86,6 +86,7 @@ static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)
 	for (i = 0; i < adev->num_rings; i++)
 		amd_sched_entity_fini(&adev->rings[i]->sched,
 				      &ctx->rings[i].entity);
+
 }
 
 static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 61d94c7..5b3e3b4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -664,6 +664,8 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
 	mutex_init(&fpriv->bo_list_lock);
 	idr_init(&fpriv->bo_list_handles);
 
+	spin_lock_init(&fpriv->sem_handles_lock);
+	idr_init(&fpriv->sem_handles);
 	amdgpu_ctx_mgr_init(&fpriv->ctx_mgr);
 
 	file_priv->driver_priv = fpriv;
@@ -689,6 +691,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 	struct amdgpu_device *adev = dev->dev_private;
 	struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
 	struct amdgpu_bo_list *list;
+	struct amdgpu_sem *sem;
 	int handle;
 
 	if (!fpriv)
@@ -711,8 +714,12 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 
 	idr_for_each_entry(&fpriv->bo_list_handles, list, handle)
 		amdgpu_bo_list_free(list);
+	
+//TODO	idr_for_each_entry(&fpriv->sem_handles, sem, handle)
+//		amdgpu_sem_destroy(fpriv, handle);
 
 	idr_destroy(&fpriv->bo_list_handles);
+	idr_destroy(&fpriv->sem_handles);
 	mutex_destroy(&fpriv->bo_list_lock);
 
 	kfree(fpriv);
@@ -896,6 +903,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(AMDGPU_SEM, amdgpu_sem_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 };
 const int amdgpu_max_kms_ioctl = ARRAY_SIZE(amdgpu_ioctls_kms);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sem.c
new file mode 100644
index 0000000..c7502f8
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sem.c
@@ -0,0 +1,360 @@
+/*
+ * Copyright 2016 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Chunming Zhou <david1.zhou@amd.com>
+ */
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/kernel.h>
+#include <linux/poll.h>
+#include <linux/seq_file.h>
+#include <linux/export.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/anon_inodes.h>
+#include "amdgpu_sem.h"
+#include "amdgpu.h"
+#include <drm/drmP.h>
+
+static void amdgpu_sem_core_free(struct kref *kref)
+{
+	struct amdgpu_sem_core *core = container_of(
+		kref, struct amdgpu_sem_core, kref);
+
+	if (core->file)
+		fput(core->file);
+
+	dma_fence_put(core->fence);
+	mutex_destroy(&core->lock);
+	kfree(core);
+}
+
+static void amdgpu_sem_free(struct kref *kref)
+{
+	struct amdgpu_sem *sem = container_of(
+		kref, struct amdgpu_sem, kref);
+
+	kref_put(&sem->base->kref, amdgpu_sem_core_free);
+	kfree(sem);
+}
+
+static inline void amdgpu_sem_get(struct amdgpu_sem *sem)
+{
+	if (sem)
+		kref_get(&sem->kref);
+}
+
+static inline void amdgpu_sem_put(struct amdgpu_sem *sem)
+{
+	if (sem)
+		kref_put(&sem->kref, amdgpu_sem_free);
+}
+
+static int amdgpu_sem_release(struct inode *inode, struct file *file)
+{
+	struct amdgpu_sem_core *core = file->private_data;
+
+	kref_put(&core->kref, amdgpu_sem_core_free);
+	return 0;
+}
+
+static unsigned int amdgpu_sem_poll(struct file *file, poll_table *wait)
+{
+	return 0;
+}
+
+static long amdgpu_sem_file_ioctl(struct file *file, unsigned int cmd,
+				   unsigned long arg)
+{
+	return 0;
+}
+
+static const struct file_operations amdgpu_sem_fops = {
+	.release = amdgpu_sem_release,
+	.poll = amdgpu_sem_poll,
+	.unlocked_ioctl = amdgpu_sem_file_ioctl,
+	.compat_ioctl = amdgpu_sem_file_ioctl,
+};
+
+
+static inline struct amdgpu_sem *amdgpu_sem_lookup(struct amdgpu_fpriv *fpriv, u32 handle)
+{
+	struct amdgpu_sem *sem;
+
+	spin_lock(&fpriv->sem_handles_lock);
+
+	/* Check if we currently have a reference on the object */
+	sem = idr_find(&fpriv->sem_handles, handle);
+	amdgpu_sem_get(sem);
+
+	spin_unlock(&fpriv->sem_handles_lock);
+
+	return sem;
+}
+
+static struct amdgpu_sem_core *amdgpu_sem_core_alloc(void)
+{
+	struct amdgpu_sem_core *core;
+
+	core = kzalloc(sizeof(*core), GFP_KERNEL);
+	if (!core)
+		return NULL;
+
+	kref_init(&core->kref);
+	mutex_init(&core->lock);
+	return core;
+}
+
+static struct amdgpu_sem *amdgpu_sem_alloc(void)
+{
+	struct amdgpu_sem *sem;
+
+	sem = kzalloc(sizeof(*sem), GFP_KERNEL);
+	if (!sem)
+		return NULL;
+
+	kref_init(&sem->kref);
+
+	return sem;
+}
+
+static int amdgpu_sem_create(struct amdgpu_fpriv *fpriv, u32 *handle)
+{
+	struct amdgpu_sem *sem;
+	struct amdgpu_sem_core *core;
+	int ret;
+
+	sem = amdgpu_sem_alloc();
+	core = amdgpu_sem_core_alloc();
+	if (!sem || !core) {
+		kfree(sem);
+		kfree(core);
+		return -ENOMEM;
+	}
+
+	sem->base = core;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&fpriv->sem_handles_lock);
+
+	ret = idr_alloc(&fpriv->sem_handles, sem, 1, 0, GFP_NOWAIT);
+
+	spin_unlock(&fpriv->sem_handles_lock);
+	idr_preload_end();
+
+	if (ret < 0)
+		return ret;
+
+	*handle = ret;
+	return 0;
+}
+
+static int amdgpu_sem_signal(struct amdgpu_fpriv *fpriv,
+			     u32 handle, struct dma_fence *fence)
+{
+	struct amdgpu_sem *sem;
+	struct amdgpu_sem_core *core;
+
+	sem = amdgpu_sem_lookup(fpriv, handle);
+	if (!sem)
+		return -EINVAL;
+
+	core = sem->base;
+	mutex_lock(&core->lock);
+	dma_fence_put(core->fence);
+	core->fence = dma_fence_get(fence);
+	mutex_unlock(&core->lock);
+
+	amdgpu_sem_put(sem);
+	return 0;
+}
+
+static int amdgpu_sem_import(struct amdgpu_fpriv *fpriv,
+				       int fd, u32 *handle)
+{
+	struct file *file = fget(fd);
+	struct amdgpu_sem *sem;
+	struct amdgpu_sem_core *core;
+	int ret;
+
+	if (!file)
+		return -EINVAL;
+
+	core = file->private_data;
+	if (!core) {
+		fput(file);
+		return -EINVAL;
+	}
+
+	mutex_lock(&core->lock);
+	kref_get(&core->kref);
+	mutex_unlock(&core->lock);
+	sem = amdgpu_sem_alloc();
+	if (!sem) {
+		ret = -ENOMEM;
+		goto err_sem;
+	}
+
+	sem->base = core;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&fpriv->sem_handles_lock);
+
+	ret = idr_alloc(&fpriv->sem_handles, sem, 1, 0, GFP_NOWAIT);
+
+	spin_unlock(&fpriv->sem_handles_lock);
+	idr_preload_end();
+
+	if (ret < 0)
+		goto err_out;
+
+	*handle = ret;
+	fput(file);
+	return 0;
+err_sem:
+	kref_put(&core->kref, amdgpu_sem_core_free);
+err_out:
+	amdgpu_sem_put(sem);
+	fput(file);
+	return ret;
+
+}
+
+static int amdgpu_sem_export(struct amdgpu_fpriv *fpriv,
+				       u32 handle, int *fd)
+{
+	struct amdgpu_sem *sem;
+	struct amdgpu_sem_core *core;
+	int ret;
+
+	sem = amdgpu_sem_lookup(fpriv, handle);
+	if (!sem)
+		return -EINVAL;
+
+	core = sem->base;
+	mutex_lock(&core->lock);
+	if (!core->file) {
+		core->file = anon_inode_getfile("sem_file",
+					       &amdgpu_sem_fops,
+					       core, 0);
+		if (IS_ERR(core->file)) {
+			mutex_unlock(&core->lock);
+			ret = -ENOMEM;
+			goto err_put_sem;
+		}
+	}
+	kref_get(&core->kref);
+	mutex_unlock(&core->lock);
+
+	ret = get_unused_fd_flags(O_CLOEXEC);
+	if (ret < 0)
+		goto err_put_file;
+
+	fd_install(ret, core->file);
+
+	*fd = ret;
+	amdgpu_sem_put(sem);
+	return 0;
+
+err_put_file:
+	kref_put(&core->kref, amdgpu_sem_core_free);
+	fput(core->file);
+err_put_sem:
+	amdgpu_sem_put(sem);
+	return ret;
+}
+
+void amdgpu_sem_destroy(struct amdgpu_fpriv *fpriv, u32 handle)
+{
+	struct amdgpu_sem *sem = amdgpu_sem_lookup(fpriv, handle);
+	if (!sem)
+		return;
+
+	spin_lock(&fpriv->sem_handles_lock);
+	idr_remove(&fpriv->sem_handles, handle);
+	spin_unlock(&fpriv->sem_handles_lock);
+	/* drop both the lookup reference and the handle's reference */
+	kref_put(&sem->kref, amdgpu_sem_free);
+	kref_put(&sem->kref, amdgpu_sem_free);
+}
+
+int amdgpu_sem_lookup_and_sync(struct amdgpu_device *adev,
+			       struct amdgpu_fpriv *fpriv,
+			       struct amdgpu_sync *sync,
+			       uint32_t handle)
+{
+	int r;
+	struct amdgpu_sem *sem;
+
+	sem = amdgpu_sem_lookup(fpriv, handle);
+	if (!sem)
+		return -EINVAL;
+
+	r = amdgpu_sync_fence(adev, sync, sem->base->fence);
+	if (r)
+		goto err;
+	mutex_lock(&sem->base->lock);
+	dma_fence_put(sem->base->fence);
+	sem->base->fence = NULL;
+	mutex_unlock(&sem->base->lock);
+
+err:
+	amdgpu_sem_put(sem);
+	return r;
+
+}
+
+int amdgpu_sem_lookup_and_signal(struct amdgpu_fpriv *fpriv,
+				 uint32_t handle,
+				 struct dma_fence *fence)
+{
+	return amdgpu_sem_signal(fpriv, handle, fence);
+}
+
+int amdgpu_sem_ioctl(struct drm_device *dev, void *data,
+		     struct drm_file *filp)
+{
+	union drm_amdgpu_sem *args = data;
+	struct amdgpu_fpriv *fpriv = filp->driver_priv;
+	int r = 0;
+
+	switch (args->in.op) {
+	case AMDGPU_SEM_OP_CREATE_SEM:
+		r = amdgpu_sem_create(fpriv, &args->out.handle);
+		break;
+	case AMDGPU_SEM_OP_IMPORT_SEM:
+		r = amdgpu_sem_import(fpriv, args->in.handle, &args->out.handle);
+		break;
+	case AMDGPU_SEM_OP_EXPORT_SEM:
+		r = amdgpu_sem_export(fpriv, args->in.handle, &args->out.fd);
+		break;
+	case AMDGPU_SEM_OP_DESTROY_SEM:
+		amdgpu_sem_destroy(fpriv, args->in.handle);
+		break;
+	default:
+		r = -EINVAL;
+		break;
+	}
+
+	return r;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sem.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_sem.h
new file mode 100644
index 0000000..a4457c2
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sem.h
@@ -0,0 +1,49 @@
+/*
+ * Copyright 2016 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: Chunming Zhou <david1.zhou@amd.com>
+ *
+ */
+
+
+#ifndef _LINUX_AMDGPU_SEM_H
+#define _LINUX_AMDGPU_SEM_H
+
+#include <linux/types.h>
+#include <linux/kref.h>
+#include <linux/ktime.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/dma-fence.h>
+
+struct amdgpu_sem_core {
+	struct file		*file;
+	struct kref		kref;
+	struct dma_fence            *fence;
+	struct mutex	lock;
+};
+
+struct amdgpu_sem {
+	struct amdgpu_sem_core	*base;
+	struct kref		kref;
+};
+
+#endif /* _LINUX_AMDGPU_SEM_H */
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 5797283..d3ecdaf 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -51,6 +51,7 @@ extern "C" {
 #define DRM_AMDGPU_GEM_OP		0x10
 #define DRM_AMDGPU_GEM_USERPTR		0x11
 #define DRM_AMDGPU_WAIT_FENCES		0x12
+#define DRM_AMDGPU_SEM                  0x13
 
 #define DRM_IOCTL_AMDGPU_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
 #define DRM_IOCTL_AMDGPU_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
@@ -65,6 +66,7 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_GEM_OP		DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_OP, struct drm_amdgpu_gem_op)
 #define DRM_IOCTL_AMDGPU_GEM_USERPTR	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_USERPTR, struct drm_amdgpu_gem_userptr)
 #define DRM_IOCTL_AMDGPU_WAIT_FENCES	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_WAIT_FENCES, union drm_amdgpu_wait_fences)
+#define DRM_IOCTL_AMDGPU_SEM		DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_SEM, union drm_amdgpu_sem)
 
 #define AMDGPU_GEM_DOMAIN_CPU		0x1
 #define AMDGPU_GEM_DOMAIN_GTT		0x2
@@ -335,6 +337,33 @@ union drm_amdgpu_wait_fences {
 	struct drm_amdgpu_wait_fences_out out;
 };
 
+#define AMDGPU_SEM_OP_CREATE_SEM 0
+/* #define AMDGPU_SEM_OP_WAIT_SEM 1 */
+/* #define AMDGPU_SEM_OP_SIGNAL_SEM 2 */
+#define AMDGPU_SEM_OP_IMPORT_SEM 3
+#define AMDGPU_SEM_OP_EXPORT_SEM 4
+#define AMDGPU_SEM_OP_DESTROY_SEM 5
+
+struct drm_amdgpu_sem_in {
+	__u32 ctx_id;
+	__u32 ip_type;
+	__u32 ip_instance;
+	__u32 ring;
+	__u32 op;
+	__u32 seq;
+	__u32 handle;
+};
+
+struct drm_amdgpu_sem_out {
+	__u32 fd;
+	__u32 handle;
+};
+
+union drm_amdgpu_sem {
+	struct drm_amdgpu_sem_in in;
+	struct drm_amdgpu_sem_out out;
+};
+
 #define AMDGPU_GEM_OP_GET_GEM_CREATE_INFO	0
 #define AMDGPU_GEM_OP_SET_PLACEMENT		1
 
@@ -390,6 +419,8 @@ struct drm_amdgpu_gem_va {
 #define AMDGPU_CHUNK_ID_IB		0x01
 #define AMDGPU_CHUNK_ID_FENCE		0x02
 #define AMDGPU_CHUNK_ID_DEPENDENCIES	0x03
+#define AMDGPU_CHUNK_ID_SEM_WAIT        0x04
+#define AMDGPU_CHUNK_ID_SEM_SIGNAL      0x05
 
 struct drm_amdgpu_cs_chunk {
 	__u32		chunk_id;
-- 
2.7.4


[-- Attachment #4: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Shared semaphores for amdgpu
       [not found]                       ` <CAPM=9twMvVoKCCmQUVsB6uD18j1e9cNq9eNqviVFy6F8v7OdOg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-03-09  7:00                         ` zhoucm1
       [not found]                           ` <58C0FD86.8040808-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: zhoucm1 @ 2017-03-09  7:00 UTC (permalink / raw)
  To: Dave Airlie
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Mao, David,
	Andres Rodriguez, Andres Rodriguez, Dave Airlie, Cui, Flora,
	Koenig, Christian, Pierre-Loup Griffais

[-- Attachment #1: Type: text/plain, Size: 981 bytes --]

Hi Dave,

We have already completed the implementation for both the kernel and
libdrm; it is attached. Let's discuss on top of this.


Thanks,
David Zhou

On 2017-03-09 12:24, Dave Airlie wrote:
> I've attached two patches for RFC at the moment, I haven't finished
> the userspace for these yet, but just wanted to get some
> ideas/feedback.
>
> Dave.
>
> On 9 March 2017 at 13:52, Dave Airlie <airlied-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> On 28 February 2017 at 11:46, zhoucm1 <david1.zhou-5C7GfCeVMHo@public.gmane.org> wrote:
>>> Hi Dave,
>>>
>>> The attached is our semaphore implementation, amdgpu_cs.c is drm file, the
>>> others are kernel file.
>>> Any suggestion?
>> Thanks,
>>
>> I've built a tree with all these in it, and started looking into the interface.
>>
>> I do wonder if we need the separate sem signal/wait interface, I think
>> we should just add
>> semaphore chunks to the CS interface.
>>
>> I'm just playing around with this now.
>>
>> Dave.


[-- Attachment #2: 0001-drm-amdgpu-add-new-semaphore-object-in-kernel-side-V.patch --]
[-- Type: text/x-patch, Size: 21245 bytes --]

>From 030ab323340d5557cd0ccf07d41f932b762745ac Mon Sep 17 00:00:00 2001
From: Chunming Zhou <David1.Zhou-5C7GfCeVMHo@public.gmane.org>
Date: Fri, 23 Sep 2016 10:22:22 +0800
Subject: [PATCH] drm/amdgpu: add new semaphore object in kernel side V3

So that semaphores can be shared across processes and across devices.

V2: add import/export
V3: some bug fixes

Signed-off-by: Chunming Zhou <David1.Zhou-5C7GfCeVMHo@public.gmane.org> (v1, v3)
Signed-off-by: Flora Cui <Flora.Cui-5C7GfCeVMHo@public.gmane.org> (v2)
Reviewed-by: Monk Liu <monk.liu-5C7GfCeVMHo@public.gmane.org> (v1)
Acked-by: Hawking Zhang <Hawking.Zhang-5C7GfCeVMHo@public.gmane.org> (v2)
Reviewed-by: David Mao <David.Mao-5C7GfCeVMHo@public.gmane.org> (v3)

Change-Id: I88e2168328d005a42b41eb7b0c60530a92126829
---
 drivers/gpu/drm/amd/amdgpu/Makefile     |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  13 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c |   6 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sem.c | 444 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_sem.h |  50 ++++
 include/uapi/drm/amdgpu_drm.h           |  32 +++
 8 files changed, 555 insertions(+), 4 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sem.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sem.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 8870e2e..0075287 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -30,7 +30,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
 	atombios_encoders.o amdgpu_sa.o atombios_i2c.o \
 	amdgpu_prime.o amdgpu_vm.o amdgpu_ib.o amdgpu_pll.o \
 	amdgpu_ucode.o amdgpu_bo_list.o amdgpu_ctx.o amdgpu_sync.o \
-	amdgpu_gtt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o
+	amdgpu_gtt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o amdgpu_sem.o
 
 # add asic specific block
 amdgpu-$(CONFIG_DRM_AMDGPU_CIK)+= cik.o cik_ih.o kv_smc.o kv_dpm.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 4435b36..d3b1593 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -56,6 +56,7 @@
 #include "amdgpu_sync.h"
 #include "amdgpu_ring.h"
 #include "amdgpu_vm.h"
+#include "amdgpu_sem.h"
 #include "amd_powerplay.h"
 #include "amdgpu_dpm.h"
 #include "amdgpu_acp.h"
@@ -665,6 +666,8 @@ struct amdgpu_ctx_ring {
 	uint64_t		sequence;
 	struct fence		**fences;
 	struct amd_sched_entity	entity;
+	struct list_head	sem_list;
+	struct mutex            sem_lock;
 };
 
 struct amdgpu_ctx {
@@ -708,6 +711,8 @@ struct amdgpu_fpriv {
 	struct mutex		bo_list_lock;
 	struct idr		bo_list_handles;
 	struct amdgpu_ctx_mgr	ctx_mgr;
+	spinlock_t		sem_handles_lock;
+	struct idr		sem_handles;
 };
 
 /*
@@ -1243,6 +1248,14 @@ int amdgpu_gem_metadata_ioctl(struct drm_device *dev, void *data,
 int amdgpu_freesync_ioctl(struct drm_device *dev, void *data,
 			    struct drm_file *filp);
 
+int amdgpu_sem_ioctl(struct drm_device *dev, void *data,
+		     struct drm_file *filp);
+
+int amdgpu_sem_add_cs(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
+		      struct amdgpu_sync *sync);
+
+void amdgpu_sem_destroy(struct amdgpu_fpriv *fpriv, u32 handle);
+
 /* VRAM scratch page for HDP bug, default vram page */
 struct amdgpu_vram_scratch {
 	struct amdgpu_bo		*robj;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index aafe11e..92b1423 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -1024,7 +1024,7 @@ static int amdgpu_cs_dependencies(struct amdgpu_device *adev,
 		}
 	}
 
-	return 0;
+	return amdgpu_sem_add_cs(p->ctx, p->job->ring, &p->job->sync);
 }
 
 static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index 6d86eae..66cf23c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -42,6 +42,8 @@ static int amdgpu_ctx_init(struct amdgpu_device *adev, struct amdgpu_ctx *ctx)
 	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 		ctx->rings[i].sequence = 1;
 		ctx->rings[i].fences = &ctx->fences[amdgpu_sched_jobs * i];
+		INIT_LIST_HEAD(&ctx->rings[i].sem_list);
+		mutex_init(&ctx->rings[i].sem_lock);
 	}
 
 	ctx->reset_counter = atomic_read(&adev->gpu_reset_counter);
@@ -78,8 +80,10 @@ static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)
 		return;
 
-	for (i = 0; i < AMDGPU_MAX_RINGS; ++i)
+	for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
 		for (j = 0; j < amdgpu_sched_jobs; ++j)
 			fence_put(ctx->rings[i].fences[j]);
+		mutex_destroy(&ctx->rings[i].sem_lock);
+	}
 	kfree(ctx->fences);
 	ctx->fences = NULL;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index ee3720e..b973225 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -749,6 +749,8 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
 
 	mutex_init(&fpriv->bo_list_lock);
 	idr_init(&fpriv->bo_list_handles);
+	spin_lock_init(&fpriv->sem_handles_lock);
+	idr_init(&fpriv->sem_handles);
 
 	amdgpu_ctx_mgr_init(&fpriv->ctx_mgr);
 
@@ -775,6 +777,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 	struct amdgpu_device *adev = dev->dev_private;
 	struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
 	struct amdgpu_bo_list *list;
+	struct amdgpu_sem *sem;
 	int handle;
 
 	if (!fpriv)
@@ -803,6 +806,10 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 	idr_destroy(&fpriv->bo_list_handles);
 	mutex_destroy(&fpriv->bo_list_lock);
 
+	idr_for_each_entry(&fpriv->sem_handles, sem, handle)
+		amdgpu_sem_destroy(fpriv, handle);
+	idr_destroy(&fpriv->sem_handles);
+
 	kfree(fpriv);
 	file_priv->driver_priv = NULL;
 
@@ -984,7 +991,8 @@ int amdgpu_get_vblank_timestamp_kms(struct drm_device *dev, unsigned int pipe,
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
-	DRM_IOCTL_DEF_DRV(AMDGPU_FREESYNC, amdgpu_freesync_ioctl, DRM_MASTER)
+	DRM_IOCTL_DEF_DRV(AMDGPU_FREESYNC, amdgpu_freesync_ioctl, DRM_MASTER),
+	DRM_IOCTL_DEF_DRV(AMDGPU_SEM, amdgpu_sem_ioctl, DRM_AUTH|DRM_UNLOCKED|DRM_RENDER_ALLOW),
 };
 const int amdgpu_max_kms_ioctl = ARRAY_SIZE(amdgpu_ioctls_kms);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sem.c
new file mode 100644
index 0000000..6681162
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sem.c
@@ -0,0 +1,444 @@
+/*
+ * Copyright 2016 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Chunming Zhou <david1.zhou-5C7GfCeVMHo@public.gmane.org>
+ */
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/kernel.h>
+#include <linux/poll.h>
+#include <linux/seq_file.h>
+#include <linux/export.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/anon_inodes.h>
+#include "amdgpu_sem.h"
+#include "amdgpu.h"
+#include <drm/drmP.h>
+
+static int amdgpu_sem_cring_add(struct amdgpu_fpriv *fpriv,
+				struct drm_amdgpu_sem_in *in,
+				struct amdgpu_sem *sem);
+
+static void amdgpu_sem_core_free(struct kref *kref)
+{
+	struct amdgpu_sem_core *core = container_of(
+		kref, struct amdgpu_sem_core, kref);
+
+	fence_put(core->fence);
+	mutex_destroy(&core->lock);
+	kfree(core);
+}
+
+static void amdgpu_sem_free(struct kref *kref)
+{
+	struct amdgpu_sem *sem = container_of(
+		kref, struct amdgpu_sem, kref);
+
+	list_del(&sem->list);
+	kref_put(&sem->base->kref, amdgpu_sem_core_free);
+	kfree(sem);
+}
+
+static inline void amdgpu_sem_get(struct amdgpu_sem *sem)
+{
+	if (sem)
+		kref_get(&sem->kref);
+}
+
+static inline void amdgpu_sem_put(struct amdgpu_sem *sem)
+{
+	if (sem)
+		kref_put(&sem->kref, amdgpu_sem_free);
+}
+
+static int amdgpu_sem_release(struct inode *inode, struct file *file)
+{
+	struct amdgpu_sem_core *core = file->private_data;
+
+	kref_put(&core->kref, amdgpu_sem_core_free);
+	return 0;
+}
+
+static unsigned int amdgpu_sem_poll(struct file *file, poll_table *wait)
+{
+	return 0;
+}
+
+static long amdgpu_sem_file_ioctl(struct file *file, unsigned int cmd,
+				   unsigned long arg)
+{
+	return 0;
+}
+
+static const struct file_operations amdgpu_sem_fops = {
+	.release = amdgpu_sem_release,
+	.poll = amdgpu_sem_poll,
+	.unlocked_ioctl = amdgpu_sem_file_ioctl,
+	.compat_ioctl = amdgpu_sem_file_ioctl,
+};
+
+
+static inline struct amdgpu_sem *amdgpu_sem_lookup(struct amdgpu_fpriv *fpriv, u32 handle)
+{
+	struct amdgpu_sem *sem;
+
+	spin_lock(&fpriv->sem_handles_lock);
+
+	/* Check if we currently have a reference on the object */
+	sem = idr_find(&fpriv->sem_handles, handle);
+	amdgpu_sem_get(sem);
+
+	spin_unlock(&fpriv->sem_handles_lock);
+
+	return sem;
+}
+
+static struct amdgpu_sem_core *amdgpu_sem_core_alloc(void)
+{
+	struct amdgpu_sem_core *core;
+
+	core = kzalloc(sizeof(*core), GFP_KERNEL);
+	if (!core)
+		return NULL;
+
+	kref_init(&core->kref);
+	mutex_init(&core->lock);
+	return core;
+}
+
+static struct amdgpu_sem *amdgpu_sem_alloc(void)
+{
+	struct amdgpu_sem *sem;
+
+	sem = kzalloc(sizeof(*sem), GFP_KERNEL);
+	if (!sem)
+		return NULL;
+
+	kref_init(&sem->kref);
+	INIT_LIST_HEAD(&sem->list);
+
+	return sem;
+}
+
+static int amdgpu_sem_create(struct amdgpu_fpriv *fpriv, u32 *handle)
+{
+	struct amdgpu_sem *sem;
+	struct amdgpu_sem_core *core;
+	int ret;
+
+	sem = amdgpu_sem_alloc();
+	core = amdgpu_sem_core_alloc();
+	if (!sem || !core) {
+		kfree(sem);
+		kfree(core);
+		return -ENOMEM;
+	}
+
+	sem->base = core;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&fpriv->sem_handles_lock);
+
+	ret = idr_alloc(&fpriv->sem_handles, sem, 1, 0, GFP_NOWAIT);
+
+	spin_unlock(&fpriv->sem_handles_lock);
+	idr_preload_end();
+
+	if (ret < 0)
+		return ret;
+
+	*handle = ret;
+	return 0;
+}
+
+static int amdgpu_sem_signal(struct amdgpu_fpriv *fpriv,
+				u32 handle, struct fence *fence)
+{
+	struct amdgpu_sem *sem;
+	struct amdgpu_sem_core *core;
+
+	sem = amdgpu_sem_lookup(fpriv, handle);
+	if (!sem)
+		return -EINVAL;
+
+	core = sem->base;
+	mutex_lock(&core->lock);
+	fence_put(core->fence);
+	core->fence = fence_get(fence);
+	mutex_unlock(&core->lock);
+
+	amdgpu_sem_put(sem);
+	return 0;
+}
+
+static int amdgpu_sem_wait(struct amdgpu_fpriv *fpriv,
+			  struct drm_amdgpu_sem_in *in)
+{
+	struct amdgpu_sem *sem;
+	int ret;
+
+	sem = amdgpu_sem_lookup(fpriv, in->handle);
+	if (!sem)
+		return -EINVAL;
+
+	ret = amdgpu_sem_cring_add(fpriv, in, sem);
+	amdgpu_sem_put(sem);
+
+	return ret;
+}
+
+static int amdgpu_sem_import(struct amdgpu_fpriv *fpriv,
+				       int fd, u32 *handle)
+{
+	struct file *file = fget(fd);
+	struct amdgpu_sem *sem;
+	struct amdgpu_sem_core *core;
+	int ret;
+
+	if (!file)
+		return -EINVAL;
+
+	core = file->private_data;
+	if (!core) {
+		fput(file);
+		return -EINVAL;
+	}
+
+	kref_get(&core->kref);
+	sem = amdgpu_sem_alloc();
+	if (!sem) {
+		ret = -ENOMEM;
+		goto err_sem;
+	}
+
+	sem->base = core;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&fpriv->sem_handles_lock);
+
+	ret = idr_alloc(&fpriv->sem_handles, sem, 1, 0, GFP_NOWAIT);
+
+	spin_unlock(&fpriv->sem_handles_lock);
+	idr_preload_end();
+
+	if (ret < 0)
+		goto err_out;
+
+	*handle = ret;
+	fput(file);
+	return 0;
+err_sem:
+	kref_put(&core->kref, amdgpu_sem_core_free);
+err_out:
+	amdgpu_sem_put(sem);
+	fput(file);
+	return ret;
+
+}
+
+static int amdgpu_sem_export(struct amdgpu_fpriv *fpriv,
+				       u32 handle, int *fd)
+{
+	struct amdgpu_sem *sem;
+	struct amdgpu_sem_core *core;
+	int ret;
+
+	sem = amdgpu_sem_lookup(fpriv, handle);
+	if (!sem)
+		return -EINVAL;
+
+	core = sem->base;
+	kref_get(&core->kref);
+	mutex_lock(&core->lock);
+	if (!core->file) {
+		core->file = anon_inode_getfile("sem_file",
+					       &amdgpu_sem_fops,
+					       core, 0);
+		if (IS_ERR(core->file)) {
+			mutex_unlock(&core->lock);
+			ret = -ENOMEM;
+			goto err_put_sem;
+		}
+	} else {
+		get_file(core->file);
+	}
+	mutex_unlock(&core->lock);
+
+	ret = get_unused_fd_flags(O_CLOEXEC);
+	if (ret < 0)
+		goto err_put_file;
+
+	fd_install(ret, core->file);
+
+	*fd = ret;
+	amdgpu_sem_put(sem);
+	return 0;
+
+err_put_file:
+	fput(core->file);
+err_put_sem:
+	kref_put(&core->kref, amdgpu_sem_core_free);
+	amdgpu_sem_put(sem);
+	return ret;
+}
+
+void amdgpu_sem_destroy(struct amdgpu_fpriv *fpriv, u32 handle)
+{
+	struct amdgpu_sem *sem = amdgpu_sem_lookup(fpriv, handle);
+	if (!sem)
+		return;
+
+	spin_lock(&fpriv->sem_handles_lock);
+	idr_remove(&fpriv->sem_handles, handle);
+	spin_unlock(&fpriv->sem_handles_lock);
+
+	kref_sub(&sem->kref, 2, amdgpu_sem_free);
+}
+
+static struct fence *amdgpu_sem_get_fence(struct amdgpu_fpriv *fpriv,
+					 struct drm_amdgpu_sem_in *in)
+{
+	struct amdgpu_ring *out_ring;
+	struct amdgpu_ctx *ctx;
+	struct fence *fence;
+	uint32_t ctx_id, ip_type, ip_instance, ring;
+	int r;
+
+	ctx_id = in->ctx_id;
+	ip_type = in->ip_type;
+	ip_instance = in->ip_instance;
+	ring = in->ring;
+	ctx = amdgpu_ctx_get(fpriv, ctx_id);
+	if (!ctx)
+		return NULL;
+	r = amdgpu_cs_get_ring(ctx->adev, ip_type, ip_instance, ring,
+			       &out_ring);
+	if (r) {
+		amdgpu_ctx_put(ctx);
+		return NULL;
+	}
+	/* get the last fence of this entity */
+	fence = amdgpu_ctx_get_fence(ctx, out_ring,
+				     in->seq ? in->seq :
+				     ctx->rings[out_ring->idx].sequence - 1);
+	amdgpu_ctx_put(ctx);
+
+	return fence;
+}
+
+static int amdgpu_sem_cring_add(struct amdgpu_fpriv *fpriv,
+				struct drm_amdgpu_sem_in *in,
+				struct amdgpu_sem *sem)
+{
+	struct amdgpu_ring *out_ring;
+	struct amdgpu_ctx *ctx;
+	uint32_t ctx_id, ip_type, ip_instance, ring;
+	int r;
+
+	ctx_id = in->ctx_id;
+	ip_type = in->ip_type;
+	ip_instance = in->ip_instance;
+	ring = in->ring;
+	ctx = amdgpu_ctx_get(fpriv, ctx_id);
+	if (!ctx)
+		return -EINVAL;
+	r = amdgpu_cs_get_ring(ctx->adev, ip_type, ip_instance, ring,
+			       &out_ring);
+	if (r)
+		goto err;
+	mutex_lock(&ctx->rings[out_ring->idx].sem_lock);
+	list_add(&sem->list, &ctx->rings[out_ring->idx].sem_list);
+	mutex_unlock(&ctx->rings[out_ring->idx].sem_lock);
+
+err:
+	amdgpu_ctx_put(ctx);
+	return r;
+}
+
+int amdgpu_sem_add_cs(struct amdgpu_ctx *ctx, struct amdgpu_ring *ring,
+		     struct amdgpu_sync *sync)
+{
+	struct amdgpu_sem *sem, *tmp;
+	int r = 0;
+
+	if (list_empty(&ctx->rings[ring->idx].sem_list))
+		return 0;
+
+	mutex_lock(&ctx->rings[ring->idx].sem_lock);
+	list_for_each_entry_safe(sem, tmp, &ctx->rings[ring->idx].sem_list,
+				 list) {
+		r = amdgpu_sync_fence(ctx->adev, sync, sem->base->fence);
+		if (r)
+			goto err;
+		mutex_lock(&sem->base->lock);
+		fence_put(sem->base->fence);
+		sem->base->fence = NULL;
+		mutex_unlock(&sem->base->lock);
+		list_del_init(&sem->list);
+	}
+err:
+	mutex_unlock(&ctx->rings[ring->idx].sem_lock);
+	return r;
+}
+
+int amdgpu_sem_ioctl(struct drm_device *dev, void *data,
+		     struct drm_file *filp)
+{
+	union drm_amdgpu_sem *args = data;
+	struct amdgpu_fpriv *fpriv = filp->driver_priv;
+	struct fence *fence;
+	int r = 0;
+
+	switch (args->in.op) {
+	case AMDGPU_SEM_OP_CREATE_SEM:
+		r = amdgpu_sem_create(fpriv, &args->out.handle);
+		break;
+	case AMDGPU_SEM_OP_WAIT_SEM:
+		r = amdgpu_sem_wait(fpriv, &args->in);
+		break;
+	case AMDGPU_SEM_OP_SIGNAL_SEM:
+		fence = amdgpu_sem_get_fence(fpriv, &args->in);
+		if (IS_ERR(fence)) {
+			r = PTR_ERR(fence);
+			return r;
+		}
+		r = amdgpu_sem_signal(fpriv, args->in.handle, fence);
+		fence_put(fence);
+		break;
+	case AMDGPU_SEM_OP_IMPORT_SEM:
+		r = amdgpu_sem_import(fpriv, args->in.handle, &args->out.handle);
+		break;
+	case AMDGPU_SEM_OP_EXPORT_SEM:
+		r = amdgpu_sem_export(fpriv, args->in.handle, &args->out.fd);
+		break;
+	case AMDGPU_SEM_OP_DESTROY_SEM:
+		amdgpu_sem_destroy(fpriv, args->in.handle);
+		break;
+	default:
+		r = -EINVAL;
+		break;
+	}
+
+	return r;
+}
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sem.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_sem.h
new file mode 100644
index 0000000..04296ca
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sem.h
@@ -0,0 +1,50 @@
+/*
+ * Copyright 2016 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors: Chunming Zhou <david1.zhou-5C7GfCeVMHo@public.gmane.org>
+ *
+ */
+
+
+#ifndef _LINUX_AMDGPU_SEM_H
+#define _LINUX_AMDGPU_SEM_H
+
+#include <linux/types.h>
+#include <linux/kref.h>
+#include <linux/ktime.h>
+#include <linux/list.h>
+#include <linux/spinlock.h>
+#include <linux/fence.h>
+
+struct amdgpu_sem_core {
+	struct file		*file;
+	struct kref		kref;
+	struct fence            *fence;
+	struct mutex	lock;
+};
+
+struct amdgpu_sem {
+	struct amdgpu_sem_core	*base;
+	struct kref		kref;
+	struct list_head        list;
+};
+
+#endif /* _LINUX_AMDGPU_SEM_H */
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 49358e7..d17f431 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -53,6 +53,8 @@
 #define DRM_AMDGPU_WAIT_FENCES		0x12
 #define DRM_AMDGPU_FREESYNC	        0x14
 
+#define DRM_AMDGPU_SEM			0x5b
+
 #define DRM_IOCTL_AMDGPU_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
 #define DRM_IOCTL_AMDGPU_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
 #define DRM_IOCTL_AMDGPU_CTX		DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_CTX, union drm_amdgpu_ctx)
@@ -67,6 +69,7 @@
 #define DRM_IOCTL_AMDGPU_GEM_USERPTR	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_USERPTR, struct drm_amdgpu_gem_userptr)
 #define DRM_IOCTL_AMDGPU_WAIT_FENCES	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_WAIT_FENCES, union drm_amdgpu_wait_fences)
 #define DRM_IOCTL_AMDGPU_FREESYNC	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_FREESYNC, struct drm_amdgpu_freesync)
+#define DRM_IOCTL_AMDGPU_SEM		DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_SEM, union drm_amdgpu_sem)
 
 #define AMDGPU_GEM_DOMAIN_CPU		0x1
 #define AMDGPU_GEM_DOMAIN_GTT		0x2
@@ -192,6 +195,35 @@ struct drm_amdgpu_ctx_in {
 	union drm_amdgpu_ctx_out out;
 };
 
+/* sem related */
+#define AMDGPU_SEM_OP_CREATE_SEM        1
+#define AMDGPU_SEM_OP_WAIT_SEM	        2
+#define AMDGPU_SEM_OP_SIGNAL_SEM        3
+#define AMDGPU_SEM_OP_DESTROY_SEM       4
+#define AMDGPU_SEM_OP_IMPORT_SEM	5
+#define AMDGPU_SEM_OP_EXPORT_SEM	6
+
+struct drm_amdgpu_sem_in {
+	/** AMDGPU_SEM_OP_* */
+	uint32_t	op;
+	uint32_t        handle;
+	uint32_t	ctx_id;
+	uint32_t        ip_type;
+	uint32_t        ip_instance;
+	uint32_t        ring;
+	uint64_t        seq;
+};
+
+union drm_amdgpu_sem_out {
+	int32_t         fd;
+	uint32_t	handle;
+};
+
+union drm_amdgpu_sem {
+	struct drm_amdgpu_sem_in in;
+	union drm_amdgpu_sem_out out;
+};
+
 /*
  * This is not a reliable API and you should expect it to fail for any
  * number of reasons and have fallback path that do not use userptr to
-- 
1.9.1


[-- Attachment #3: 0001-amdgpu-add-new-semaphore-support-v2.patch --]
[-- Type: text/x-patch, Size: 9395 bytes --]

>From ec6e6f599fe61537ed42b9953126691f904626d4 Mon Sep 17 00:00:00 2001
From: Chunming Zhou <David1.Zhou-5C7GfCeVMHo@public.gmane.org>
Date: Thu, 22 Sep 2016 14:50:16 +0800
Subject: [PATCH 1/2] amdgpu: add new semaphore support v2

v2: add import/export functions.

Change-Id: I74b61611e975d6f2de051e3f3c7ba63177308bdb
Signed-off-by: Chunming Zhou <David1.Zhou-5C7GfCeVMHo@public.gmane.org> (v1)
Reviewed-by: Monk Liu <monk.liu-5C7GfCeVMHo@public.gmane.org> (v1)
Signed-off-by: Flora Cui <Flora.Cui-5C7GfCeVMHo@public.gmane.org> (v2)
Acked-by: Hawking Zhang <Hawking.Zhang-5C7GfCeVMHo@public.gmane.org> (v2)
---
 amdgpu/amdgpu.h          |  82 ++++++++++++++++++++++++++++-
 amdgpu/amdgpu_cs.c       | 133 +++++++++++++++++++++++++++++++++++++++++++++++
 include/drm/amdgpu_drm.h |  34 ++++++++++++
 3 files changed, 248 insertions(+), 1 deletion(-)

diff --git a/amdgpu/amdgpu.h b/amdgpu/amdgpu.h
index 941406e..eb75283 100644
--- a/amdgpu/amdgpu.h
+++ b/amdgpu/amdgpu.h
@@ -151,6 +151,12 @@ typedef struct amdgpu_ib *amdgpu_ib_handle;
  */
 typedef struct amdgpu_va *amdgpu_va_handle;
 
+/**
+ * Define handle for sem file
+ */
+typedef uint32_t amdgpu_sem_handle;
+
+
 /*--------------------------------------------------------------------------*/
 /* -------------------------- Structures ---------------------------------- */
 /*--------------------------------------------------------------------------*/
@@ -1336,6 +1342,80 @@ int amdgpu_va_range_alloc(enum amdgpu_gpu_va_range va_range_type,
 */
 int amdgpu_va_range_free(amdgpu_va_handle va_range_handle);
 
-#endif /* #ifdef _amdgpu_h_ */
+/**
+ *  create sem
+ *
+ * \param   dev    - [in] Device handle. See #amdgpu_device_initialize()
+ * \param   sem	   - \c [out] sem handle
+ *
+ * \return   0 on success\n
+ *          <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_cs_create_sem(amdgpu_device_handle dev,
+			 amdgpu_sem_handle *sem);
+
+/**
+ *  signal sem
+ *
+ * \param   dev    - [in] Device handle. See #amdgpu_device_initialize()
+ * \param   context        - \c [in] GPU Context
+ * \param   ip_type        - \c [in] Hardware IP block type = AMDGPU_HW_IP_*
+ * \param   ip_instance    - \c [in] Index of the IP block of the same type
+ * \param   ring           - \c [in] Specify ring index of the IP
+ * \param   sem	   - \c [in] sem handle
+ *
+ * \return   0 on success\n
+ *          <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_signal_sem(amdgpu_device_handle dev,
+			 amdgpu_context_handle ctx,
+			 uint32_t ip_type,
+			 uint32_t ip_instance,
+			 uint32_t ring,
+			 amdgpu_sem_handle sem);
+
+/**
+ *  wait sem
+ *
+ * \param   dev    - [in] Device handle. See #amdgpu_device_initialize()
+ * \param   context        - \c [in] GPU Context
+ * \param   ip_type        - \c [in] Hardware IP block type = AMDGPU_HW_IP_*
+ * \param   ip_instance    - \c [in] Index of the IP block of the same type
+ * \param   ring           - \c [in] Specify ring index of the IP
+ * \param   sem	   - \c [in] sem handle
+ *
+ * \return   0 on success\n
+ *          <0 - Negative POSIX Error code
+ *
+*/
+int amdgpu_cs_wait_sem(amdgpu_device_handle dev,
+		       amdgpu_context_handle ctx,
+		       uint32_t ip_type,
+		       uint32_t ip_instance,
+		       uint32_t ring,
+		       amdgpu_sem_handle sem);
+
+int amdgpu_cs_export_sem(amdgpu_device_handle dev,
+			  amdgpu_sem_handle sem,
+			  int *shared_handle);
 
+int amdgpu_cs_import_sem(amdgpu_device_handle dev,
+			  int shared_handle,
+			  amdgpu_sem_handle *sem);
 
+/**
+ *  destroy sem
+ *
+ * \param   dev    - [in] Device handle. See #amdgpu_device_initialize()
+ * \param   sem	   - \c [in] sem handle
+ *
+ * \return   0 on success\n
+ *          <0 - Negative POSIX Error code
+ *
+ */
+int amdgpu_cs_destroy_sem(amdgpu_device_handle dev,
+			  amdgpu_sem_handle sem);
+
+#endif /* #ifdef _amdgpu_h_ */
diff --git a/amdgpu/amdgpu_cs.c b/amdgpu/amdgpu_cs.c
index c8101b8..c8d8593 100644
--- a/amdgpu/amdgpu_cs.c
+++ b/amdgpu/amdgpu_cs.c
@@ -20,6 +20,9 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  *
 */
+
+#include <sys/stat.h>
+#include <unistd.h>
 #include <stdlib.h>
 #include <stdio.h>
 #include <string.h>
@@ -913,3 +916,133 @@ int amdgpu_cs_query_fence_status(struct amdgpu_cs_query_fence *fence,
 	return r;
 }
 
+int amdgpu_cs_create_sem(amdgpu_device_handle dev,
+			 amdgpu_sem_handle *sem)
+{
+	union drm_amdgpu_sem args;
+	int r;
+
+	if (NULL == dev)
+		return -EINVAL;
+
+	/* Create the semaphore */
+	memset(&args, 0, sizeof(args));
+	args.in.op = AMDGPU_SEM_OP_CREATE_SEM;
+	r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_SEM, &args, sizeof(args));
+	if (r)
+		return r;
+
+	*sem = args.out.handle;
+
+	return 0;
+}
+
+int amdgpu_cs_signal_sem(amdgpu_device_handle dev,
+			 amdgpu_context_handle ctx,
+			 uint32_t ip_type,
+			 uint32_t ip_instance,
+			 uint32_t ring,
+			 amdgpu_sem_handle sem)
+{
+	union drm_amdgpu_sem args;
+
+	if (NULL == dev)
+		return -EINVAL;
+
+	/* Signal the semaphore */
+	memset(&args, 0, sizeof(args));
+	args.in.op = AMDGPU_SEM_OP_SIGNAL_SEM;
+	args.in.ctx_id = ctx->id;
+	args.in.ip_type = ip_type;
+	args.in.ip_instance = ip_instance;
+	args.in.ring = ring;
+	args.in.handle = sem;
+	return drmCommandWriteRead(dev->fd, DRM_AMDGPU_SEM, &args, sizeof(args));
+}
+
+int amdgpu_cs_wait_sem(amdgpu_device_handle dev,
+		       amdgpu_context_handle ctx,
+		       uint32_t ip_type,
+		       uint32_t ip_instance,
+		       uint32_t ring,
+		       amdgpu_sem_handle sem)
+{
+	union drm_amdgpu_sem args;
+
+	if (NULL == dev)
+		return -EINVAL;
+
+	/* Wait on the semaphore */
+	memset(&args, 0, sizeof(args));
+	args.in.op = AMDGPU_SEM_OP_WAIT_SEM;
+	args.in.ctx_id = ctx->id;
+	args.in.ip_type = ip_type;
+	args.in.ip_instance = ip_instance;
+	args.in.ring = ring;
+	args.in.handle = sem;
+	args.in.seq = 0;
+	return drmCommandWriteRead(dev->fd, DRM_AMDGPU_SEM, &args, sizeof(args));
+}
+
+int amdgpu_cs_export_sem(amdgpu_device_handle dev,
+			  amdgpu_sem_handle sem,
+			  int *shared_handle)
+{
+	union drm_amdgpu_sem args;
+	int r;
+
+	if (NULL == dev)
+		return -EINVAL;
+
+	/* Export the semaphore */
+	memset(&args, 0, sizeof(args));
+	args.in.op = AMDGPU_SEM_OP_EXPORT_SEM;
+	args.in.handle = sem;
+	r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_SEM, &args, sizeof(args));
+	if (r)
+		return r;
+	*shared_handle = args.out.fd;
+	return 0;
+}
+
+int amdgpu_cs_import_sem(amdgpu_device_handle dev,
+			  int shared_handle,
+			  amdgpu_sem_handle *sem)
+{
+	union drm_amdgpu_sem args;
+	int r;
+
+	if (NULL == dev)
+		return -EINVAL;
+
+	/* Import the semaphore */
+	memset(&args, 0, sizeof(args));
+	args.in.op = AMDGPU_SEM_OP_IMPORT_SEM;
+	args.in.handle = shared_handle;
+	r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_SEM, &args, sizeof(args));
+	if (r)
+		return r;
+	*sem = args.out.handle;
+	return 0;
+}
+
+
+int amdgpu_cs_destroy_sem(amdgpu_device_handle dev,
+			  amdgpu_sem_handle sem)
+{
+	union drm_amdgpu_sem args;
+	int r;
+
+	if (NULL == dev)
+		return -EINVAL;
+
+	/* Destroy the semaphore */
+	memset(&args, 0, sizeof(args));
+	args.in.op = AMDGPU_SEM_OP_DESTROY_SEM;
+	args.in.handle = sem;
+	r = drmCommandWriteRead(dev->fd, DRM_AMDGPU_SEM, &args, sizeof(args));
+	if (r)
+		return r;
+
+	return 0;
+}
diff --git a/include/drm/amdgpu_drm.h b/include/drm/amdgpu_drm.h
index 89a938a..ccd9033 100644
--- a/include/drm/amdgpu_drm.h
+++ b/include/drm/amdgpu_drm.h
@@ -46,6 +46,9 @@
 #define DRM_AMDGPU_WAIT_CS		0x09
 #define DRM_AMDGPU_GEM_OP		0x10
 #define DRM_AMDGPU_GEM_USERPTR		0x11
+#define DRM_AMDGPU_WAIT_FENCES		0x12
+
+#define DRM_AMDGPU_SEM                  0x5b
 
 #define DRM_IOCTL_AMDGPU_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
 #define DRM_IOCTL_AMDGPU_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
@@ -59,6 +62,8 @@
 #define DRM_IOCTL_AMDGPU_WAIT_CS	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_WAIT_CS, union drm_amdgpu_wait_cs)
 #define DRM_IOCTL_AMDGPU_GEM_OP		DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_OP, struct drm_amdgpu_gem_op)
 #define DRM_IOCTL_AMDGPU_GEM_USERPTR	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_USERPTR, struct drm_amdgpu_gem_userptr)
+#define DRM_IOCTL_AMDGPU_WAIT_FENCES	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_WAIT_FENCES, union drm_amdgpu_wait_fences)
+#define DRM_IOCTL_AMDGPU_SEM            DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_SEM, union drm_amdgpu_sem)
 
 #define AMDGPU_GEM_DOMAIN_CPU		0x1
 #define AMDGPU_GEM_DOMAIN_GTT		0x2
@@ -182,6 +187,35 @@ union drm_amdgpu_ctx {
 	union drm_amdgpu_ctx_out out;
 };
 
+/* sync file related */
+#define AMDGPU_SEM_OP_CREATE_SEM       1
+#define AMDGPU_SEM_OP_WAIT_SEM         2
+#define AMDGPU_SEM_OP_SIGNAL_SEM       3
+#define AMDGPU_SEM_OP_DESTROY_SEM      4
+#define AMDGPU_SEM_OP_IMPORT_SEM       5
+#define AMDGPU_SEM_OP_EXPORT_SEM       6
+
+struct drm_amdgpu_sem_in {
+	/** AMDGPU_SEM_OP_* */
+	uint32_t        op;
+	uint32_t        handle;
+	uint32_t        ctx_id;
+	uint32_t        ip_type;
+	uint32_t        ip_instance;
+	uint32_t        ring;
+	uint64_t        seq;
+};
+
+union drm_amdgpu_sem_out {
+	int            fd;
+	uint32_t        handle;
+};
+
+union drm_amdgpu_sem {
+	struct drm_amdgpu_sem_in in;
+	union drm_amdgpu_sem_out out;
+};
+
 /*
  * This is not a reliable API and you should expect it to fail for any
  * number of reasons and have fallback path that do not use userptr to
-- 
1.9.1


[-- Attachment #4: 0002-test-case-for-export-import-sem.patch --]
[-- Type: text/x-patch, Size: 7289 bytes --]

>From 1d391323c06c03a90b1d349f8a8c79a29af8fc90 Mon Sep 17 00:00:00 2001
From: David Mao <david.mao-5C7GfCeVMHo@public.gmane.org>
Date: Mon, 23 Jan 2017 11:31:58 +0800
Subject: [PATCH 2/2] test case for export/import sem

Test covers basic functionality, including create/destroy/import/export/wait/signal.

Change-Id: I8a8d767e5ef1889f8ac214fef98befba83969d8d
Signed-off-by: David Mao <david.mao-5C7GfCeVMHo@public.gmane.org>
Signed-off-by: Flora Cui <Flora.Cui-5C7GfCeVMHo@public.gmane.org>
Acked-by: Hawking Zhang <Hawking.Zhang-5C7GfCeVMHo@public.gmane.org>
---
 tests/amdgpu/basic_tests.c | 190 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 190 insertions(+)

diff --git a/tests/amdgpu/basic_tests.c b/tests/amdgpu/basic_tests.c
index 0083968..5a95ec9 100644
--- a/tests/amdgpu/basic_tests.c
+++ b/tests/amdgpu/basic_tests.c
@@ -214,6 +214,196 @@ static void amdgpu_command_submission_gfx(void)
 	CU_ASSERT_EQUAL(r, 0);
 }
 
+static void amdgpu_semaphore_test(void)
+{
+	amdgpu_context_handle context_handle[2];
+	amdgpu_semaphore_handle sem;
+	amdgpu_bo_handle ib_result_handle[2];
+	void *ib_result_cpu[2];
+	uint64_t ib_result_mc_address[2];
+	struct amdgpu_cs_request ibs_request[2] = {0};
+	struct amdgpu_cs_ib_info ib_info[2] = {0};
+	struct amdgpu_cs_fence fence_status = {0};
+	uint32_t *ptr;
+	uint32_t expired;
+	amdgpu_bo_list_handle bo_list[2];
+	amdgpu_va_handle va_handle[2];
+	amdgpu_sem_handle sem_handle, sem_handle_import;
+	int fd;
+	int r, i;
+
+	r = amdgpu_cs_create_semaphore(&sem);
+	CU_ASSERT_EQUAL(r, 0);
+	for (i = 0; i < 2; i++) {
+		r = amdgpu_cs_ctx_create(device_handle, &context_handle[i]);
+		CU_ASSERT_EQUAL(r, 0);
+
+		r = amdgpu_bo_alloc_and_map(device_handle, 4096, 4096,
+					    AMDGPU_GEM_DOMAIN_GTT, 0,
+					    &ib_result_handle[i], &ib_result_cpu[i],
+					    &ib_result_mc_address[i], &va_handle[i]);
+		CU_ASSERT_EQUAL(r, 0);
+
+		r = amdgpu_get_bo_list(device_handle, ib_result_handle[i],
+				       NULL, &bo_list[i]);
+		CU_ASSERT_EQUAL(r, 0);
+	}
+
+	/* 1. same context different engine */
+	ptr = ib_result_cpu[0];
+	ptr[0] = SDMA_NOP;
+	ib_info[0].ib_mc_address = ib_result_mc_address[0];
+	ib_info[0].size = 1;
+
+	ibs_request[0].ip_type = AMDGPU_HW_IP_DMA;
+	ibs_request[0].number_of_ibs = 1;
+	ibs_request[0].ibs = &ib_info[0];
+	ibs_request[0].resources = bo_list[0];
+	ibs_request[0].fence_info.handle = NULL;
+	r = amdgpu_cs_submit(context_handle[0], 0,&ibs_request[0], 1);
+	CU_ASSERT_EQUAL(r, 0);
+	r = amdgpu_cs_signal_semaphore(context_handle[0], AMDGPU_HW_IP_DMA, 0, 0, sem);
+	CU_ASSERT_EQUAL(r, 0);
+
+	r = amdgpu_cs_wait_semaphore(context_handle[0], AMDGPU_HW_IP_GFX, 0, 0, sem);
+	CU_ASSERT_EQUAL(r, 0);
+	ptr = ib_result_cpu[1];
+	ptr[0] = GFX_COMPUTE_NOP;
+	ib_info[1].ib_mc_address = ib_result_mc_address[1];
+	ib_info[1].size = 1;
+
+	ibs_request[1].ip_type = AMDGPU_HW_IP_GFX;
+	ibs_request[1].number_of_ibs = 1;
+	ibs_request[1].ibs = &ib_info[1];
+	ibs_request[1].resources = bo_list[1];
+	ibs_request[1].fence_info.handle = NULL;
+
+	r = amdgpu_cs_submit(context_handle[0], 0,&ibs_request[1], 1);
+	CU_ASSERT_EQUAL(r, 0);
+
+	fence_status.context = context_handle[0];
+	fence_status.ip_type = AMDGPU_HW_IP_GFX;
+	fence_status.ip_instance = 0;
+	fence_status.fence = ibs_request[1].seq_no;
+	r = amdgpu_cs_query_fence_status(&fence_status,
+					 500000000, 0, &expired);
+	CU_ASSERT_EQUAL(r, 0);
+	CU_ASSERT_EQUAL(expired, true);
+
+	/* 2. same engine different context */
+	ptr = ib_result_cpu[0];
+	ptr[0] = GFX_COMPUTE_NOP;
+	ib_info[0].ib_mc_address = ib_result_mc_address[0];
+	ib_info[0].size = 1;
+
+	ibs_request[0].ip_type = AMDGPU_HW_IP_GFX;
+	ibs_request[0].number_of_ibs = 1;
+	ibs_request[0].ibs = &ib_info[0];
+	ibs_request[0].resources = bo_list[0];
+	ibs_request[0].fence_info.handle = NULL;
+	r = amdgpu_cs_submit(context_handle[0], 0,&ibs_request[0], 1);
+	CU_ASSERT_EQUAL(r, 0);
+	r = amdgpu_cs_signal_semaphore(context_handle[0], AMDGPU_HW_IP_GFX, 0, 0, sem);
+	CU_ASSERT_EQUAL(r, 0);
+
+	r = amdgpu_cs_wait_semaphore(context_handle[1], AMDGPU_HW_IP_GFX, 0, 0, sem);
+	CU_ASSERT_EQUAL(r, 0);
+	ptr = ib_result_cpu[1];
+	ptr[0] = GFX_COMPUTE_NOP;
+	ib_info[1].ib_mc_address = ib_result_mc_address[1];
+	ib_info[1].size = 1;
+
+	ibs_request[1].ip_type = AMDGPU_HW_IP_GFX;
+	ibs_request[1].number_of_ibs = 1;
+	ibs_request[1].ibs = &ib_info[1];
+	ibs_request[1].resources = bo_list[1];
+	ibs_request[1].fence_info.handle = NULL;
+	r = amdgpu_cs_submit(context_handle[1], 0,&ibs_request[1], 1);
+
+	CU_ASSERT_EQUAL(r, 0);
+
+	fence_status.context = context_handle[1];
+	fence_status.ip_type = AMDGPU_HW_IP_GFX;
+	fence_status.ip_instance = 0;
+	fence_status.fence = ibs_request[1].seq_no;
+	r = amdgpu_cs_query_fence_status(&fence_status,
+					 500000000, 0, &expired);
+	CU_ASSERT_EQUAL(r, 0);
+	CU_ASSERT_EQUAL(expired, true);
+
+	/* 3. export/import sem test */
+	r = amdgpu_cs_create_sem(device_handle, &sem_handle);
+	CU_ASSERT_EQUAL(r, 0);
+
+	ptr = ib_result_cpu[0];
+	ptr[0] = SDMA_NOP;
+	ib_info[0].ib_mc_address = ib_result_mc_address[0];
+	ib_info[0].size = 1;
+
+	ibs_request[0].ip_type = AMDGPU_HW_IP_DMA;
+	ibs_request[0].number_of_ibs = 1;
+	ibs_request[0].ibs = &ib_info[0];
+	ibs_request[0].resources = bo_list[0];
+	ibs_request[0].fence_info.handle = NULL;
+	r = amdgpu_cs_submit(context_handle[0], 0,&ibs_request[0], 1);
+	CU_ASSERT_EQUAL(r, 0);
+	r = amdgpu_cs_signal_sem(device_handle, context_handle[0], AMDGPU_HW_IP_DMA, 0, 0, sem_handle);
+	CU_ASSERT_EQUAL(r, 0);
+
+	// export the semaphore and import in different context to wait.
+	r = amdgpu_cs_export_sem(device_handle, sem_handle, &fd);
+	CU_ASSERT_EQUAL(r, 0);
+
+	r = amdgpu_cs_import_sem(device_handle, fd, &sem_handle_import);
+	CU_ASSERT_EQUAL(r, 0);
+	close(fd);
+	r = amdgpu_cs_destroy_sem(device_handle, sem_handle);
+	CU_ASSERT_EQUAL(r, 0);
+
+	r = amdgpu_cs_wait_sem(device_handle, context_handle[1], AMDGPU_HW_IP_GFX, 0, 0, sem_handle_import);
+	CU_ASSERT_EQUAL(r, 0);
+	ptr = ib_result_cpu[1];
+	ptr[0] = GFX_COMPUTE_NOP;
+	ib_info[1].ib_mc_address = ib_result_mc_address[1];
+	ib_info[1].size = 1;
+
+	ibs_request[1].ip_type = AMDGPU_HW_IP_GFX;
+	ibs_request[1].number_of_ibs = 1;
+	ibs_request[1].ibs = &ib_info[1];
+	ibs_request[1].resources = bo_list[1];
+	ibs_request[1].fence_info.handle = NULL;
+
+	r = amdgpu_cs_submit(context_handle[1], 0,&ibs_request[1], 1);
+	CU_ASSERT_EQUAL(r, 0);
+
+	fence_status.context = context_handle[1];
+	fence_status.ip_type = AMDGPU_HW_IP_GFX;
+	fence_status.ip_instance = 0;
+	fence_status.fence = ibs_request[1].seq_no;
+	r = amdgpu_cs_query_fence_status(&fence_status,
+					 500000000, 0, &expired);
+	CU_ASSERT_EQUAL(r, 0);
+	CU_ASSERT_EQUAL(expired, true);
+
+	r = amdgpu_cs_destroy_sem(device_handle, sem_handle_import);
+	CU_ASSERT_EQUAL(r, 0);
+
+	for (i = 0; i < 2; i++) {
+		r = amdgpu_bo_unmap_and_free(ib_result_handle[i], va_handle[i],
+					     ib_result_mc_address[i], 4096);
+		CU_ASSERT_EQUAL(r, 0);
+
+		r = amdgpu_bo_list_destroy(bo_list[i]);
+		CU_ASSERT_EQUAL(r, 0);
+
+		r = amdgpu_cs_ctx_free(context_handle[i]);
+		CU_ASSERT_EQUAL(r, 0);
+	}
+
+	r = amdgpu_cs_destroy_semaphore(sem);
+	CU_ASSERT_EQUAL(r, 0);
+}
+
 static void amdgpu_command_submission_compute(void)
 {
 	amdgpu_context_handle context_handle;
-- 
1.9.1


[-- Attachment #5: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Shared semaphores for amdgpu
       [not found]                           ` <58C0FD86.8040808-5C7GfCeVMHo@public.gmane.org>
@ 2017-03-09  7:38                             ` Christian König
       [not found]                               ` <d5a5f1ba-4bcb-f374-f18e-b060cc40aa9e-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Christian König @ 2017-03-09  7:38 UTC (permalink / raw)
  To: zhoucm1, Dave Airlie
  Cc: Mao, David, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Andres Rodriguez, Andres Rodriguez, Dave Airlie, Cui, Flora,
	Pierre-Loup Griffais

> I do wonder if we need the separate sem signal/wait interface, I think
> we should just add
> semaphore chunks to the CS interface.
Yeah, that's what I've said as well from the very first beginning.

Another question is whether we should really create another implementation
for sharing semaphores between processes.

In other words, putting the fences currently inside the semaphore into a
sync_file with the signal_on_any bit set would have pretty much the same
effect, except that the resulting object would then have the sync_file
semantics for adding new fences and could be used in the atomic IOCTLs as
well.
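
A minimal sketch of that idea, just to illustrate it (assuming the 4.10-era
dma-fence/sync_file kernel API and the amdgpu_sem_core structure from the
patches in this thread; with more than one fence you would combine them
first via dma_fence_array_create() with signal_on_any set):

/* needs linux/file.h, linux/dma-fence.h and linux/sync_file.h */
static int amdgpu_sem_export_sync_file(struct amdgpu_sem_core *core, int *p_fd)
{
	struct sync_file *sync_file;
	struct dma_fence *fence;
	int fd;

	fd = get_unused_fd_flags(O_CLOEXEC);
	if (fd < 0)
		return fd;

	/* grab a reference to the fence currently stored in the semaphore */
	mutex_lock(&core->lock);
	fence = dma_fence_get(core->fence);
	mutex_unlock(&core->lock);
	if (!fence) {
		put_unused_fd(fd);
		return -EINVAL;
	}

	/* sync_file_create() takes its own reference on the fence */
	sync_file = sync_file_create(fence);
	dma_fence_put(fence);
	if (!sync_file) {
		put_unused_fd(fd);
		return -ENOMEM;
	}

	fd_install(fd, sync_file->file);
	*p_fd = fd;
	return 0;
}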

Regards,
Christian.

On 09.03.2017 at 08:00, zhoucm1 wrote:
> Hi Dave,
>
> We have already completed the implementation for both the kernel and
> libdrm; it is attached. Let's discuss on top of this.
>
>
> Thanks,
> David Zhou
>
> On 2017-03-09 12:24, Dave Airlie wrote:
>> I've attached two patches for RFC at the moment, I haven't finished
>> the userspace for these yet, but just wanted to get some
>> ideas/feedback.
>>
>> Dave.
>>
>> On 9 March 2017 at 13:52, Dave Airlie <airlied@gmail.com> wrote:
>>> On 28 February 2017 at 11:46, zhoucm1 <david1.zhou@amd.com> wrote:
>>>> Hi Dave,
>>>>
>>>> The attached is our semaphore implementation, amdgpu_cs.c is drm 
>>>> file, the
>>>> others are kernel file.
>>>> Any suggestion?
>>> Thanks,
>>>
>>> I've built a tree with all these in it, and started looking into the 
>>> interface.
>>>
>>> I do wonder if we need the separate sem signal/wait interface, I think
>>> we should just add
>>> semaphore chunks to the CS interface.
>>>
>>> I'm just playing around with this now.
>>>
>>> Dave.
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Shared semaphores for amdgpu
       [not found]                               ` <d5a5f1ba-4bcb-f374-f18e-b060cc40aa9e-5C7GfCeVMHo@public.gmane.org>
@ 2017-03-09  8:15                                 ` Dave Airlie
       [not found]                                   ` <CAPM=9twfZhb7vt_gEBE6LaUfthseX_PC_BZctCyu520MK32QCQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Airlie @ 2017-03-09  8:15 UTC (permalink / raw)
  To: Christian König
  Cc: zhoucm1, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Mao, David,
	Andres Rodriguez, Andres Rodriguez, Dave Airlie, Cui, Flora,
	Pierre-Loup Griffais

On 9 March 2017 at 17:38, Christian König <christian.koenig@amd.com> wrote:
>> I do wonder if we need the separate sem signal/wait interface, I think
>> we should just add
>> semaphore chunks to the CS interface.
>
> Yeah, that's what I've said as well from the very first beginning.
>
> Another question is whether we should really create another implementation
> for sharing semaphores between processes.
>
> In other words, putting the fences currently inside the semaphore into a
> sync_file with the signal_on_any bit set would have pretty much the same
> effect, except that the resulting object would then have the sync_file
> semantics for adding new fences and could be used in the atomic IOCTLs as
> well.

So the Vulkan external semaphore spec has two different types of semaphore
semantics; I'm not sure the sync_file semantics match the first type,
only the second.


I think we would still need separate objects to do the first type,
which I want for VR stuff..

I'll try and think about it a bit harder tomorrow.

Dave.
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Shared semaphores for amdgpu
       [not found]                                   ` <CAPM=9twfZhb7vt_gEBE6LaUfthseX_PC_BZctCyu520MK32QCQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-03-09  9:12                                     ` Christian König
       [not found]                                       ` <37118a87-28f2-c96d-18dc-a71292ea35d4-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Christian König @ 2017-03-09  9:12 UTC (permalink / raw)
  To: Dave Airlie
  Cc: zhoucm1, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Mao, David,
	Andres Rodriguez, Andres Rodriguez, Dave Airlie, Cui, Flora,
	Pierre-Loup Griffais

Am 09.03.2017 um 09:15 schrieb Dave Airlie:
> On 9 March 2017 at 17:38, Christian König <christian.koenig@amd.com> wrote:
>>> I do wonder if we need the separate sem signal/wait interface, I think
>>> we should just add
>>> semaphore chunks to the CS interface.
>> Yeah, that's what I've said as well from the very first beginning.
>>
>> Another question is if we should really create another implementation to
>> share semaphores between processes.
>>
>> In other words putting the current fences inside the semaphore into a
>> sync_file with the signal_on_any bit set would have pretty much the same
>> effect, except that the resulting object then had the sync_file semantics
>> for adding new fences and can be used in the atomic IOCTLs as well.
> So the vulkan external semaphore spec has two different type of semaphore
> semantics, I'm not sure the sync_file semantics match the first type,
> only the second.

I haven't completely read that part of the spec yet, but from what I 
know the first set of semantics is actually a bit scary, and I'm not 
sure we want to fully support it.

In particular, the fact that you can wait on a semaphore object which is 
not signaled yet can easily lead to deadlocks and to resources being 
tied up in the kernel and the windowing system.

Imagine that you send a command submission to the kernel with the 
request to wait for a semaphore object and then never signal that 
semaphore object. At least for amdgpu the kernel driver would accept 
that CS and push it into the scheduler. That operation needs memory, so 
by doing this the application would tie up kernel memory without any 
prospect of it being released anytime soon.

We could of course try to limit the number of waiting CSes in the 
kernel, but then we have the deadlock problem again, e.g. the signaling 
CS wouldn't be accepted by the kernel because there are already too 
many waiters.

In addition to that, you can easily build deadlocks of the form: CS A 
depends on CS B and CS B depends on CS A. The exact same problem was 
discussed on the list for Android fences as well, but the semantics 
there are specifically designed so that you can't build deadlocks with 
them.

> I think we would still need separate objects to do the first type,
> which I want for VR stuff..

Which is perfectly reasonable; sharing the object between processes 
takes time, so you only want to do this once.

As a possible solution what do you think about adding some new 
functionality to the sync file IOCTLs?

IIRC we currently only support adding new fences to the sync file and 
then waiting for all of them in the CS/atomic page flip.

But what if we also allow replacing the fence(s) in the sync file? And, 
in addition to that, consuming the fence in the CS/atomic page flip 
IOCTL?

That's trivial to implement and should give us pretty much the same 
semantics as the shared semaphore object in Vulkan.
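
For the consuming side, a rough sketch of a helper that doesn't exist
yet; the CS/atomic path would take the fence out of the sync_file and
wait on it, leaving the container empty until the next signal replaces
it:

/*
 * Sketch only, not an existing sync_file API.  Atomically take the
 * current fence out of the sync_file; the caller inherits the
 * reference and does the actual wait.
 */
static struct dma_fence *sync_file_consume_fence(struct sync_file *sync_file)
{
        return xchg(&sync_file->fence, NULL);
}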

Christian.

>
> I'll try and think about it a bit harder tomorrow.
>
> Dave.


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Shared semaphores for amdgpu
       [not found]                                       ` <37118a87-28f2-c96d-18dc-a71292ea35d4-5C7GfCeVMHo@public.gmane.org>
@ 2017-03-09  9:43                                         ` zhoucm1
       [not found]                                           ` <58C123A6.70209-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: zhoucm1 @ 2017-03-09  9:43 UTC (permalink / raw)
  To: Christian König, Dave Airlie
  Cc: Mao, David, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Andres Rodriguez, Andres Rodriguez, Dave Airlie, Cui, Flora,
	Pierre-Loup Griffais



On 2017年03月09日 17:12, Christian König wrote:
> Am 09.03.2017 um 09:15 schrieb Dave Airlie:
>> On 9 March 2017 at 17:38, Christian König <christian.koenig@amd.com> 
>> wrote:
>>>> I do wonder if we need the separate sem signal/wait interface, I think
>>>> we should just add
>>>> semaphore chunks to the CS interface.
>>> Yeah, that's what I've said as well from the very first beginning.
>>>
>>> Another question is if we should really create another 
>>> implementation to
>>> share semaphores between processes.
>>>
>>> In other words putting the current fences inside the semaphore into a
>>> sync_file with the signal_on_any bit set would have pretty much the 
>>> same
>>> effect, except that the resulting object then had the sync_file 
>>> semantics
>>> for adding new fences and can be used in the atomic IOCTLs as well.
>> So the vulkan external semaphore spec has two different type of 
>> semaphore
>> semantics, I'm not sure the sync_file semantics match the first type,
>> only the second.
>
> I haven't completely read that part of the spec yet, but from what I 
> know the first semantics is actually a bit scary and I'm not sure if 
> we want to fully support that.
>
> Especially that you can wait on a semaphore object which is not 
> signaled yet can easily lead to deadlocks and bound resources in the 
> kernel and windowing system.
>
> Imagine that you send a command submission to the kernel with the 
> request to wait for a semaphore object and then never signal that 
> semaphore object. At least for amdgpu the kernel driver would accept 
> that CS and push it into the scheduler. This operation needs memory, 
> so by doing this the application would bind kernel memory without the 
> prospect of releasing it anytime soon.
>
> We could of course try to limit the amounts of waiting CS in the 
> kernel, but then we have the problem of deadlocks again. E.g. the 
> signaling CS wouldn't be accepted by the kernel because we have so 
> many waiters.
>
> Additional to that you can easily build deadlocks in the form CS A 
> depends on CS B and CS B depends on CS A. The exact same problem for 
> Android fences where discussed on the list as well, but the semantic 
> there is especially designed so that you can't build deadlocks with it.
Forbidding waits on an un-signaled sem will be enough to address this concern.
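
(For illustration, in the wait path that could be as simple as the
sketch below; the structure and field names are only illustrative:)

/*
 * Sketch only: reject a CS wait on a semaphore that has neither
 * signaled nor has a pending signal operation attached, so an
 * unsatisfiable wait never gets queued in the kernel.
 */
static int amdgpu_sem_check_waitable(struct amdgpu_sem *sem)
{
        int r = 0;

        spin_lock(&sem->lock);
        if (!sem->fence)        /* no signal attached yet */
                r = -EINVAL;
        spin_unlock(&sem->lock);

        return r;
}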

>
>> I think we would still need separate objects to do the first type,
Agreed, the implementation indeed does this.
>> which I want for VR stuff..
>
> Which is perfectly reasonable, sharing the object between processes 
> takes time. So you only want to do this once.
>
> As a possible solution what do you think about adding some new 
> functionality to the sync file IOCTLs?
>
> IIRC we currently only support adding new fences to the sync file and 
> then waiting for all of the in the CS/Atomic page flip.
>
> But what if we also allow replacing the fence(s) in the sync file? And 
> then additional to that consuming the fence in the CS/Atomic page flip 
> IOCTL?
I feel the new sem implementation I attached is at least a good start; 
maybe we could discuss from there rather than talking on the fly, so 
that we can improve on any problems.

Regards,
David Zhou
>
> That's trivial to implement and should give us pretty much the same 
> semantics as the shared semaphore object in Vulkan.
>
> Christian.
>
>>
>> I'll try and think about it a bit harder tomorrow.
>>
>> Dave.
>
>

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Shared semaphores for amdgpu
       [not found]                                           ` <58C123A6.70209-5C7GfCeVMHo@public.gmane.org>
@ 2017-03-09 10:31                                             ` Christian König
       [not found]                                               ` <a6a3ea27-aae2-dc69-1ec2-f463d8417712-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Christian König @ 2017-03-09 10:31 UTC (permalink / raw)
  To: zhoucm1, Christian König, Dave Airlie
  Cc: amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Mao, David,
	Andres Rodriguez, Andres Rodriguez, Dave Airlie, Cui, Flora,
	Pierre-Loup Griffais

Am 09.03.2017 um 10:43 schrieb zhoucm1:
>
>
> On 2017年03月09日 17:12, Christian König wrote:
>> Am 09.03.2017 um 09:15 schrieb Dave Airlie:
>>> On 9 March 2017 at 17:38, Christian König <christian.koenig@amd.com> 
>>> wrote:
>>>>> I do wonder if we need the separate sem signal/wait interface, I 
>>>>> think
>>>>> we should just add
>>>>> semaphore chunks to the CS interface.
>>>> Yeah, that's what I've said as well from the very first beginning.
>>>>
>>>> Another question is if we should really create another 
>>>> implementation to
>>>> share semaphores between processes.
>>>>
>>>> In other words putting the current fences inside the semaphore into a
>>>> sync_file with the signal_on_any bit set would have pretty much the 
>>>> same
>>>> effect, except that the resulting object then had the sync_file 
>>>> semantics
>>>> for adding new fences and can be used in the atomic IOCTLs as well.
>>> So the vulkan external semaphore spec has two different type of 
>>> semaphore
>>> semantics, I'm not sure the sync_file semantics match the first type,
>>> only the second.
>>
>> I haven't completely read that part of the spec yet, but from what I 
>> know the first semantics is actually a bit scary and I'm not sure if 
>> we want to fully support that.
>>
>> Especially that you can wait on a semaphore object which is not 
>> signaled yet can easily lead to deadlocks and bound resources in the 
>> kernel and windowing system.
>>
>> Imagine that you send a command submission to the kernel with the 
>> request to wait for a semaphore object and then never signal that 
>> semaphore object. At least for amdgpu the kernel driver would accept 
>> that CS and push it into the scheduler. This operation needs memory, 
>> so by doing this the application would bind kernel memory without the 
>> prospect of releasing it anytime soon.
>>
>> We could of course try to limit the amounts of waiting CS in the 
>> kernel, but then we have the problem of deadlocks again. E.g. the 
>> signaling CS wouldn't be accepted by the kernel because we have so 
>> many waiters.
>>
>> Additional to that you can easily build deadlocks in the form CS A 
>> depends on CS B and CS B depends on CS A. The exact same problem for 
>> Android fences where discussed on the list as well, but the semantic 
>> there is especially designed so that you can't build deadlocks with it.
> Forbidding waits on an un-signaled sem will be enough to address this concern.

Completely agree; the problem here is that this isn't documented that 
way in the Vulkan specification, as far as I know.

>
>>
>>> I think we would still need separate objects to do the first type,
> Agreed, the implementation indeed do this.
>>> which I want for VR stuff..
>>
>> Which is perfectly reasonable, sharing the object between processes 
>> takes time. So you only want to do this once.
>>
>> As a possible solution what do you think about adding some new 
>> functionality to the sync file IOCTLs?
>>
>> IIRC we currently only support adding new fences to the sync file and 
>> then waiting for all of the in the CS/Atomic page flip.
>>
>> But what if we also allow replacing the fence(s) in the sync file? 
>> And then additional to that consuming the fence in the CS/Atomic page 
>> flip IOCTL?
> I feel the new sem implementation I attached is at least a good start; 
> maybe we could discuss from there rather than talking on the fly, so 
> that we can improve on any problems.

We first need to find consensus on whether we can implement this by 
improving the existing synchronization primitives instead of inventing 
new ones.

I'm clearly not favoring any approach just because it is already 
implemented.

See, sync files are already upstream and well supported in, for example, 
the atomic page flipping IOCTLs, and adding the extra functionality 
needed to support the semantics of Vulkan semaphores looks rather 
trivial to me.

Regards,
Christian.

>
> Regards,
> David Zhou
>>
>> That's trivial to implement and should give us pretty much the same 
>> semantics as the shared semaphore object in Vulkan.
>>
>> Christian.
>>
>>>
>>> I'll try and think about it a bit harder tomorrow.
>>>
>>> Dave.
>>
>>
>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Shared semaphores for amdgpu
       [not found]                                               ` <a6a3ea27-aae2-dc69-1ec2-f463d8417712-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
@ 2017-03-09 23:19                                                 ` Dave Airlie
       [not found]                                                   ` <CAPM=9tyn1gvTW5W3JbbxmzkN3PTwJjmOKCHiiU52ZOqGDJf_6A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Airlie @ 2017-03-09 23:19 UTC (permalink / raw)
  To: Christian König, dri-devel
  Cc: zhoucm1, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Mao, David,
	Andres Rodriguez, Andres Rodriguez, Dave Airlie, Cui, Flora,
	Christian König, Pierre-Loup Griffais

> Completely agree, problem here is that this isn't documented like this in
> the Vulkan specification as far as I know.

(I'm adding dri-devel, since I think Intel folks have looked into some
of this already, and we might need to build some common functionality.)

"The semaphore must be signaled, or have an associated semaphore
signal operation that is
pending execution."

So I'll try and summarise the semantics of semaphores vs current fence fds.

For shared semaphores there are two defined semantics: temporary and
permanent, and I think we would need to support both. Temporary aligns
with fence fds, but permanent not so much.

The main difference I see is that fences are a one-shot thing: you
create a fence when you submit, and then you give it to someone else to
wait on.

Semaphores are a create-once, share-once, use-multiple-times object.

The semantics for permanent semaphore sharing are:

process  A            B
        allocate
        export
                      import
        signal
                      wait
        signal
                      wait

and so on.

The way we currently do semaphores is to insert a fence into the
semaphore on signal, block waiting for that fence on wait, and then
insert a new one on the next signal. This means we don't want to
constantly reshare the fence_fd. (The temporary semaphore sharing
semantics match this behaviour.)

This leads me to believe that fence fds can't be used for this task
as-is. The question now is whether we can extend them, and how we do
that in a useful and backwards-compatible manner.

How would we do this? Allow a dma_fence to be "updated" from another
dma_fence, so that we have some sort of dma_fence variant with a
permanent lifetime, which we can update from another fence on signal to
match its behaviour, and whose waits then operate on the updated info?
Do we just want a wrapper around a fence then, which is pretty much
what the proposed sem code is, or do we want some way to link a bunch
of fences together? What we don't want is to expose to userspace
anything that requires us to reshare the fence via the fd again after
the initial setup.

Dave.
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Shared semaphores for amdgpu
       [not found]                                                   ` <CAPM=9tyn1gvTW5W3JbbxmzkN3PTwJjmOKCHiiU52ZOqGDJf_6A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-03-10  0:46                                                     ` Christian König
       [not found]                                                       ` <85322d42-e585-659d-6f98-fc5baf0d6b14-5C7GfCeVMHo@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Christian König @ 2017-03-10  0:46 UTC (permalink / raw)
  To: Dave Airlie, Christian König, dri-devel
  Cc: zhoucm1, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Mao, David,
	Andres Rodriguez, Andres Rodriguez, Dave Airlie, Cui, Flora,
	Pierre-Loup Griffais

Am 10.03.2017 um 00:19 schrieb Dave Airlie:
>> Completely agree, problem here is that this isn't documented like this in
>> the Vulkan specification as far as I know.
> (I'm adding dri-devel, since I think Intel folks have looked into some
> of this already,
> and we might need to make some common functionality).
>
> "The semaphore must be signaled, or have an associated semaphore
> signal operation that is
> pending execution."
>
> So I'll try and summarise the semantics of semaphores vs current fence fds.
>
> For shared semaphores there are two defined semantics: temporary and permanent,
> I think we would need to support both. temporary align with fence fd,
> but permanent not so much.
>
> The main difference I see if that fence's are a one shot thing, you
> create a fence when you submit
> it and then you give it to someone else to wait on it.
>
> Semaphores are a create once, share once, use multiple times.
>
> The semantics for permanaent semaphore sharing is:
>
> process  A            B
>          allocate
>          export
>                        import
>          signal
>                        wait
>          signal
>                        wait
>
> and so on.
>
> The way we currently to semaphores is to insert a fence into the
> semaphore on signal, and
> block waiting for that fence on wait, then insert a new one on the
> next signal. This means
> we don't want to constantly reshared the fence_fd. (The temporary
> semaphores sharing semantics
> match this behaviour).
>
> This leaves me to believe that fence fd's can't be used for this task
> as-is. Now the question is if we can
> extend them, and how we do that in a useful and backwards compatible manner.

As far as I can see the only functionality we are missing here is:

void sync_file_signal(struct sync_file *sync_file, struct dma_fence *fence)
{
     dma_fence_put(sync_file->fence);
     sync_file->fence = fence;
}

We probably should do this atomically, but that is only a matter of 
taking a lock or using an atomic pointer operation.
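
E.g. something like the following sketch (hypothetical helper name; it
also takes a reference on the new fence):

/*
 * Sketch: atomic variant of the above that also grabs a reference on
 * the new fence, so the sync_file always owns what it points at.
 */
static void sync_file_replace_fence(struct sync_file *sync_file,
                                    struct dma_fence *fence)
{
        struct dma_fence *old;

        old = xchg(&sync_file->fence, dma_fence_get(fence));
        dma_fence_put(old);
}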

The waiting is done using the normal sync_file_get_fence() function.

The rest is David's patch to import/export the fd to/from a local 
idr-based handle.

>
> How would we do this, allow dma_fence to be "updated" from another
> dma_fence, so we have some sort
> of dma_fence variant that has a permanent lifetime, that we can on
> signal update from another fence
> to match it's behaviour, then on wait works on the updated info. Do we
> just want a wrapper around a fence
> then, which is pretty much what the proposed sem code is. or do we
> want some way to link a bunch
> of fences together? What we don't want is to expose to userspace
> anything that requires us to reshare the
> fence via the fd again after the initial setup.

We don't need to reshare the fd or change anything in the dma_fence 
implementation; just using the sync_file as the base container should be 
enough.

Christian.

>
> Dave.


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Shared semaphores for amdgpu
       [not found]                                                       ` <85322d42-e585-659d-6f98-fc5baf0d6b14-5C7GfCeVMHo@public.gmane.org>
@ 2017-03-10  3:25                                                         ` Dave Airlie
       [not found]                                                           ` <CAPM=9tz+3DB8zZTH1kRUt8KE-wU7RNeoOE6zQWn_dWDvVRSBMA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Airlie @ 2017-03-10  3:25 UTC (permalink / raw)
  To: Christian König
  Cc: zhoucm1, Mao, David, dri-devel, Andres Rodriguez,
	Christian König, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW,
	Dave Airlie, Cui, Flora, Pierre-Loup Griffais, Andres Rodriguez

>
> As far as I can see the only functionality we are missing here is:
>
> void sync_file_signal(struct sync_file *sync_file, struct dma_fence *fence)
> {
>     dma_fence_put(sync_file->fence);
>     sync_file->fence = fence;
> }
>
> We probably should do this atomically, but that is only a matter of taking
> locks/atomic pointer operation.
>
> The waiting is done using the normal sync_file_get_fence() function.
>
> The rest is David's patch to import/export the fd handle into a local idr
> based handle.

Are you suggesting we start keeping track of sync_file objects in a local idr?

Currently they are only tracked as files, which is probably not what we want
for every unshared semaphore. Or are you thinking more that the amdgpu local
sem should just store a sync_file pointer, rather than what it does now?

Dave.
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Shared semaphores for amdgpu
       [not found]                                                           ` <CAPM=9tz+3DB8zZTH1kRUt8KE-wU7RNeoOE6zQWn_dWDvVRSBMA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-03-10  4:27                                                             ` Dave Airlie
       [not found]                                                               ` <CAPM=9tyDjtgzYEo7FET6jY9zZrWEd1LoLGWLNRkbbRnWTsnzkw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Airlie @ 2017-03-10  4:27 UTC (permalink / raw)
  To: Christian König
  Cc: zhoucm1, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Mao, David,
	Andres Rodriguez, Christian König, Andres Rodriguez,
	Dave Airlie, Cui, Flora, Pierre-Loup Griffais

[-- Attachment #1: Type: text/plain, Size: 1066 bytes --]

On 10 March 2017 at 13:25, Dave Airlie <airlied-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>
>> As far as I can see the only functionality we are missing here is:
>>
>> void sync_file_signal(struct sync_file *sync_file, struct dma_fence *fence)
>> {
>>     dma_fence_put(sync_file->fence);
>>     sync_file->fence = fence;
>> }
>>
>> We probably should do this atomically, but that is only a matter of taking
>> locks/atomic pointer operation.
>>
>> The waiting is done using the normal sync_file_get_fence() function.
>>
>> The rest is David's patch to import/export the fd handle into a local idr
>> based handle.
>
> Are you suggesting we start keeping track of sync_file objects in a local idr?
>
> As currently they are only tracked as files, which is probably not what we want
> for every unshared semaphore, or are you thinking more that the amdgpu local
> sem should be just storing a sync_file pointer, rather than what it does now.

Okay, here's a first pass at what I think you mean. It's missing
things, but the idea should be what you said.
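
For reference, raw userspace usage of the proposed interface would look
roughly like the sketch below (based on the uAPI in the attached patch,
untested; no libdrm wrapper exists yet and error handling is omitted):

#include <stdint.h>
#include <sys/ioctl.h>
#include "amdgpu_drm.h"  /* with the proposed DRM_IOCTL_AMDGPU_SEM additions */

/* Process A: create a semaphore and export it as an fd for sharing. */
static int sem_create_and_export(int drm_fd, uint32_t *handle, int *shared_fd)
{
        union drm_amdgpu_sem args = { .in.op = AMDGPU_SEM_OP_CREATE_SEM };

        if (ioctl(drm_fd, DRM_IOCTL_AMDGPU_SEM, &args) < 0)
                return -1;
        *handle = args.out.handle;

        args.in.op = AMDGPU_SEM_OP_EXPORT_SEM;
        args.in.handle = *handle;
        if (ioctl(drm_fd, DRM_IOCTL_AMDGPU_SEM, &args) < 0)
                return -1;
        *shared_fd = args.out.fd;
        return 0;
}

/* Process B: import the fd it received (e.g. over a unix socket). */
static int sem_import(int drm_fd, int received_fd, uint32_t *handle)
{
        union drm_amdgpu_sem args = {
                .in.op = AMDGPU_SEM_OP_IMPORT_SEM,
                .in.handle = (uint32_t)received_fd,
        };

        if (ioctl(drm_fd, DRM_IOCTL_AMDGPU_SEM, &args) < 0)
                return -1;
        *handle = args.out.handle;
        return 0;
}

/*
 * Per submission, the handle then goes into the CS ioctl as an
 * AMDGPU_CHUNK_ID_SEM_WAIT or AMDGPU_CHUNK_ID_SEM_SIGNAL chunk holding
 * a struct drm_amdgpu_cs_chunk_sem { __u32 handle; }.
 */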

Dave.

[-- Attachment #2: 0001-RFC-drm-amdgpu-add-shared-semaphores-support.-sync_f.patch --]
[-- Type: text/x-patch, Size: 16954 bytes --]

From e1de0c7135a1355bf4bc704ee58b6fc930ebaccc Mon Sep 17 00:00:00 2001
From: Dave Airlie <airlied@redhat.com>
Date: Wed, 8 Mar 2017 03:42:45 +0000
Subject: [PATCH] [RFC] drm/amdgpu: add shared semaphores support. (sync_file)

sync_file based

This is based on code provided by AMD (Chunming Zhou).

I've changed the code so the semaphore waits/signals are
passed in cs chunks rather than via separate ioctls.

This code isn't finished; I just wanted to ship it out early
in case anyone spots a problem. I'll finish off the libdrm
and radv bits and send more later.
---
 drivers/dma-buf/sync_file.c             |   3 +-
 drivers/gpu/drm/amd/amdgpu/Makefile     |   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h     |  13 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  70 ++++++++++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c |  10 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_sem.c | 213 ++++++++++++++++++++++++++++++++
 include/linux/sync_file.h               |   2 +-
 include/uapi/drm/amdgpu_drm.h           |  28 +++++
 9 files changed, 337 insertions(+), 5 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_sem.c

diff --git a/drivers/dma-buf/sync_file.c b/drivers/dma-buf/sync_file.c
index 2321035..e3e43f6 100644
--- a/drivers/dma-buf/sync_file.c
+++ b/drivers/dma-buf/sync_file.c
@@ -91,7 +91,7 @@ struct sync_file *sync_file_create(struct dma_fence *fence)
 }
 EXPORT_SYMBOL(sync_file_create);
 
-static struct sync_file *sync_file_fdget(int fd)
+struct sync_file *sync_file_fdget(int fd)
 {
 	struct file *file = fget(fd);
 
@@ -107,6 +107,7 @@ static struct sync_file *sync_file_fdget(int fd)
 	fput(file);
 	return NULL;
 }
+EXPORT_SYMBOL(sync_file_fdget);
 
 /**
  * sync_file_get_fence - get the fence related to the sync_file fd
diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile b/drivers/gpu/drm/amd/amdgpu/Makefile
index 2814aad..404bcba 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -24,7 +24,7 @@ amdgpu-y += amdgpu_device.o amdgpu_kms.o \
 	atombios_encoders.o amdgpu_sa.o atombios_i2c.o \
 	amdgpu_prime.o amdgpu_vm.o amdgpu_ib.o amdgpu_pll.o \
 	amdgpu_ucode.o amdgpu_bo_list.o amdgpu_ctx.o amdgpu_sync.o \
-	amdgpu_gtt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o
+	amdgpu_gtt_mgr.o amdgpu_vram_mgr.o amdgpu_virt.o amdgpu_sem.o
 
 # add asic specific block
 amdgpu-$(CONFIG_DRM_AMDGPU_CIK)+= cik.o cik_ih.o kv_smc.o kv_dpm.o \
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index c1b9135..84bbc57 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -53,6 +53,7 @@
 #include "amdgpu_ucode.h"
 #include "amdgpu_ttm.h"
 #include "amdgpu_gds.h"
+#include "amdgpu_sem.h"
 #include "amdgpu_sync.h"
 #include "amdgpu_ring.h"
 #include "amdgpu_vm.h"
@@ -702,6 +703,8 @@ struct amdgpu_fpriv {
 	struct mutex		bo_list_lock;
 	struct idr		bo_list_handles;
 	struct amdgpu_ctx_mgr	ctx_mgr;
+	spinlock_t		sem_handles_lock;
+	struct idr		sem_handles;
 };
 
 /*
@@ -1814,5 +1817,15 @@ amdgpu_cs_find_mapping(struct amdgpu_cs_parser *parser,
 		       uint64_t addr, struct amdgpu_bo **bo);
 int amdgpu_cs_sysvm_access_required(struct amdgpu_cs_parser *parser);
 
+int amdgpu_sem_ioctl(struct drm_device *dev, void *data,
+		     struct drm_file *filp);
+void amdgpu_sem_destroy(struct amdgpu_fpriv *fpriv, u32 handle);
+int amdgpu_sem_lookup_and_signal(struct amdgpu_fpriv *fpriv,
+				 uint32_t handle,
+				 struct dma_fence *fence);
+int amdgpu_sem_lookup_and_sync(struct amdgpu_device *adev,
+			       struct amdgpu_fpriv *fpriv,
+			       struct amdgpu_sync *sync,
+			       uint32_t handle);
 #include "amdgpu_object.h"
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 4671432..80fc94b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -217,6 +217,8 @@ int amdgpu_cs_parser_init(struct amdgpu_cs_parser *p, void *data)
 			break;
 
 		case AMDGPU_CHUNK_ID_DEPENDENCIES:
+		case AMDGPU_CHUNK_ID_SEM_WAIT:
+		case AMDGPU_CHUNK_ID_SEM_SIGNAL:
 			break;
 
 		default:
@@ -1009,6 +1011,28 @@ static int amdgpu_process_fence_dep(struct amdgpu_device *adev,
 	return 0;
 }
 
+static int amdgpu_process_sem_wait_dep(struct amdgpu_device *adev,
+				       struct amdgpu_cs_parser *p,
+				       struct amdgpu_cs_chunk *chunk)
+{
+	struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
+	unsigned num_deps;
+	int i, r;
+	struct drm_amdgpu_cs_chunk_sem *deps;
+
+	deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
+	num_deps = chunk->length_dw * 4 /
+		sizeof(struct drm_amdgpu_cs_chunk_sem);
+
+	for (i = 0; i < num_deps; ++i) {
+		r = amdgpu_sem_lookup_and_sync(adev, fpriv, &p->job->sync,
+					       deps[i].handle);
+		if (r)
+			return r;
+	}
+	return 0;
+}
+
 static int amdgpu_cs_dependencies(struct amdgpu_device *adev,
 				  struct amdgpu_cs_parser *p)
 {
@@ -1023,12 +1047,56 @@ static int amdgpu_cs_dependencies(struct amdgpu_device *adev,
 			r = amdgpu_process_fence_dep(adev, p, chunk);
 			if (r)
 				return r;
+		} else if (chunk->chunk_id == AMDGPU_CHUNK_ID_SEM_WAIT) {
+			r = amdgpu_process_sem_wait_dep(adev, p, chunk);
+			if (r)
+				return r;
 		}
 	}
 
 	return 0;
 }
 
+static int amdgpu_process_sem_signal_dep(struct amdgpu_cs_parser *p,
+					 struct amdgpu_cs_chunk *chunk,
+					 struct dma_fence *fence)
+{
+	struct amdgpu_fpriv *fpriv = p->filp->driver_priv;
+	unsigned num_deps;
+	int i, r;
+	struct drm_amdgpu_cs_chunk_sem *deps;
+
+	deps = (struct drm_amdgpu_cs_chunk_sem *)chunk->kdata;
+	num_deps = chunk->length_dw * 4 /
+		sizeof(struct drm_amdgpu_cs_chunk_sem);
+
+	for (i = 0; i < num_deps; ++i) {
+		r = amdgpu_sem_lookup_and_signal(fpriv, deps[i].handle,
+						 fence);
+		if (r)
+			return r;
+	}
+	return 0;
+}
+
+static int amdgpu_cs_post_dependencies(struct amdgpu_cs_parser *p)
+{
+	int i, r;
+
+	for (i = 0; i < p->nchunks; ++i) {
+		struct amdgpu_cs_chunk *chunk;
+
+		chunk = &p->chunks[i];
+
+		if (chunk->chunk_id == AMDGPU_CHUNK_ID_SEM_SIGNAL) {
+			r = amdgpu_process_sem_signal_dep(p, chunk, p->fence);
+			if (r)
+				return r;
+		}
+	}
+	return 0;
+}
+
 static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 			    union drm_amdgpu_cs *cs)
 {
@@ -1056,7 +1124,7 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
 	trace_amdgpu_cs_ioctl(job);
 	amd_sched_entity_push_job(&job->base);
 
-	return 0;
+	return amdgpu_cs_post_dependencies(p);
 }
 
 int amdgpu_cs_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
index cf05006..67a4157 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c
@@ -86,6 +86,7 @@ static void amdgpu_ctx_fini(struct amdgpu_ctx *ctx)
 	for (i = 0; i < adev->num_rings; i++)
 		amd_sched_entity_fini(&adev->rings[i]->sched,
 				      &ctx->rings[i].entity);
+
 }
 
 static int amdgpu_ctx_alloc(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 61d94c7..70a398a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -664,6 +664,8 @@ int amdgpu_driver_open_kms(struct drm_device *dev, struct drm_file *file_priv)
 	mutex_init(&fpriv->bo_list_lock);
 	idr_init(&fpriv->bo_list_handles);
 
+	spin_lock_init(&fpriv->sem_handles_lock);
+	idr_init(&fpriv->sem_handles);
 	amdgpu_ctx_mgr_init(&fpriv->ctx_mgr);
 
 	file_priv->driver_priv = fpriv;
@@ -689,6 +691,7 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 	struct amdgpu_device *adev = dev->dev_private;
 	struct amdgpu_fpriv *fpriv = file_priv->driver_priv;
 	struct amdgpu_bo_list *list;
+	struct amdgpu_sem *sem;
 	int handle;
 
 	if (!fpriv)
@@ -711,10 +714,14 @@ void amdgpu_driver_postclose_kms(struct drm_device *dev,
 
 	idr_for_each_entry(&fpriv->bo_list_handles, list, handle)
 		amdgpu_bo_list_free(list);
-
+	
 	idr_destroy(&fpriv->bo_list_handles);
 	mutex_destroy(&fpriv->bo_list_lock);
 
+	idr_for_each_entry(&fpriv->sem_handles, sem, handle)
+		amdgpu_sem_destroy(fpriv, handle);
+	idr_destroy(&fpriv->sem_handles);
+
 	kfree(fpriv);
 	file_priv->driver_priv = NULL;
 
@@ -896,6 +903,7 @@ const struct drm_ioctl_desc amdgpu_ioctls_kms[] = {
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_VA, amdgpu_gem_va_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_OP, amdgpu_gem_op_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(AMDGPU_GEM_USERPTR, amdgpu_gem_userptr_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(AMDGPU_SEM, amdgpu_sem_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
 };
 const int amdgpu_max_kms_ioctl = ARRAY_SIZE(amdgpu_ioctls_kms);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sem.c
new file mode 100644
index 0000000..211cccf
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sem.c
@@ -0,0 +1,213 @@
+/*
+ * Copyright 2016 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Chunming Zhou <david1.zhou@amd.com>
+ */
+#include <linux/file.h>
+#include <linux/fs.h>
+#include <linux/kernel.h>
+#include <linux/poll.h>
+#include <linux/seq_file.h>
+#include <linux/export.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
+#include <linux/anon_inodes.h>
+#include <linux/sync_file.h>
+#include "amdgpu.h"
+#include <drm/drmP.h>
+
+static inline struct sync_file *amdgpu_sync_file_lookup(struct amdgpu_fpriv *fpriv, u32 handle)
+{
+	struct sync_file *sync_file;
+
+	spin_lock(&fpriv->sem_handles_lock);
+
+	/* Check if we currently have a reference on the object */
+	sync_file = idr_find(&fpriv->sem_handles, handle);
+
+	spin_unlock(&fpriv->sem_handles_lock);
+
+	return sync_file;
+}
+
+static int amdgpu_sem_create(struct amdgpu_fpriv *fpriv, u32 *handle)
+{
+	struct sync_file *sync_file = sync_file_create(NULL);
+	int ret;
+
+	if (!sync_file)
+		return -ENOMEM;
+	
+	idr_preload(GFP_KERNEL);
+	spin_lock(&fpriv->sem_handles_lock);
+
+	ret = idr_alloc(&fpriv->sem_handles, sync_file, 1, 0, GFP_NOWAIT);
+
+	spin_unlock(&fpriv->sem_handles_lock);
+	idr_preload_end();
+
+	if (ret < 0)
+		return ret;
+
+	*handle = ret;
+	return 0;
+}
+
+static int amdgpu_sem_signal(struct amdgpu_fpriv *fpriv,
+			     u32 handle, struct dma_fence *fence)
+{
+	struct sync_file *sync_file;
+
+	sync_file = amdgpu_sync_file_lookup(fpriv, handle);
+	if (!sync_file)
+		return -EINVAL;
+
+	//TODO locking
+	dma_fence_put(sync_file->fence);
+	sync_file->fence = dma_fence_get(fence);
+
+	return 0;
+}
+
+static int amdgpu_sem_import(struct amdgpu_fpriv *fpriv,
+				       int fd, u32 *handle)
+{
+	struct sync_file *sync_file = sync_file_fdget(fd);
+	int ret;
+
+	if (!sync_file)
+		return -EINVAL;
+
+	idr_preload(GFP_KERNEL);
+	spin_lock(&fpriv->sem_handles_lock);
+
+	ret = idr_alloc(&fpriv->sem_handles, sync_file, 1, 0, GFP_NOWAIT);
+
+	spin_unlock(&fpriv->sem_handles_lock);
+	idr_preload_end();
+
+	if (ret < 0)
+		goto err_out;
+
+	*handle = ret;
+	return 0;
+err_out:
+	return ret;
+
+}
+
+static int amdgpu_sem_export(struct amdgpu_fpriv *fpriv,
+			     u32 handle, int *fd)
+{
+	struct sync_file *sync_file;
+	int ret;
+
+	sync_file = amdgpu_sync_file_lookup(fpriv, handle);
+	if (!sync_file)
+		return -EINVAL;
+
+	ret = get_unused_fd_flags(O_CLOEXEC);
+	if (ret < 0)
+		goto err_put_file;
+
+	fd_install(ret, sync_file->file);
+
+	*fd = ret;
+	return 0;
+err_put_file:
+	return ret;
+}
+
+void amdgpu_sem_destroy(struct amdgpu_fpriv *fpriv, u32 handle)
+{
+	struct sync_file *sync_file = amdgpu_sync_file_lookup(fpriv, handle);
+	if (!sync_file)
+		return;
+
+	spin_lock(&fpriv->sem_handles_lock);
+	idr_remove(&fpriv->sem_handles, handle);
+	spin_unlock(&fpriv->sem_handles_lock);
+
+	// free sync file
+//	kref_put(&sem->kref, amdgpu_sem_free);
+}
+
+int amdgpu_sem_lookup_and_sync(struct amdgpu_device *adev,
+			       struct amdgpu_fpriv *fpriv,
+			       struct amdgpu_sync *sync,
+			       uint32_t handle)
+{
+	int r;
+	struct sync_file *sync_file;
+
+	sync_file = amdgpu_sync_file_lookup(fpriv, handle);
+	if (!sync_file)
+		return -EINVAL;
+
+	r = amdgpu_sync_fence(adev, sync, sync_file->fence);
+	if (r)
+		goto err;
+
+	//TODO locking
+	dma_fence_put(sync_file->fence);
+	sync_file->fence = NULL;
+
+err:
+	return r;
+
+}
+
+int amdgpu_sem_lookup_and_signal(struct amdgpu_fpriv *fpriv,
+				 uint32_t handle,
+				 struct dma_fence *fence)
+{
+	return amdgpu_sem_signal(fpriv, handle, fence);
+}
+
+int amdgpu_sem_ioctl(struct drm_device *dev, void *data,
+		     struct drm_file *filp)
+{
+	union drm_amdgpu_sem *args = data;
+	struct amdgpu_fpriv *fpriv = filp->driver_priv;
+	int r = 0;
+
+	switch (args->in.op) {
+	case AMDGPU_SEM_OP_CREATE_SEM:
+		r = amdgpu_sem_create(fpriv, &args->out.handle);
+		break;
+	case AMDGPU_SEM_OP_IMPORT_SEM:
+		r = amdgpu_sem_import(fpriv, args->in.handle, &args->out.handle);
+		break;
+	case AMDGPU_SEM_OP_EXPORT_SEM:
+		r = amdgpu_sem_export(fpriv, args->in.handle, &args->out.fd);
+		break;
+	case AMDGPU_SEM_OP_DESTROY_SEM:
+		amdgpu_sem_destroy(fpriv, args->in.handle);
+		break;
+	default:
+		r = -EINVAL;
+		break;
+	}
+
+	return r;
+}
diff --git a/include/linux/sync_file.h b/include/linux/sync_file.h
index 3e3ab84..56bf07e 100644
--- a/include/linux/sync_file.h
+++ b/include/linux/sync_file.h
@@ -49,5 +49,5 @@ struct sync_file {
 
 struct sync_file *sync_file_create(struct dma_fence *fence);
 struct dma_fence *sync_file_get_fence(int fd);
-
+struct sync_file *sync_file_fdget(int fd);
 #endif /* _LINUX_SYNC_H */
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 5797283..646b103 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -51,6 +51,7 @@ extern "C" {
 #define DRM_AMDGPU_GEM_OP		0x10
 #define DRM_AMDGPU_GEM_USERPTR		0x11
 #define DRM_AMDGPU_WAIT_FENCES		0x12
+#define DRM_AMDGPU_SEM                  0x13
 
 #define DRM_IOCTL_AMDGPU_GEM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_CREATE, union drm_amdgpu_gem_create)
 #define DRM_IOCTL_AMDGPU_GEM_MMAP	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_MMAP, union drm_amdgpu_gem_mmap)
@@ -65,6 +66,7 @@ extern "C" {
 #define DRM_IOCTL_AMDGPU_GEM_OP		DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_OP, struct drm_amdgpu_gem_op)
 #define DRM_IOCTL_AMDGPU_GEM_USERPTR	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_GEM_USERPTR, struct drm_amdgpu_gem_userptr)
 #define DRM_IOCTL_AMDGPU_WAIT_FENCES	DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_WAIT_FENCES, union drm_amdgpu_wait_fences)
+#define DRM_IOCTL_AMDGPU_SEM		DRM_IOWR(DRM_COMMAND_BASE + DRM_AMDGPU_SEM, union drm_amdgpu_sem)
 
 #define AMDGPU_GEM_DOMAIN_CPU		0x1
 #define AMDGPU_GEM_DOMAIN_GTT		0x2
@@ -335,6 +337,26 @@ union drm_amdgpu_wait_fences {
 	struct drm_amdgpu_wait_fences_out out;
 };
 
+#define AMDGPU_SEM_OP_CREATE_SEM 0
+#define AMDGPU_SEM_OP_IMPORT_SEM 1
+#define AMDGPU_SEM_OP_EXPORT_SEM 2
+#define AMDGPU_SEM_OP_DESTROY_SEM 3
+
+struct drm_amdgpu_sem_in {
+	__u32 op;
+	__u32 handle;
+};
+
+struct drm_amdgpu_sem_out {
+	__u32 fd;
+	__u32 handle;
+};
+
+union drm_amdgpu_sem {
+	struct drm_amdgpu_sem_in in;
+	struct drm_amdgpu_sem_out out;
+};
+
 #define AMDGPU_GEM_OP_GET_GEM_CREATE_INFO	0
 #define AMDGPU_GEM_OP_SET_PLACEMENT		1
 
@@ -390,6 +412,8 @@ struct drm_amdgpu_gem_va {
 #define AMDGPU_CHUNK_ID_IB		0x01
 #define AMDGPU_CHUNK_ID_FENCE		0x02
 #define AMDGPU_CHUNK_ID_DEPENDENCIES	0x03
+#define AMDGPU_CHUNK_ID_SEM_WAIT        0x04
+#define AMDGPU_CHUNK_ID_SEM_SIGNAL      0x05
 
 struct drm_amdgpu_cs_chunk {
 	__u32		chunk_id;
@@ -454,6 +478,10 @@ struct drm_amdgpu_cs_chunk_fence {
 	__u32 offset;
 };
 
+struct drm_amdgpu_cs_chunk_sem {
+	__u32 handle;
+};
+
 struct drm_amdgpu_cs_chunk_data {
 	union {
 		struct drm_amdgpu_cs_chunk_ib		ib_data;
-- 
2.7.4


[-- Attachment #3: Type: text/plain, Size: 154 bytes --]

_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Shared semaphores for amdgpu
       [not found]                                                               ` <CAPM=9tyDjtgzYEo7FET6jY9zZrWEd1LoLGWLNRkbbRnWTsnzkw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-03-10  4:43                                                                 ` Dave Airlie
       [not found]                                                                   ` <CAPM=9tzoqKE=F+JjWwUXv7cGTxwJ1u15jTNuTxqL_jSQwQMRAQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 19+ messages in thread
From: Dave Airlie @ 2017-03-10  4:43 UTC (permalink / raw)
  To: Christian König
  Cc: zhoucm1, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Mao, David,
	Andres Rodriguez, Christian König, Andres Rodriguez,
	Dave Airlie, Cui, Flora, Pierre-Loup Griffais

On 10 March 2017 at 14:27, Dave Airlie <airlied@gmail.com> wrote:
> On 10 March 2017 at 13:25, Dave Airlie <airlied@gmail.com> wrote:
>>>
>>> As far as I can see the only functionality we are missing here is:
>>>
>>> void sync_file_signal(struct sync_file *sync_file, struct dma_fence *fence)
>>> {
>>>     dma_fence_put(sync_file->fence);
>>>     sync_file->fence = fence;
>>> }
>>>
>>> We probably should do this atomically, but that is only a matter of taking
>>> locks/atomic pointer operation.
>>>
>>> The waiting is done using the normal sync_file_get_fence() function.
>>>
>>> The rest is David's patch to import/export the fd handle into a local idr
>>> based handle.
>>
>> Are you suggesting we start keeping track of sync_file objects in a local idr?
>>
>> As currently they are only tracked as files, which is probably not what we want
>> for every unshared semaphore, or are you thinking more that the amdgpu local
>> sem should be just storing a sync_file pointer, rather than what it does now.
>
> Okay here's a first pass at what I think you mean, it's missing
> things, but the idea
> should be what you said.

(This version oopses of course due to passing NULL into sync_file_create,
but that should be trivial to fix next week.)

Dave.
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Shared semaphores for amdgpu
       [not found]                                                                   ` <CAPM=9tzoqKE=F+JjWwUXv7cGTxwJ1u15jTNuTxqL_jSQwQMRAQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2017-03-10  9:12                                                                     ` Christian König
  0 siblings, 0 replies; 19+ messages in thread
From: Christian König @ 2017-03-10  9:12 UTC (permalink / raw)
  To: Dave Airlie
  Cc: zhoucm1, amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW, Mao, David,
	Andres Rodriguez, Christian König, Andres Rodriguez,
	Dave Airlie, Cui, Flora, Pierre-Loup Griffais

Am 10.03.2017 um 05:43 schrieb Dave Airlie:
> On 10 March 2017 at 14:27, Dave Airlie <airlied@gmail.com> wrote:
>> On 10 March 2017 at 13:25, Dave Airlie <airlied@gmail.com> wrote:
>>>> As far as I can see the only functionality we are missing here is:
>>>>
>>>> void sync_file_signal(struct sync_file *sync_file, struct dma_fence *fence)
>>>> {
>>>>      dma_fence_put(sync_file->fence);
>>>>      sync_file->fence = fence;
>>>> }
>>>>
>>>> We probably should do this atomically, but that is only a matter of taking
>>>> locks/atomic pointer operation.
>>>>
>>>> The waiting is done using the normal sync_file_get_fence() function.
>>>>
>>>> The rest is David's patch to import/export the fd handle into a local idr
>>>> based handle.
>>> Are you suggesting we start keeping track of sync_file objects in a local idr?

Yes, exactly.

>>> As currently they are only tracked as files, which is probably not what we want
>>> for every unshared semaphore, or are you thinking more that the amdgpu local
>>> sem should be just storing a sync_file pointer, rather than what it does now.
>> Okay here's a first pass at what I think you mean, it's missing
>> things, but the idea
>> should be what you said.

Yeah, that's pretty much what I had in mind.

If we can find consensus with the Intel guys on this we might want to 
move parts of the idr handling stuff into common code.

And I would give it another name, something like sync_handle or similar. 
As far as I can see it's just another representation of sync_file 
structures to userspace.
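
Something like the following, perhaps (purely a sketch: the sync_handle
naming and the per-file table are hypothetical, just lifting the idr
pattern from the RFC patch into shared helpers):

#include <linux/idr.h>
#include <linux/spinlock.h>
#include <linux/sync_file.h>

/* Hypothetical per-drm_file table mapping u32 handles to sync_files. */
struct sync_handle_table {
        spinlock_t lock;
        struct idr handles;
};

static int sync_handle_add(struct sync_handle_table *tbl,
                           struct sync_file *sync_file)
{
        int ret;

        idr_preload(GFP_KERNEL);
        spin_lock(&tbl->lock);
        ret = idr_alloc(&tbl->handles, sync_file, 1, 0, GFP_NOWAIT);
        spin_unlock(&tbl->lock);
        idr_preload_end();

        return ret;     /* >= 1 is the new handle, < 0 is an error */
}

static struct sync_file *sync_handle_lookup(struct sync_handle_table *tbl,
                                            u32 handle)
{
        struct sync_file *sync_file;

        spin_lock(&tbl->lock);
        sync_file = idr_find(&tbl->handles, handle);
        spin_unlock(&tbl->lock);

        return sync_file;
}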

> (This version oopses of course due to passing NULL into sync_file_create,
> but that should be trivial to fix next week.)

Yeah, we of course need to figure out all the implementation details. 
But I think that won't be much trouble.

Christian.

>
> Dave.


_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-03-10  9:12 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-05  4:09 Shared semaphores for amdgpu Andres Rodriguez
     [not found] ` <544E607D03B20249AA404517E498FC469A558B-Lp/cVzEoVyaisxZYEgh0i620KmCxYQEWVpNB7YpNyf8@public.gmane.org>
2017-01-05  4:13   ` Mao, David
     [not found]     ` <BN4PR12MB0787AE3A185BE4D6916CE42AEE600-aH9FTdWx9BancvD3hK8fMAdYzm3356FpvxpqHgZTriW3zl9H0oFU5g@public.gmane.org>
2017-01-05 17:48       ` Andres Rodriguez
     [not found]         ` <a25fdfeb-be5d-2d23-d7b1-ef14891ba6d5-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2017-02-27 19:36           ` Dave Airlie
2017-02-28  1:46             ` zhoucm1
     [not found]               ` <58B4D68E.5080606-5C7GfCeVMHo@public.gmane.org>
2017-03-09  3:52                 ` Dave Airlie
     [not found]                   ` <CAPM=9tw3nbGe+gaOpoBZeXfmS5+C3R4eK=uT8AL3krL0PMR0LA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-03-09  4:24                     ` Dave Airlie
     [not found]                       ` <CAPM=9twMvVoKCCmQUVsB6uD18j1e9cNq9eNqviVFy6F8v7OdOg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-03-09  7:00                         ` zhoucm1
     [not found]                           ` <58C0FD86.8040808-5C7GfCeVMHo@public.gmane.org>
2017-03-09  7:38                             ` Christian König
     [not found]                               ` <d5a5f1ba-4bcb-f374-f18e-b060cc40aa9e-5C7GfCeVMHo@public.gmane.org>
2017-03-09  8:15                                 ` Dave Airlie
     [not found]                                   ` <CAPM=9twfZhb7vt_gEBE6LaUfthseX_PC_BZctCyu520MK32QCQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-03-09  9:12                                     ` Christian König
     [not found]                                       ` <37118a87-28f2-c96d-18dc-a71292ea35d4-5C7GfCeVMHo@public.gmane.org>
2017-03-09  9:43                                         ` zhoucm1
     [not found]                                           ` <58C123A6.70209-5C7GfCeVMHo@public.gmane.org>
2017-03-09 10:31                                             ` Christian König
     [not found]                                               ` <a6a3ea27-aae2-dc69-1ec2-f463d8417712-ANTagKRnAhcb1SvskN2V4Q@public.gmane.org>
2017-03-09 23:19                                                 ` Dave Airlie
     [not found]                                                   ` <CAPM=9tyn1gvTW5W3JbbxmzkN3PTwJjmOKCHiiU52ZOqGDJf_6A-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-03-10  0:46                                                     ` Christian König
     [not found]                                                       ` <85322d42-e585-659d-6f98-fc5baf0d6b14-5C7GfCeVMHo@public.gmane.org>
2017-03-10  3:25                                                         ` Dave Airlie
     [not found]                                                           ` <CAPM=9tz+3DB8zZTH1kRUt8KE-wU7RNeoOE6zQWn_dWDvVRSBMA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-03-10  4:27                                                             ` Dave Airlie
     [not found]                                                               ` <CAPM=9tyDjtgzYEo7FET6jY9zZrWEd1LoLGWLNRkbbRnWTsnzkw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-03-10  4:43                                                                 ` Dave Airlie
     [not found]                                                                   ` <CAPM=9tzoqKE=F+JjWwUXv7cGTxwJ1u15jTNuTxqL_jSQwQMRAQ-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2017-03-10  9:12                                                                     ` Christian König
