* [PULL REQUEST] ttm fence conversion
From: Maarten Lankhorst @ 2014-09-01 11:34 UTC
  To: Dave Airlie; +Cc: dri-devel

The following changes since commit 04cd214516d8a6f0f8c0116185d6e360df0860d2:

  Merge branch 'drm-next-3.18' of git://people.freedesktop.org/~agd5f/linux into drm-next (2014-08-28 13:45:45 +1000)

are available in the git repository at:

  ssh://people.freedesktop.org/~mlankhorst/linux for-airlied-next

for you to fetch changes up to d591829ffd785ee27c82becc67673ce70b21cb83:

  drm/nouveau: use shared fences for readable objects (2014-09-01 11:09:39 +0200)

\o/ radeon gpu reset rework is in -next, time for fences!

----------------------------------------------------------------
Maarten Lankhorst (18):
      drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep
      drm/nouveau: require reservations for nouveau_fence_sync and nouveau_bo_fence
      drm/ttm: call ttm_bo_wait while inside a reservation
      drm/ttm: kill fence_lock
      drm/ttm: add interruptible parameter to ttm_eu_reserve_buffers
      drm/ttm: kill off some members to ttm_validate_buffer
      drm/radeon: use common fence implementation for fences, v3
      drm/vmwgfx: get rid of different types of fence_flags entirely
      drm/vmwgfx: rework to new fence interface, v2
      drm/nouveau: rework to new fence interface
      drm/qxl: rework to new fence interface
      drm/ttm: flip the switch, and convert to dma_fence
      drm/nouveau: use rcu in nouveau_gem_ioctl_cpu_prep
      drm/radeon: use rcu waits in some ioctls
      drm/vmwgfx: use rcu in vmw_user_dmabuf_synccpu_grab
      drm/ttm: use rcu in core ttm
      drm/nouveau: Keep only a single list for validation.
      drm/nouveau: use shared fences for readable objects

 drivers/gpu/drm/nouveau/nouveau_bo.c      |  64 +---
 drivers/gpu/drm/nouveau/nouveau_bo.h      |   2 +-
 drivers/gpu/drm/nouveau/nouveau_display.c |  27 +-
 drivers/gpu/drm/nouveau/nouveau_fence.c   | 518 ++++++++++++++++++++----------
 drivers/gpu/drm/nouveau/nouveau_fence.h   |  26 +-
 drivers/gpu/drm/nouveau/nouveau_gem.c     | 178 +++++-----
 drivers/gpu/drm/nouveau/nv04_fence.c      |   6 +-
 drivers/gpu/drm/nouveau/nv10_fence.c      |   6 +-
 drivers/gpu/drm/nouveau/nv17_fence.c      |   4 +-
 drivers/gpu/drm/nouveau/nv50_fence.c      |   4 +-
 drivers/gpu/drm/nouveau/nv84_fence.c      |  22 +-
 drivers/gpu/drm/qxl/Makefile              |   2 +-
 drivers/gpu/drm/qxl/qxl_cmd.c             |   7 -
 drivers/gpu/drm/qxl/qxl_debugfs.c         |  16 +-
 drivers/gpu/drm/qxl/qxl_drv.h             |  20 +-
 drivers/gpu/drm/qxl/qxl_fence.c           |  91 ------
 drivers/gpu/drm/qxl/qxl_kms.c             |   1 +
 drivers/gpu/drm/qxl/qxl_object.c          |   2 -
 drivers/gpu/drm/qxl/qxl_object.h          |   6 +-
 drivers/gpu/drm/qxl/qxl_release.c         | 172 ++++++++--
 drivers/gpu/drm/qxl/qxl_ttm.c             |  93 ------
 drivers/gpu/drm/radeon/radeon.h           |  15 +-
 drivers/gpu/drm/radeon/radeon_cs.c        |  10 +-
 drivers/gpu/drm/radeon/radeon_device.c    |   6 +-
 drivers/gpu/drm/radeon/radeon_display.c   |   7 +-
 drivers/gpu/drm/radeon/radeon_fence.c     | 256 +++++++++++++--
 drivers/gpu/drm/radeon/radeon_gem.c       |  20 +-
 drivers/gpu/drm/radeon/radeon_irq_kms.c   |  43 +++
 drivers/gpu/drm/radeon/radeon_mn.c        |   6 +-
 drivers/gpu/drm/radeon/radeon_object.c    |   8 +-
 drivers/gpu/drm/radeon/radeon_ttm.c       |  34 +-
 drivers/gpu/drm/radeon/radeon_uvd.c       |   6 +-
 drivers/gpu/drm/radeon/radeon_vm.c        |  17 +-
 drivers/gpu/drm/ttm/ttm_bo.c              | 187 +++++------
 drivers/gpu/drm/ttm/ttm_bo_util.c         |  28 +-
 drivers/gpu/drm/ttm/ttm_bo_vm.c           |   3 -
 drivers/gpu/drm/ttm/ttm_execbuf_util.c    | 146 +++------
 drivers/gpu/drm/vmwgfx/vmwgfx_buffer.c    |  47 ---
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.h       |   2 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c   |  24 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.c     | 346 +++++++++++---------
 drivers/gpu/drm/vmwgfx/vmwgfx_fence.h     |  35 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_fifo.c      |  11 +-
 drivers/gpu/drm/vmwgfx/vmwgfx_resource.c  |  43 ++-
 include/drm/ttm/ttm_bo_api.h              |   7 +-
 include/drm/ttm/ttm_bo_driver.h           |  29 +-
 include/drm/ttm/ttm_execbuf_util.h        |  22 +-
 47 files changed, 1396 insertions(+), 1229 deletions(-)
 delete mode 100644 drivers/gpu/drm/qxl/qxl_fence.c

* Re: [PULL REQUEST] ttm fence conversion
From: Christian König @ 2014-09-01 12:31 UTC
  To: Maarten Lankhorst, Dave Airlie; +Cc: dri-devel

Please wait a second with that.

I haven't had a chance to test this yet, and nobody has given their rb
on at least the radeon changes in this branch.

Christian.

* Re: [PULL REQUEST] ttm fence conversion
From: Maarten Lankhorst @ 2014-09-01 13:33 UTC
  To: Christian König, Dave Airlie; +Cc: dri-devel

Hey,

On 01-09-14 at 14:31, Christian König wrote:
> Please wait a second with that.
>
> I haven't had a chance to test this yet, and nobody has given their rb on at least the radeon changes in this branch.
Ok, my fault. I thought it was implicitly acked. I haven't made any functional changes to these patches,
just some small fixups and a fix to make it apply after the upstream removal of RADEON_FENCE_SIGNALED_SEQ.

~Maarten

* Re: [PULL REQUEST] ttm fence conversion
From: Christian König @ 2014-09-01 16:21 UTC
  To: Maarten Lankhorst, Dave Airlie; +Cc: dri-devel

On 01.09.2014 at 15:33, Maarten Lankhorst wrote:
> Hey,
>
> On 01-09-14 at 14:31, Christian König wrote:
>> Please wait a second with that.
>>
>> I haven't had a chance to test this yet, and nobody has given their rb on at least the radeon changes in this branch.
> Ok, my fault. I thought it was implicitly acked. I haven't made any functional changes to these patches,
> just some small fixups and a fix to make it apply after the upstream removal of RADEON_FENCE_SIGNALED_SEQ.

Yeah, but the resulting patch looks too complex for my taste and should
be simplified a bit more. Here is a more detailed review:

> +    wait_queue_t fence_wake;
Only a nitpick, but please fix the indentation and maybe add a comment.

> +    struct work_struct delayed_irq_work;
Just drop that; the new fallback work item should take care of the
unfortunate case where somebody tries to enable_signaling in the middle
of a GPU reset.

>  /*
> - * Cast helper
> - */
> -#define to_radeon_fence(p) ((struct radeon_fence *)(p))
> -
> -/*
Please define the new cast helper in radeon.h as well.

>      if (!rdev->needs_reset) {
> -        up_write(&rdev->exclusive_lock);
> +        downgrade_write(&rdev->exclusive_lock);
> +        wake_up_all(&rdev->fence_queue);
> +        up_read(&rdev->exclusive_lock);
>          return 0;
>      }
Just drop that as well, no need to wake up anybody here.

>  downgrade_write(&rdev->exclusive_lock);
> +    wake_up_all(&rdev->fence_queue);
Same here, the IB test will wake up all fences for recheck anyway.

> + * radeon_fence_read_seq - Returns the current fence value without updating
> + *
> + * @rdev: radeon_device pointer
> + * @ring: ring index to return the seqno of
> + */
> +static uint64_t radeon_fence_read_seq(struct radeon_device *rdev, int ring)
> +{
> +    uint64_t last_seq = atomic64_read(&rdev->fence_drv[ring].last_seq);
> +    uint64_t last_emitted = rdev->fence_drv[ring].sync_seq[ring];
> +    uint64_t seq = radeon_fence_read(rdev, ring);
> +
> +    seq = radeon_fence_read(rdev, ring);
> +    seq |= last_seq & 0xffffffff00000000LL;
> +    if (seq < last_seq) {
> +        seq &= 0xffffffff;
> +        seq |= last_emitted & 0xffffffff00000000LL;
> +    }
> +    return seq;
> +}
Completely drop that and just check the last_seq signaled as set by 
radeon_fence_activity.

> +        if (!ret)
> +            FENCE_TRACE(&fence->base, "signaled from irq context\n");
> +        else
> +            FENCE_TRACE(&fence->base, "was already signaled\n");
Is all that text tracing necessary? Probably better to define a trace
point here.

> +    if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
> +        !rdev->ddev->irq_enabled)
> +        return false;
Checking irq_enabled here might not be such a good idea if the fence
code doesn't have a fallback of its own. What exactly happens if
enable_signaling returns false?

> +static signed long radeon_fence_default_wait(struct fence *f, bool intr,
> +                         signed long timeout)
> +{
> +    struct radeon_fence *fence = to_radeon_fence(f);
> +    struct radeon_device *rdev = fence->rdev;
> +    bool signaled;
> +
> +    fence_enable_sw_signaling(&fence->base);
> +
> +    /*
> +     * This function has to return -EDEADLK, but cannot hold
> +     * exclusive_lock during the wait because some callers
> +     * may already hold it. This means checking needs_reset without
> +     * lock, and not fiddling with any gpu internals.
> +     *
> +     * The callback installed with fence_enable_sw_signaling will
> +     * run before our wait_event_*timeout call, so we will see
> +     * both the signaled fence and the changes to needs_reset.
> +     */
> +
> +    if (intr)
> +        timeout = wait_event_interruptible_timeout(rdev->fence_queue,
> +                               ((signaled = (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))) || rdev->needs_reset),
> +                               timeout);
> +    else
> +        timeout = wait_event_timeout(rdev->fence_queue,
> +                         ((signaled = (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))) || rdev->needs_reset),
> +                         timeout);
> +
> +    if (timeout > 0 && !signaled)
> +        return -EDEADLK;
> +    return timeout;
> +}
This at least needs to be properly formatted. But since we don't need
the extra handling any more, I think we don't need an extra wait
function either.

Regards,
Christian.

* Re: [PULL REQUEST] ttm fence conversion
From: Maarten Lankhorst @ 2014-09-01 18:43 UTC
  To: Christian König, Dave Airlie; +Cc: dri-devel

Hey,

On 01-09-14 18:21, Christian König wrote:
> On 01.09.2014 at 15:33, Maarten Lankhorst wrote:
>> Hey,
>>
>> On 01-09-14 at 14:31, Christian König wrote:
>>> Please wait a second with that.
>>>
>>> I haven't had a chance to test this yet, and nobody has given their rb on at least the radeon changes in this branch.
>> Ok, my fault. I thought it was implicitly acked. I haven't made any functional changes to these patches,
>> just some small fixups and a fix to make it apply after the upstream removal of  RADEON_FENCE_SIGNALED_SEQ.
> 
> Yeah, but the resulting patch looks too complex for my taste and should be simplified a bit more. Here is a more detailed review:
> 
>> +    wait_queue_t fence_wake;
> Only a nitpick, but please fix the indentation and maybe add a comment.
> 
>> +    struct work_struct delayed_irq_work;
> Just drop that; the new fallback work item should take care of the unfortunate case where somebody tries to enable_signaling in the middle of a GPU reset.

I can only drop it if radeon_gpu_reset will always call radeon_irq_set after downgrading to read mode, even if no work needs to be done. :-)

Then again, should be possible.
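
Something like this, as a rough sketch (assuming radeon_irq_set() is
safe to call with exclusive_lock held in read mode; not the final
patch):

	down_write(&rdev->exclusive_lock);
	/* ... perform the actual asic reset and restore ... */

	downgrade_write(&rdev->exclusive_lock);
	/* re-arm the fence irqs unconditionally so an enable_signaling
	 * that raced with the reset is not lost */
	radeon_irq_set(rdev);
	up_read(&rdev->exclusive_lock);
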
>>  /*
>> - * Cast helper
>> - */
>> -#define to_radeon_fence(p) ((struct radeon_fence *)(p))
>> -
>> -/*
> Please define the new cast helper in radeon.h as well.
The ops are only defined in radeon_fence.c, and nothing outside of radeon_fence.c should care about the internals.

>>      if (!rdev->needs_reset) {
>> -        up_write(&rdev->exclusive_lock);
>> +        downgrade_write(&rdev->exclusive_lock);
>> +        wake_up_all(&rdev->fence_queue);
>> +        up_read(&rdev->exclusive_lock);
>>          return 0;
>>      }
> Just drop that as well, no need to wake up anybody here.

Maybe not, but if I have to remove delayed_irq_work I do need to add a radeon_irq_set here.

>>  downgrade_write(&rdev->exclusive_lock);
>> +    wake_up_all(&rdev->fence_queue);
> Same here, the IB test will wake up all fences for recheck anyway.
Same as previous comment. :-)

>> + * radeon_fence_read_seq - Returns the current fence value without updating
>> + *
>> + * @rdev: radeon_device pointer
>> + * @ring: ring index to return the seqno of
>> + */
>> +static uint64_t radeon_fence_read_seq(struct radeon_device *rdev, int ring)
>> +{
>> +    uint64_t last_seq = atomic64_read(&rdev->fence_drv[ring].last_seq);
>> +    uint64_t last_emitted = rdev->fence_drv[ring].sync_seq[ring];
>> +    uint64_t seq = radeon_fence_read(rdev, ring);
>> +
>> +    seq = radeon_fence_read(rdev, ring);
>> +    seq |= last_seq & 0xffffffff00000000LL;
>> +    if (seq < last_seq) {
>> +        seq &= 0xffffffff;
>> +        seq |= last_emitted & 0xffffffff00000000LL;
>> +    }
>> +    return seq;
>> +}
> Completely drop that and just check the last_seq signaled as set by radeon_fence_activity.

Do you mean call radeon_fence_activity in radeon_fence_signaled? Or should I just use the cached value in radeon_fence_check_signaled?

I can't call fence_activity in check_signaled, because it would cause re-entrancy in fence_queue.

>> +        if (!ret)
>> +            FENCE_TRACE(&fence->base, "signaled from irq context\n");
>> +        else
>> +            FENCE_TRACE(&fence->base, "was already signaled\n");
> Is all that text tracing necessary? Probably better to define a trace point here.
It gets optimized out normally. There's already a tracepoint called in fence_signal.
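
For context, FENCE_TRACE compiles away unless CONFIG_FENCE_TRACE is set;
roughly, from include/linux/fence.h:

#ifdef CONFIG_FENCE_TRACE
#define FENCE_TRACE(f, fmt, args...) \
	do { \
		struct fence *__ff = (f); \
		pr_info("f %u#%u: " fmt, __ff->context, __ff->seqno, ##args); \
	} while (0)
#else
#define FENCE_TRACE(f, fmt, args...) do {} while (0)
#endif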
 
>> +    if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
>> +        !rdev->ddev->irq_enabled)
>> +        return false;
> Checking irq_enabled here might not be such a good idea if the fence code doesn't have a fallback of its own. What exactly happens if enable_signaling returns false?

I thought irq_enabled being false couldn't happen under normal circumstances?
Anyway the fence gets treated as signaled if it returns false, and fence_signal will get called.
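
For reference, the core helper behaves roughly like this (paraphrased
from fence_enable_sw_signaling() in drivers/dma-buf/fence.c):

void fence_enable_sw_signaling(struct fence *fence)
{
	unsigned long flags;

	if (!test_and_set_bit(FENCE_FLAG_ENABLE_SIGNAL_BIT, &fence->flags) &&
	    !test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags)) {
		spin_lock_irqsave(fence->lock, flags);
		/* a false return from .enable_signaling signals the
		 * fence on the spot */
		if (!fence->ops->enable_signaling(fence))
			fence_signal_locked(fence);
		spin_unlock_irqrestore(fence->lock, flags);
	}
}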

>> +static signed long radeon_fence_default_wait(struct fence *f, bool intr,
>> +                         signed long timeout)
>> +{
>> +    struct radeon_fence *fence = to_radeon_fence(f);
>> +    struct radeon_device *rdev = fence->rdev;
>> +    bool signaled;
>> +
>> +    fence_enable_sw_signaling(&fence->base);
>> +
>> +    /*
>> +     * This function has to return -EDEADLK, but cannot hold
>> +     * exclusive_lock during the wait because some callers
>> +     * may already hold it. This means checking needs_reset without
>> +     * lock, and not fiddling with any gpu internals.
>> +     *
>> +     * The callback installed with fence_enable_sw_signaling will
>> +     * run before our wait_event_*timeout call, so we will see
>> +     * both the signaled fence and the changes to needs_reset.
>> +     */
>> +
>> +    if (intr)
>> +        timeout = wait_event_interruptible_timeout(rdev->fence_queue,
>> +                               ((signaled = (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))) || rdev->needs_reset),
>> +                               timeout);
>> +    else
>> +        timeout = wait_event_timeout(rdev->fence_queue,
>> +                         ((signaled = (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))) || rdev->needs_reset),
>> +                         timeout);
>> +
>> +    if (timeout > 0 && !signaled)
>> +        return -EDEADLK;
>> +    return timeout;
>> +}
> This at least needs to be properly formatted. But since we don't need the extra handling any more, I think we don't need an extra wait function either.

I thought of removing the extra handling, but the -EDEADLK stuff is needed because a deadlock could otherwise happen in ttm_bo_lock_delayed_workqueue; if the gpu's really hung, there would never be any progress forward.
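
For the record, the consumer side of -EDEADLK already looks like this
(cf. radeon_gem_handle_lockup() in radeon_gem.c, which predates this
series); a wait returning -EDEADLK is the cue to trigger a reset rather
than block forever:

static void radeon_gem_handle_lockup(struct radeon_device *rdev, int r)
{
	if (r == -EDEADLK) {
		r = radeon_gpu_reset(rdev);
		if (!r)
			r = -EAGAIN;
	}
}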

~Maarten

* Re: [PULL REQUEST] ttm fence conversion
From: Christian König @ 2014-09-02  8:51 UTC
  To: Maarten Lankhorst, Dave Airlie; +Cc: dri-devel

On 01.09.2014 at 20:43, Maarten Lankhorst wrote:
> [snip]
>>> +    struct work_struct delayed_irq_work;
>> Just drop that; the new fallback work item should take care of the unfortunate case where somebody tries to enable_signaling in the middle of a GPU reset.
> I can only drop it if radeon_gpu_reset will always call radeon_irq_set after downgrading to read mode, even if no work needs to be done. :-)
>
> Then again, should be possible.

The fallback handler should take care of the rare condition that we
can't activate the IRQ because the driver is in a lockup handler.

The issue is that the delayed_irq_work handler needs to take the
exclusive lock once more and so would block an innocent process for the
duration of the GPU lockup.

Either reschedule as a delayed work item if we can't take the lock
immediately, or just live with the delay of the fallback handler. Since
IRQs usually don't work correctly immediately after a GPU reset, I'm
pretty sure that the fallback handler will be needed anyway.
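
As a sketch, the reschedule variant could look like this (hypothetical:
it turns delayed_irq_work into a struct delayed_work so the retry can
back off instead of spinning):

static void radeon_delayed_irq_work_func(struct work_struct *work)
{
	struct radeon_device *rdev =
		container_of(work, struct radeon_device, delayed_irq_work.work);

	if (!down_read_trylock(&rdev->exclusive_lock)) {
		/* the lockup handler still owns the lock, retry later
		 * instead of blocking for the whole GPU reset */
		schedule_delayed_work(&rdev->delayed_irq_work,
				      msecs_to_jiffies(10));
		return;
	}
	radeon_irq_set(rdev);
	up_read(&rdev->exclusive_lock);
}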

>>>   /*
>>> - * Cast helper
>>> - */
>>> -#define to_radeon_fence(p) ((struct radeon_fence *)(p))
>>> -
>>> -/*
>> Please define the new cast helper in radeon.h as well.
> The ops are only defined in radeon_fence.c, and nothing outside of radeon_fence.c should care about the internals.

Then define this as a function instead; I need a checked cast from a
fence to a radeon_fence outside of the fence code as well.

> [snip]
>> Completely drop that and just check the last_seq signaled as set by radeon_fence_activity.
> Do you mean call radeon_fence_activity in radeon_fence_signaled? Or should I just use the cached value in radeon_fence_check_signaled?

Just check the cached value; it should be updated by
radeon_fence_activity immediately before calling this anyway.
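
For illustration, with last_seq kept current by radeon_fence_activity()
the wait queue callback only needs the cached value; a sketch (not the
final patch):

static int radeon_fence_check_signaled(wait_queue_t *wait, unsigned mode,
				       int flags, void *key)
{
	struct radeon_fence *fence;
	u64 seq;

	fence = container_of(wait, struct radeon_fence, fence_wake);

	/* cached value only, no register read here */
	seq = atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq);
	if (seq >= fence->seq) {
		/* fence_queue.lock doubles as the fence lock, so
		 * fence_signal_locked() can be called directly */
		fence_signal_locked(&fence->base);
		radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
		__remove_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
		fence_put(&fence->base);
	}
	return 0;
}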

> [snip]
>>> +    if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
>>> +        !rdev->ddev->irq_enabled)
>>> +        return false;
>> Checking irq_enabled here might not be such a good idea if the fence code doesn't have a fallback of its own. What exactly happens if enable_signaling returns false?
> I thought irq_enabled being false couldn't happen under normal circumstances?

Not 100% sure, but I think it is temporarily turned off during reset.

> Anyway the fence gets treated as signaled if it returns false, and fence_signal will get called.
Thought so. Well, it's rather bad that we just treat all fences as
signaled if we failed to install the IRQ handler, isn't it?

> [snip]
>> This at least needs to be properly formatted. But since we don't need the extra handling any more, I think we don't need an extra wait function either.
> I thought of removing the extra handling, but the -EDEADLK stuff is needed because a deadlock could otherwise happen in ttm_bo_lock_delayed_workqueue; if the gpu's really hung, there would never be any progress forward.

Huh, what? ttm_bo_lock_delayed_workqueue shouldn't call any blocking wait.

Regards,
Christian.

* Re: [PULL REQUEST] ttm fence conversion
From: Maarten Lankhorst @ 2014-09-02  9:12 UTC
  To: Christian König, Dave Airlie; +Cc: dri-devel

On 02-09-14 at 10:51, Christian König wrote:
> On 01.09.2014 at 20:43, Maarten Lankhorst wrote:
>> Hey,
>>
>> On 01-09-14 18:21, Christian König wrote:
>>> On 01.09.2014 at 15:33, Maarten Lankhorst wrote:
>>>> Hey,
>>>>
>>>> On 01-09-14 at 14:31, Christian König wrote:
>>>>> Please wait a second with that.
>>>>>
>>>>> I haven't had a chance to test this yet, and nobody has given their rb on at least the radeon changes in this branch.
>>>> Ok, my fault. I thought it was implicitly acked. I haven't made any functional changes to these patches,
>>>> just some small fixups and a fix to make it apply after the upstream removal of  RADEON_FENCE_SIGNALED_SEQ.
>>> Yeah, but the resulting patch looks too complex for my taste and should be simplified a bit more. Here is a more detailed review:
>>>
>>>> +    wait_queue_t fence_wake;
>>> Only a nitpick, but please fix the indentation and maybe add a comment.
>>>
>>>> +    struct work_struct delayed_irq_work;
>>> Just drop that; the new fallback work item should take care of the unfortunate case where somebody tries to enable_signaling in the middle of a GPU reset.
>> I can only drop it if radeon_gpu_reset will always call radeon_irq_set after downgrading to read mode, even if no work needs to be done. :-)
>>
>> Then again, should be possible.
>
> The fallback handler should take care of the rare condition that we can't activate the IRQ because the driver is in a lockup handler.
>
> The issue is that the delayed_irq_work handler needs to take the exclusive lock once more and so would block an innocent process for the duration of the GPU lockup.
>
> Either reschedule as a delayed work item if we can't take the lock immediately, or just live with the delay of the fallback handler. Since IRQs usually don't work correctly immediately after a GPU reset, I'm pretty sure that the fallback handler will be needed anyway.
Ok, rescheduling would be fine. Or could I go with the alternative: remove the delayed_irq_work and always set irqs after downgrading the write lock?

>>>>   /*
>>>> - * Cast helper
>>>> - */
>>>> -#define to_radeon_fence(p) ((struct radeon_fence *)(p))
>>>> -
>>>> -/*
>>> Please define the new cast helper in radeon.h as well.
>> The ops are only defined in radeon_fence.c, and nothing outside of radeon_fence.c should care about the internals.
>
> Then define this as a function instead; I need a checked cast from a fence to a radeon_fence outside of the fence code as well.
Ok.

>>
>>>>       if (!rdev->needs_reset) {
>>>> -        up_write(&rdev->exclusive_lock);
>>>> +        downgrade_write(&rdev->exclusive_lock);
>>>> +        wake_up_all(&rdev->fence_queue);
>>>> +        up_read(&rdev->exclusive_lock);
>>>>           return 0;
>>>>       }
>>> Just drop that as well, no need to wake up anybody here.
>> Maybe not, but if I have to remove delayed_irq_work I do need to add a radeon_irq_set here.
>>
>>>>   downgrade_write(&rdev->exclusive_lock);
>>>> +    wake_up_all(&rdev->fence_queue);
>>> Same here, the IB test will wake up all fences for recheck anyway.
>> Same as previous comment. :-)
>>
>>>> + * radeon_fence_read_seq - Returns the current fence value without updating
>>>> + *
>>>> + * @rdev: radeon_device pointer
>>>> + * @ring: ring index to return the seqno of
>>>> + */
>>>> +static uint64_t radeon_fence_read_seq(struct radeon_device *rdev, int ring)
>>>> +{
>>>> +    uint64_t last_seq = atomic64_read(&rdev->fence_drv[ring].last_seq);
>>>> +    uint64_t last_emitted = rdev->fence_drv[ring].sync_seq[ring];
>>>> +    uint64_t seq = radeon_fence_read(rdev, ring);
>>>> +
>>>> +    seq = radeon_fence_read(rdev, ring);
>>>> +    seq |= last_seq & 0xffffffff00000000LL;
>>>> +    if (seq < last_seq) {
>>>> +        seq &= 0xffffffff;
>>>> +        seq |= last_emitted & 0xffffffff00000000LL;
>>>> +    }
>>>> +    return seq;
>>>> +}
>>> Completely drop that and just check the last_seq signaled as set by radeon_fence_activity.
>> Do you mean call radeon_fence_activity in radeon_fence_signaled? Or should I just use the cached value in radeon_fence_check_signaled?
>
> Just check the cached value; it should be updated by radeon_fence_activity immediately before calling this anyway.
Ok. I think I wrote this as a workaround for unreliable interrupts. :-)

>>
>> I can't call fence_activity in check_signaled, because it would cause re-entrancy in fence_queue.
>>
>>>> +        if (!ret)
>>>> +            FENCE_TRACE(&fence->base, "signaled from irq context\n");
>>>> +        else
>>>> +            FENCE_TRACE(&fence->base, "was already signaled\n");
>>> Is all that text tracing necessary? Probably better to define a trace point here.
>> It gets optimized out normally. There's already a tracepoint called in fence_signal.
>>  
>>>> +    if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
>>>> +        !rdev->ddev->irq_enabled)
>>>> +        return false;
>>> Checking irq_enabled here might not be such a good idea if the fence code doesn't have a fallback of its own. What exactly happens if enable_signaling returns false?
>> I thought irq_enabled being false couldn't happen under normal circumstances?
>
> Not 100% sure, but I think it is temporarily turned off during reset.
>
>> Anyway the fence gets treated as signaled if it returns false, and fence_signal will get called.
> Thought so. Well, it's rather bad that we just treat all fences as signaled if we failed to install the IRQ handler, isn't it?
I wrote this code before the delayed work was added; I guess the check for !irq_enabled can be removed now. :-)

>>
>>>> +static signed long radeon_fence_default_wait(struct fence *f, bool intr,
>>>> +                         signed long timeout)
>>>> +{
>>>> +    struct radeon_fence *fence = to_radeon_fence(f);
>>>> +    struct radeon_device *rdev = fence->rdev;
>>>> +    bool signaled;
>>>> +
>>>> +    fence_enable_sw_signaling(&fence->base);
>>>> +
>>>> +    /*
>>>> +     * This function has to return -EDEADLK, but cannot hold
>>>> +     * exclusive_lock during the wait because some callers
>>>> +     * may already hold it. This means checking needs_reset without
>>>> +     * lock, and not fiddling with any gpu internals.
>>>> +     *
>>>> +     * The callback installed with fence_enable_sw_signaling will
>>>> +     * run before our wait_event_*timeout call, so we will see
>>>> +     * both the signaled fence and the changes to needs_reset.
>>>> +     */
>>>> +
>>>> +    if (intr)
>>>> +        timeout = wait_event_interruptible_timeout(rdev->fence_queue,
>>>> +                               ((signaled = (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))) || rdev->needs_reset),
>>>> +                               timeout);
>>>> +    else
>>>> +        timeout = wait_event_timeout(rdev->fence_queue,
>>>> +                         ((signaled = (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))) || rdev->needs_reset),
>>>> +                         timeout);
>>>> +
>>>> +    if (timeout > 0 && !signaled)
>>>> +        return -EDEADLK;
>>>> +    return timeout;
>>>> +}
>>> This at least needs to be properly formatted. But since we don't need the extra handling any more, I think we don't need an extra wait function either.
>> I thought of removing the extra handling, but the -EDEADLK stuff is needed because a deadlock could otherwise happen in ttm_bo_lock_delayed_workqueue; if the gpu's really hung, there would never be any progress forward.
>
> Huh, what? ttm_bo_lock_delayed_workqueue shouldn't call any blocking wait.
Oops, you're right. ttm_bo_delayed_delete is called with remove_all false, not true.

Unfortunately ttm_bo_vm_fault does hold the exclusive_lock in read mode, and other places that use eviction will use it too.
Without returning -EDEADLK this could mean that ttm_bo_move_accel_cleanup would block forever,
so this function has to stay.

~Maarten

* Re: [PULL REQUEST] ttm fence conversion
From: Christian König @ 2014-09-02  9:52 UTC
  To: Maarten Lankhorst, Dave Airlie; +Cc: dri-devel

On 02.09.2014 at 11:12, Maarten Lankhorst wrote:
> [snip]
> Ok, rescheduling would be fine. Or could I go with the alternative: remove the delayed_irq_work and always set irqs after downgrading the write lock?

Always setting the IRQs after downgrading the write lock would work for
me as well.


> [snip]
>>>> This at least needs to be properly formatted. But since we don't need the extra handling any more, I think we don't need an extra wait function either.
>>> I thought of removing the extra handling, but the -EDEADLK stuff is needed because a deadlock could otherwise happen in ttm_bo_lock_delayed_workqueue; if the gpu's really hung, there would never be any progress forward.
>> Huh, what? ttm_bo_lock_delayed_workqueue shouldn't call any blocking wait.
> Oops, you're right. ttm_bo_delayed_delete is called with remove_all false, not true.
>
> Unfortunately ttm_bo_vm_fault does hold the exclusive_lock in read mode, and other places that use eviction will use it too.
> Without returning -EDEADLK this could mean that ttm_bo_move_accel_cleanup would block forever,
> so this function has to stay.

Ok, fine with me. I'm not deep enough into the TTM code to really judge
this, but my understanding was that TTM still calls its own wait callback.

Regards,
Christian.

* Re: [PULL REQUEST] ttm fence conversion
From: Maarten Lankhorst @ 2014-09-02 12:29 UTC
  To: Christian König, Dave Airlie; +Cc: dri-devel

On 02-09-14 at 11:52, Christian König wrote:
> [snip]
> Ok, fine with me. I'm not deep enough into the TTM code to really judge this, but my understanding was that TTM still calls its own wait callback.
That one is about to go away. :-)

How does this patch look?
---- 8< ----
commit e75af5ee3b94157e868cb48b4bfbc1ca36183ba4
Author: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Date:   Thu Jan 9 11:03:12 2014 +0100

    drm/radeon: use common fence implementation for fences, v4
    
    Changes since v1:
    - Kill the sw interrupt dance, add and use
      radeon_irq_kms_sw_irq_get_delayed instead.
    - Change custom wait function, lockdep complained about it.
      Holding exclusive_lock in the wait function might cause deadlocks.
      Instead do all the processing in .enable_signaling, and wait
      on the global fence_queue to pick up gpu resets.
    - Process all fences in radeon_gpu_reset after reset to close a race
      with the trylock in enable_signaling.
    Changes since v2:
    - Small changes to work with the rewritten lockup recovery patches.
    Changes since v3:
    - Call radeon_fence_schedule_check when exclusive_lock cannot be
      acquired to always cause a wake up.
    - Reset irqs from hangup check.
    - Drop reading seqno in the callback, use cached value.
    - Fix indentation in radeon_fence_default_wait
    - Add a radeon_test_signaled function, drop a few test_bit calls.
    - Make to_radeon_fence global.
    
    Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>

diff --git a/drivers/gpu/drm/radeon/radeon.h b/drivers/gpu/drm/radeon/radeon.h
index 83a24614138a..d80dc547a105 100644
--- a/drivers/gpu/drm/radeon/radeon.h
+++ b/drivers/gpu/drm/radeon/radeon.h
@@ -66,6 +66,7 @@
 #include <linux/kref.h>
 #include <linux/interval_tree.h>
 #include <linux/hashtable.h>
+#include <linux/fence.h>
 
 #include <ttm/ttm_bo_api.h>
 #include <ttm/ttm_bo_driver.h>
@@ -354,17 +355,19 @@ struct radeon_fence_driver {
 	/* sync_seq is protected by ring emission lock */
 	uint64_t			sync_seq[RADEON_NUM_RINGS];
 	atomic64_t			last_seq;
-	bool				initialized;
+	bool				initialized, delayed_irq;
 	struct delayed_work		lockup_work;
 };
 
 struct radeon_fence {
+	struct fence base;
+
 	struct radeon_device		*rdev;
-	struct kref			kref;
-	/* protected by radeon_fence.lock */
 	uint64_t			seq;
 	/* RB, DMA, etc. */
 	unsigned			ring;
+
+	wait_queue_t			fence_wake;
 };
 
 int radeon_fence_driver_start_ring(struct radeon_device *rdev, int ring);
@@ -782,6 +785,7 @@ struct radeon_irq {
 int radeon_irq_kms_init(struct radeon_device *rdev);
 void radeon_irq_kms_fini(struct radeon_device *rdev);
 void radeon_irq_kms_sw_irq_get(struct radeon_device *rdev, int ring);
+bool radeon_irq_kms_sw_irq_get_delayed(struct radeon_device *rdev, int ring);
 void radeon_irq_kms_sw_irq_put(struct radeon_device *rdev, int ring);
 void radeon_irq_kms_pflip_irq_get(struct radeon_device *rdev, int crtc);
 void radeon_irq_kms_pflip_irq_put(struct radeon_device *rdev, int crtc);
@@ -2308,6 +2312,7 @@ struct radeon_device {
 	struct radeon_mman		mman;
 	struct radeon_fence_driver	fence_drv[RADEON_NUM_RINGS];
 	wait_queue_head_t		fence_queue;
+	unsigned			fence_context;
 	struct mutex			ring_lock;
 	struct radeon_ring		ring[RADEON_NUM_RINGS];
 	bool				ib_pool_ready;
@@ -2441,7 +2446,17 @@ void cik_mm_wdoorbell(struct radeon_device *rdev, u32 index, u32 v);
 /*
  * Cast helper
  */
-#define to_radeon_fence(p) ((struct radeon_fence *)(p))
+extern const struct fence_ops radeon_fence_ops;
+
+static inline struct radeon_fence *to_radeon_fence(struct fence *f)
+{
+	struct radeon_fence *__f = container_of(f, struct radeon_fence, base);
+
+	if (__f->base.ops == &radeon_fence_ops)
+		return __f;
+
+	return NULL;
+}
 
 /*
  * Registers read & write functions.
diff --git a/drivers/gpu/drm/radeon/radeon_device.c b/drivers/gpu/drm/radeon/radeon_device.c
index d30f1cc1aa12..e84a76e6656a 100644
--- a/drivers/gpu/drm/radeon/radeon_device.c
+++ b/drivers/gpu/drm/radeon/radeon_device.c
@@ -1253,6 +1253,7 @@ int radeon_device_init(struct radeon_device *rdev,
 	for (i = 0; i < RADEON_NUM_RINGS; i++) {
 		rdev->ring[i].idx = i;
 	}
+	rdev->fence_context = fence_context_alloc(RADEON_NUM_RINGS);
 
 	DRM_INFO("initializing kernel modesetting (%s 0x%04X:0x%04X 0x%04X:0x%04X).\n",
 		radeon_family_name[rdev->family], pdev->vendor, pdev->device,
diff --git a/drivers/gpu/drm/radeon/radeon_fence.c b/drivers/gpu/drm/radeon/radeon_fence.c
index ecdba3afa2c3..af9f2d6bd7d0 100644
--- a/drivers/gpu/drm/radeon/radeon_fence.c
+++ b/drivers/gpu/drm/radeon/radeon_fence.c
@@ -130,15 +130,18 @@ int radeon_fence_emit(struct radeon_device *rdev,
 		      struct radeon_fence **fence,
 		      int ring)
 {
+	u64 seq = ++rdev->fence_drv[ring].sync_seq[ring];
+
 	/* we are protected by the ring emission mutex */
 	*fence = kmalloc(sizeof(struct radeon_fence), GFP_KERNEL);
 	if ((*fence) == NULL) {
 		return -ENOMEM;
 	}
-	kref_init(&((*fence)->kref));
 	(*fence)->rdev = rdev;
-	(*fence)->seq = ++rdev->fence_drv[ring].sync_seq[ring];
+	(*fence)->seq = seq;
 	(*fence)->ring = ring;
+	fence_init(&(*fence)->base, &radeon_fence_ops,
+		   &rdev->fence_queue.lock, rdev->fence_context + ring, seq);
 	radeon_fence_ring_emit(rdev, ring, *fence);
 	trace_radeon_fence_emit(rdev->ddev, ring, (*fence)->seq);
 	radeon_fence_schedule_check(rdev, ring);
@@ -146,6 +149,41 @@ int radeon_fence_emit(struct radeon_device *rdev,
 }
 
 /**
+ * radeon_fence_check_signaled - callback from fence_queue
+ *
+ * this function is called with fence_queue lock held, which is also used
+ * for the fence locking itself, so unlocked variants are used for
+ * fence_signal, and remove_wait_queue.
+ */
+static int radeon_fence_check_signaled(wait_queue_t *wait, unsigned mode, int flags, void *key)
+{
+	struct radeon_fence *fence;
+	u64 seq;
+
+	fence = container_of(wait, struct radeon_fence, fence_wake);
+
+	/*
+	 * We cannot use radeon_fence_process here because we're already
+	 * in the waitqueue, in a call from wake_up_all.
+	 */
+	seq = atomic64_read(&fence->rdev->fence_drv[fence->ring].last_seq);
+	if (seq >= fence->seq) {
+		int ret = fence_signal_locked(&fence->base);
+
+		if (!ret)
+			FENCE_TRACE(&fence->base, "signaled from irq context\n");
+		else
+			FENCE_TRACE(&fence->base, "was already signaled\n");
+
+		radeon_irq_kms_sw_irq_put(fence->rdev, fence->ring);
+		__remove_wait_queue(&fence->rdev->fence_queue, &fence->fence_wake);
+		fence_put(&fence->base);
+	} else
+		FENCE_TRACE(&fence->base, "pending\n");
+	return 0;
+}
+
+/**
  * radeon_fence_activity - check for fence activity
  *
  * @rdev: radeon_device pointer
@@ -242,6 +280,15 @@ static void radeon_fence_check_lockup(struct work_struct *work)
 		return;
 	}
 
+	if (fence_drv->delayed_irq && rdev->ddev->irq_enabled) {
+		unsigned long irqflags;
+
+		fence_drv->delayed_irq = false;
+		spin_lock_irqsave(&rdev->irq.lock, irqflags);
+		radeon_irq_set(rdev);
+		spin_unlock_irqrestore(&rdev->irq.lock, irqflags);
+	}
+
 	if (radeon_fence_activity(rdev, ring))
 		wake_up_all(&rdev->fence_queue);
 
@@ -276,21 +323,6 @@ void radeon_fence_process(struct radeon_device *rdev, int ring)
 }
 
 /**
- * radeon_fence_destroy - destroy a fence
- *
- * @kref: fence kref
- *
- * Frees the fence object (all asics).
- */
-static void radeon_fence_destroy(struct kref *kref)
-{
-	struct radeon_fence *fence;
-
-	fence = container_of(kref, struct radeon_fence, kref);
-	kfree(fence);
-}
-
-/**
  * radeon_fence_seq_signaled - check if a fence sequence number has signaled
  *
  * @rdev: radeon device pointer
@@ -318,6 +350,75 @@ static bool radeon_fence_seq_signaled(struct radeon_device *rdev,
 	return false;
 }
 
+static bool radeon_fence_is_signaled(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	struct radeon_device *rdev = fence->rdev;
+	unsigned ring = fence->ring;
+	u64 seq = fence->seq;
+
+	if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
+		return true;
+	}
+
+	if (down_read_trylock(&rdev->exclusive_lock)) {
+		radeon_fence_process(rdev, ring);
+		up_read(&rdev->exclusive_lock);
+
+		if (atomic64_read(&rdev->fence_drv[ring].last_seq) >= seq) {
+			return true;
+		}
+	}
+	return false;
+}
+
+/**
+ * radeon_fence_enable_signaling - enable signalling on fence
+ * @fence: fence
+ *
+ * This function is called with fence_queue lock held, and adds a callback
+ * to fence_queue that checks if this fence is signaled, and if so it
+ * signals the fence and removes itself.
+ */
+static bool radeon_fence_enable_signaling(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	struct radeon_device *rdev = fence->rdev;
+
+	if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq)
+		return false;
+
+	if (down_read_trylock(&rdev->exclusive_lock)) {
+		radeon_irq_kms_sw_irq_get(rdev, fence->ring);
+
+		if (radeon_fence_activity(rdev, fence->ring))
+			wake_up_all_locked(&rdev->fence_queue);
+
+		/* did fence get signaled after we enabled the sw irq? */
+		if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq) {
+			radeon_irq_kms_sw_irq_put(rdev, fence->ring);
+			up_read(&rdev->exclusive_lock);
+			return false;
+		}
+
+		up_read(&rdev->exclusive_lock);
+	} else {
+		/* we're probably in a lockup, let's not fiddle too much */
+		if (radeon_irq_kms_sw_irq_get_delayed(rdev, fence->ring))
+			rdev->fence_drv[fence->ring].delayed_irq = true;
+		radeon_fence_schedule_check(rdev, fence->ring);
+	}
+
+	fence->fence_wake.flags = 0;
+	fence->fence_wake.private = NULL;
+	fence->fence_wake.func = radeon_fence_check_signaled;
+	__add_wait_queue(&rdev->fence_queue, &fence->fence_wake);
+	fence_get(f);
+
+	FENCE_TRACE(&fence->base, "armed on ring %i!\n", fence->ring);
+	return true;
+}
+
 /**
  * radeon_fence_signaled - check if a fence has signaled
  *
@@ -330,8 +431,15 @@ bool radeon_fence_signaled(struct radeon_fence *fence)
 {
 	if (!fence)
 		return true;
-	if (radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring))
+
+	if (radeon_fence_seq_signaled(fence->rdev, fence->seq, fence->ring)) {
+		int ret;
+
+		ret = fence_signal(&fence->base);
+		if (!ret)
+			FENCE_TRACE(&fence->base, "signaled from radeon_fence_signaled\n");
 		return true;
+	}
 	return false;
 }
 
@@ -433,17 +541,15 @@ int radeon_fence_wait(struct radeon_fence *fence, bool intr)
 	uint64_t seq[RADEON_NUM_RINGS] = {};
 	long r;
 
-	if (fence == NULL) {
-		WARN(1, "Querying an invalid fence : %p !\n", fence);
-		return -EINVAL;
-	}
-
 	seq[fence->ring] = fence->seq;
 	r = radeon_fence_wait_seq_timeout(fence->rdev, seq, intr, MAX_SCHEDULE_TIMEOUT);
 	if (r < 0) {
 		return r;
 	}
 
+	r = fence_signal(&fence->base);
+	if (!r)
+		FENCE_TRACE(&fence->base, "signaled from fence_wait\n");
 	return 0;
 }
 
@@ -557,7 +663,7 @@ int radeon_fence_wait_empty(struct radeon_device *rdev, int ring)
  */
 struct radeon_fence *radeon_fence_ref(struct radeon_fence *fence)
 {
-	kref_get(&fence->kref);
+	fence_get(&fence->base);
 	return fence;
 }
 
@@ -574,7 +680,7 @@ void radeon_fence_unref(struct radeon_fence **fence)
 
 	*fence = NULL;
 	if (tmp) {
-		kref_put(&tmp->kref, radeon_fence_destroy);
+		fence_put(&tmp->base);
 	}
 }
 
@@ -887,3 +993,72 @@ int radeon_debugfs_fence_init(struct radeon_device *rdev)
 	return 0;
 #endif
 }
+
+static const char *radeon_fence_get_driver_name(struct fence *fence)
+{
+	return "radeon";
+}
+
+static const char *radeon_fence_get_timeline_name(struct fence *f)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	switch (fence->ring) {
+	case RADEON_RING_TYPE_GFX_INDEX: return "radeon.gfx";
+	case CAYMAN_RING_TYPE_CP1_INDEX: return "radeon.cp1";
+	case CAYMAN_RING_TYPE_CP2_INDEX: return "radeon.cp2";
+	case R600_RING_TYPE_DMA_INDEX: return "radeon.dma";
+	case CAYMAN_RING_TYPE_DMA1_INDEX: return "radeon.dma1";
+	case R600_RING_TYPE_UVD_INDEX: return "radeon.uvd";
+	case TN_RING_TYPE_VCE1_INDEX: return "radeon.vce1";
+	case TN_RING_TYPE_VCE2_INDEX: return "radeon.vce2";
+	default: WARN_ON_ONCE(1); return "radeon.unk";
+	}
+}
+
+static inline bool radeon_test_signaled(struct radeon_fence *fence)
+{
+	return test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags);
+}
+
+static signed long radeon_fence_default_wait(struct fence *f, bool intr,
+					     signed long t)
+{
+	struct radeon_fence *fence = to_radeon_fence(f);
+	struct radeon_device *rdev = fence->rdev;
+	bool signaled;
+
+	fence_enable_sw_signaling(&fence->base);
+
+	/*
+	 * This function has to return -EDEADLK, but cannot hold
+	 * exclusive_lock during the wait because some callers
+	 * may already hold it. This means checking needs_reset without
+	 * lock, and not fiddling with any gpu internals.
+	 *
+	 * The callback installed with fence_enable_sw_signaling will
+	 * run before our wait_event_*timeout call, so we will see
+	 * both the signaled fence and the changes to needs_reset.
+	 */
+
+	if (intr)
+		t = wait_event_interruptible_timeout(rdev->fence_queue,
+			((signaled = radeon_test_signaled(fence)) ||
+			 rdev->needs_reset), t);
+	else
+		t = wait_event_timeout(rdev->fence_queue,
+			((signaled = radeon_test_signaled(fence)) ||
+			 rdev->needs_reset), t);
+
+	if (t > 0 && !signaled)
+		return -EDEADLK;
+	return t;
+}
+
+const struct fence_ops radeon_fence_ops = {
+	.get_driver_name = radeon_fence_get_driver_name,
+	.get_timeline_name = radeon_fence_get_timeline_name,
+	.enable_signaling = radeon_fence_enable_signaling,
+	.signaled = radeon_fence_is_signaled,
+	.wait = radeon_fence_default_wait,
+	.release = NULL,
+};
diff --git a/drivers/gpu/drm/radeon/radeon_irq_kms.c b/drivers/gpu/drm/radeon/radeon_irq_kms.c
index f0bff4be67f1..7784911d78ef 100644
--- a/drivers/gpu/drm/radeon/radeon_irq_kms.c
+++ b/drivers/gpu/drm/radeon/radeon_irq_kms.c
@@ -324,6 +324,21 @@ void radeon_irq_kms_sw_irq_get(struct radeon_device *rdev, int ring)
 }
 
 /**
+ * radeon_irq_kms_sw_irq_get_delayed - enable software interrupt
+ *
+ * @rdev: radeon device pointer
+ * @ring: ring whose interrupt you want to enable
+ *
+ * Enables the software interrupt for a specific ring (all asics).
+ * The software interrupt is generally used to signal a fence on
+ * a particular ring.
+ */
+bool radeon_irq_kms_sw_irq_get_delayed(struct radeon_device *rdev, int ring)
+{
+	return atomic_inc_return(&rdev->irq.ring_int[ring]) == 1;
+}
+
+/**
  * radeon_irq_kms_sw_irq_put - disable software interrupt
  *
  * @rdev: radeon device pointer
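
As an aside, since to_radeon_fence is now a checked cast, code outside radeon_fence.c can tell radeon fences apart from foreign ones. A hypothetical sketch of such a user (not part of this patch, function name made up):

/* Hypothetical example: sync against a fence that may or may not be a
 * radeon fence. Foreign fences have no radeon seqno, so wait on the CPU.
 */
static long radeon_sync_to_fence(struct fence *f)
{
	struct radeon_fence *rfence = to_radeon_fence(f);

	if (!rfence)
		return fence_wait(f, true);

	/* radeon fence: a seqno or semaphore based hw sync could go here */
	return 0;
}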

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PULL REQUEST] ttm fence conversion
  2014-09-02 12:29               ` Maarten Lankhorst
@ 2014-09-02 13:47                 ` Christian König
  0 siblings, 0 replies; 10+ messages in thread
From: Christian König @ 2014-09-02 13:47 UTC (permalink / raw)
  To: Maarten Lankhorst, Dave Airlie; +Cc: dri-devel

> How does this patch look?
Looks better now, yes. This patch is Reviewed-by: Christian König <christian.koenig@amd.com>

The next one we need to take a look at is "drm/radeon: use rcu waits in some ioctls":
> @@ -110,9 +110,12 @@ static int radeon_gem_set_domain(struct drm_gem_object *gobj,
>         }
>         if (domain == RADEON_GEM_DOMAIN_CPU) {
>                 /* Asking for cpu access wait for object idle */
> -               r = radeon_bo_wait(robj, NULL, false);
> -               if (r) {
> -                       printk(KERN_ERR "Failed to wait for object !\n");
> +               r = reservation_object_wait_timeout_rcu(robj->tbo.resv, true, true, 30 * HZ);

Here r is still an int, so this assignment might overflow.
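
Something like this would avoid the truncation (untested sketch, variable name just illustrative):

	long lret;

	/* keep the long result of the rcu wait separate from the int
	 * errno so the remaining-jiffies value is never truncated
	 */
	lret = reservation_object_wait_timeout_rcu(robj->tbo.resv, true,
						   true, 30 * HZ);
	if (lret > 0)
		r = 0;		/* signaled within the timeout */
	else if (lret == 0)
		r = -EBUSY;	/* wait timed out */
	else
		r = lret;	/* error code, fits in an int */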

Apart from that the patch has my rb as well.

Regards,
Christian.

On 02.09.2014 at 14:29, Maarten Lankhorst wrote:
> On 02-09-14 at 11:52, Christian König wrote:
>> On 02.09.2014 at 11:12, Maarten Lankhorst wrote:
>>> On 02-09-14 at 10:51, Christian König wrote:
>>>> On 01.09.2014 at 20:43, Maarten Lankhorst wrote:
>>>>> Hey,
>>>>>
>>>>> On 01-09-14 18:21, Christian König wrote:
>>>>>> On 01.09.2014 at 15:33, Maarten Lankhorst wrote:
>>>>>>> Hey,
>>>>>>>
>>>>>>>> On 01-09-14 at 14:31, Christian König wrote:
>>>>>>>> Please wait a second with that.
>>>>>>>>
>>>>>>>> I haven't had a chance to test this yet and nobody has yet given their rb on at least the radeon changes in this branch.
>>>>>>> Ok, my fault. I thought it was implicitly acked. I haven't made any functional changes to these patches,
>>>>>>> just some small fixups and a fix to make it apply after the upstream removal of RADEON_FENCE_SIGNALED_SEQ.
>>>>>> Yeah, but the resulting patch looks too complex for my taste and should be simplified a bit more. Here is a more detailed review:
>>>>>>
>>>>>>> +    wait_queue_t fence_wake;
>>>>>> Only a nitpick, but please fix the indention and maybe add a comment.
>>>>>>
>>>>>>> +    struct work_struct delayed_irq_work;
>>>>>> Just drop that, the new fallback work item should take care of the unfortunate case where somebody tries to enable_signaling in the middle of a GPU reset.
>>>>> I can only drop it if radeon_gpu_reset will always call radeon_irq_set after downgrading to read mode, even if no work needs to be done. :-)
>>>>>
>>>>> Then again, should be possible.
>>>> The fallback handler should take care of the rare condition that we can't activate the IRQ because the driver is in a lockup handler.
>>>>
>>>> The issue is that the delayed_irq_work handler needs to take the exclusive lock once more and so would block an innocent process for the duration of the GPU lockup.
>>>>
>>>> Either reschedule as a delayed work item if we can't take the lock immediately or just live with the delay of the fallback handler. Since IRQs usually don't work correctly immediately after a GPU reset I'm pretty sure that the fallback handler will be needed anyway.
>>> Ok, rescheduling would be fine. Or could I go with the alternative, remove the delayed_irq_work and always set irqs after downgrading the write lock?
>> Always setting the IRQs after downgrading the write lock would work for me as well.
>>
>>
>>>>>>>     /*
>>>>>>> - * Cast helper
>>>>>>> - */
>>>>>>> -#define to_radeon_fence(p) ((struct radeon_fence *)(p))
>>>>>>> -
>>>>>>> -/*
>>>>>> Please define the new cast helper in radeon.h as well.
>>>>> The ops are only defined in radeon_fence.c, and nothing outside of radeon_fence.c should care about the internals.
>>>> Then define this as a function instead; I need a checked cast from a fence to a radeon_fence outside of the fence code as well.
>>> Ok.
>>>
>>>>>>>         if (!rdev->needs_reset) {
>>>>>>> -        up_write(&rdev->exclusive_lock);
>>>>>>> +        downgrade_write(&rdev->exclusive_lock);
>>>>>>> +        wake_up_all(&rdev->fence_queue);
>>>>>>> +        up_read(&rdev->exclusive_lock);
>>>>>>>             return 0;
>>>>>>>         }
>>>>>> Just drop that as well, no need to wake up anybody here.
>>>>> Maybe not, but if I have to remove delayed_irq_work I do need to add a radeon_irq_set here.
>>>>>
>>>>>>>     downgrade_write(&rdev->exclusive_lock);
>>>>>>> +    wake_up_all(&rdev->fence_queue);
>>>>>> Same here, the IB test will wake up all fences for recheck anyway.
>>>>> Same as previous comment. :-)
>>>>>
>>>>>>> + * radeon_fence_read_seq - Returns the current fence value without updating
>>>>>>> + *
>>>>>>> + * @rdev: radeon_device pointer
>>>>>>> + * @ring: ring index to return the seqno of
>>>>>>> + */
>>>>>>> +static uint64_t radeon_fence_read_seq(struct radeon_device *rdev, int ring)
>>>>>>> +{
>>>>>>> +    uint64_t last_seq = atomic64_read(&rdev->fence_drv[ring].last_seq);
>>>>>>> +    uint64_t last_emitted = rdev->fence_drv[ring].sync_seq[ring];
>>>>>>> +    uint64_t seq = radeon_fence_read(rdev, ring);
>>>>>>> +
>>>>>>> +    seq = radeon_fence_read(rdev, ring);
>>>>>>> +    seq |= last_seq & 0xffffffff00000000LL;
>>>>>>> +    if (seq < last_seq) {
>>>>>>> +        seq &= 0xffffffff;
>>>>>>> +        seq |= last_emitted & 0xffffffff00000000LL;
>>>>>>> +    }
>>>>>>> +    return seq;
>>>>>>> +}
>>>>>> Completely drop that and just check the cached last_seq as set by radeon_fence_activity.
>>>>> Do you mean call radeon_fence_activity in radeon_fence_signaled? Or should I just use the cached value in radeon_fence_check_signaled?
>>>> Just check the cached value, it should be updated by radeon_fence_activity immediately before calling this anyway.
>>> Ok. I think I wrote this as a workaround for unreliable interrupts. :-)
>>>
>>>>> I can't call fence_activity in check_signaled, because it would cause re-entrancy in fence_queue.
>>>>>
>>>>>>> +        if (!ret)
>>>>>>> +            FENCE_TRACE(&fence->base, "signaled from irq context\n");
>>>>>>> +        else
>>>>>>> +            FENCE_TRACE(&fence->base, "was already signaled\n");
>>>>>> Is all that text tracing necessary? Probably better to define a trace point here.
>>>>> It gets optimized out normally. There's already a tracepoint called in fence_signal.
>>>>>   
>>>>>>> +    if (atomic64_read(&rdev->fence_drv[fence->ring].last_seq) >= fence->seq ||
>>>>>>> +        !rdev->ddev->irq_enabled)
>>>>>>> +        return false;
>>>>>> Checking irq_enabled here might not be such a good idea if the fence code doesn't have a fallback of its own. What exactly happens if enable_signaling returns false?
>>>>> I thought !irq_enabled couldn't happen under normal circumstances?
>>>> Not 100% sure, but I think it is temporarily turned off during reset.
>>>>
>>>>> Anyway the fence gets treated as signaled if it returns false, and fence_signal will get called.
>>>> Thought so. Well, it's rather bad that we just treat all fences as signaled if we failed to install the IRQ handler, isn't it?
>>> I wrote this code before the delayed work was added, I guess the check for !irq_enabled can be removed now. :-)
>>>
>>>>>>> +static signed long radeon_fence_default_wait(struct fence *f, bool intr,
>>>>>>> +                         signed long timeout)
>>>>>>> +{
>>>>>>> +    struct radeon_fence *fence = to_radeon_fence(f);
>>>>>>> +    struct radeon_device *rdev = fence->rdev;
>>>>>>> +    bool signaled;
>>>>>>> +
>>>>>>> +    fence_enable_sw_signaling(&fence->base);
>>>>>>> +
>>>>>>> +    /*
>>>>>>> +     * This function has to return -EDEADLK, but cannot hold
>>>>>>> +     * exclusive_lock during the wait because some callers
>>>>>>> +     * may already hold it. This means checking needs_reset without
>>>>>>> +     * lock, and not fiddling with any gpu internals.
>>>>>>> +     *
>>>>>>> +     * The callback installed with fence_enable_sw_signaling will
>>>>>>> +     * run before our wait_event_*timeout call, so we will see
>>>>>>> +     * both the signaled fence and the changes to needs_reset.
>>>>>>> +     */
>>>>>>> +
>>>>>>> +    if (intr)
>>>>>>> +        timeout = wait_event_interruptible_timeout(rdev->fence_queue,
>>>>>>> +                               ((signaled = (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))) || rdev->needs_reset),
>>>>>>> +                               timeout);
>>>>>>> +    else
>>>>>>> +        timeout = wait_event_timeout(rdev->fence_queue,
>>>>>>> +                         ((signaled = (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->base.flags))) || rdev->needs_reset),
>>>>>>> +                         timeout);
>>>>>>> +
>>>>>>> +    if (timeout > 0 && !signaled)
>>>>>>> +        return -EDEADLK;
>>>>>>> +    return timeout;
>>>>>>> +}
>>>>>> This at least needs to be properly formatted, but I think since we now don't need the extra handling any more, we don't need an extra wait function either.
>>>>> I thought of removing the extra handling, but the -EDEADLK stuff is needed because otherwise a deadlock could happen in ttm_bo_lock_delayed_workqueue; if the gpu is really hung there would never be any progress forward.
>>>> Huh, what? ttm_bo_lock_delayed_workqueue shouldn't call any blocking wait.
>>> Oops, you're right. ttm_bo_delayed_delete is called with remove_all false, not true.
>>>
>>> Unfortunately ttm_bo_vm_fault does hold the exclusive_lock in read mode, and other places that use eviction will use it too.
>>> Without returning -EDEADLK this could mean that ttm_bo_move_accel_cleanup would block forever,
>>> so this function has to stay.
>> Ok, fine with me. I'm not deep enough into the TTM code to really judge this, but my understanding was that TTM still calls its own wait callback.
> That one is about to go away. :-)
>
> How does this patch look?
> [v4 patch snipped; it is quoted in full in the previous message]

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2014-09-02 13:48 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-09-01 11:34 [PULL REQUEST] ttm fence conversion Maarten Lankhorst
2014-09-01 12:31 ` Christian König
2014-09-01 13:33   ` Maarten Lankhorst
2014-09-01 16:21     ` Christian König
2014-09-01 18:43       ` Maarten Lankhorst
2014-09-02  8:51         ` Christian König
2014-09-02  9:12           ` Maarten Lankhorst
2014-09-02  9:52             ` Christian König
2014-09-02 12:29               ` Maarten Lankhorst
2014-09-02 13:47                 ` Christian König
