From: Matthew Auld <matthew.auld@intel.com>
To: "Christian König" <ckoenig.leichtzumerken@gmail.com>,
	"Daniel Vetter" <daniel@ffwll.ch>,
	"Tvrtko Ursulin" <tvrtko.ursulin@linux.intel.com>
Cc: Intel Graphics Development <intel-gfx@lists.freedesktop.org>,
	Matthew Auld <matthew.william.auld@gmail.com>,
	dri-devel@lists.freedesktop.org, amd-gfx@lists.freedesktop.org
Subject: Re: [Intel-gfx] [PATCH 7/9] drm/i915: stop using ttm_bo_wait
Date: Tue, 6 Dec 2022 18:03:02 +0000
Message-ID: <d56a0149-2913-8b78-de91-f633ae664a7a@intel.com>
In-Reply-To: <4514ca57-e39e-d684-3101-fddf57b0c89a@gmail.com>

On 05/12/2022 19:58, Christian König wrote:
> On 30.11.22 at 15:06, Daniel Vetter wrote:
>> On Wed, 30 Nov 2022 at 14:03, Tvrtko Ursulin
>> <tvrtko.ursulin@linux.intel.com> wrote:
>>> On 29/11/2022 18:05, Matthew Auld wrote:
>>>> On Fri, 25 Nov 2022 at 11:14, Tvrtko Ursulin
>>>> <tvrtko.ursulin@linux.intel.com> wrote:
>>>>>
>>>>> + Matt
>>>>>
>>>>> On 25/11/2022 10:21, Christian König wrote:
>>>>>> TTM is just wrapping core DMA functionality here, remove the mid-layer.
>>>>>> No functional change.
>>>>>>
>>>>>> Signed-off-by: Christian König <christian.koenig@amd.com>
>>>>>> ---
>>>>>>     drivers/gpu/drm/i915/gem/i915_gem_ttm.c | 9 ++++++---
>>>>>>     1 file changed, 6 insertions(+), 3 deletions(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>>>> index 5247d88b3c13..d409a77449a3 100644
>>>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_ttm.c
>>>>>> @@ -599,13 +599,16 @@ i915_ttm_resource_get_st(struct drm_i915_gem_object *obj,
>>>>>>     static int i915_ttm_truncate(struct drm_i915_gem_object *obj)
>>>>>>     {
>>>>>>         struct ttm_buffer_object *bo = i915_gem_to_ttm(obj);
>>>>>> -     int err;
>>>>>> +     long err;
>>>>>>
>>>>>>         WARN_ON_ONCE(obj->mm.madv == I915_MADV_WILLNEED);
>>>>>>
>>>>>> -     err = ttm_bo_wait(bo, true, false);
>>>>>> -     if (err)
>>>>>> +     err = dma_resv_wait_timeout(bo->base.resv, DMA_RESV_USAGE_BOOKKEEP,
>>>>>> +                                 true, 15 * HZ);
>>>>> This 15 second stuck out a bit for me and then on a slightly deeper look
>>>>> it seems this timeout will "leak" into a few of i915 code paths. If we
>>>>> look at the difference between the legacy shmem and ttm backend I am not
>>>>> sure if the legacy one is blocking or not - but if it can block I don't
>>>>> think it would have an arbitrary timeout like this. Matt your thoughts?
>>>> Not sure what is meant by leak here, but the legacy shmem must also
>>>> wait/block when unbinding each VMA, before calling truncate. It's the
>>> By "leak" I meant if 15s timeout propagates into some code paths visible
>>> from userspace which with a legacy backend instead have an indefinite
>>> wait. If we have that it's probably not very good to have this
>>> inconsistency, or to apply an arbitrary timeout to those paths to start with.
>>>
>>>> same story for the ttm backend, except slightly more complicated in
>>>> that there might be no currently bound VMA, and yet the GPU could
>>>> still be accessing the pages due to async unbinds, kernel moves etc,
>>>> which the wait here (and in i915_ttm_shrink) is meant to protect
>>>> against. If the wait times out it should just fail gracefully. I guess
>>>> we could just use MAX_SCHEDULE_TIMEOUT here? Not sure if it really
>>>> matters though.
>>> Right, depends if it can leak or not to userspace and diverge between
>>> backends.
>> Generally lock_timeout() is a design bug. It's either
>> lock_interruptible (or maybe lock_killable) or try_lock, but
>> lock_timeout is just duct-tape. I haven't dug in to figure out what
>> should be here, but it smells fishy.
> 
> Independent of this discussion could I get an rb for removing 
> ttm_bo_wait() from i915?
> 
> Exactly hiding this timeout inside TTM is what always made me quite 
> nervous here.

There are also a few places in i915 calling bo_wait_ctx(), which appears 
to just wrap ttm_bo_wait(). I guess that should also be converted to 
dma_resv_wait_timeout()? Or what is the story with that?
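
For anyone following along, a minimal userspace sketch of the return-value
mapping such a conversion has to preserve. This is not code from the patch or
from i915; wait_ret_to_errno() is a hypothetical helper name, and the sketch
assumes the usual convention that dma_resv_wait_timeout() returns a long that
is negative on error, 0 on timeout, and positive when the fences signalled in
time, whereas ttm_bo_wait() callers expect an int that is 0 on success.

```c
#include <errno.h>

/*
 * Userspace model only, not kernel code. 'ret' stands in for the long
 * returned by dma_resv_wait_timeout(). wait_ret_to_errno() is a
 * hypothetical name used for illustration.
 */
int wait_ret_to_errno(long ret)
{
	if (ret < 0)
		return (int)ret;	/* error, e.g. wait interrupted by a signal */
	if (ret == 0)
		return -EBUSY;		/* timeout expired, fences still busy */
	return 0;			/* fences signalled before the timeout */
}
```

The point of keeping the raw result in a long (as the quoted diff does with
"long err") is that the three cases stay distinguishable until the caller
decides how a timeout should be reported.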

Anyway,
Reviewed-by: Matthew Auld <matthew.auld@intel.com>

> 
> Regards,
> Christian.
> 
>> -Daniel
> 
