All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Koenig, Christian" <Christian.Koenig@amd.com>
To: Daniel Vetter <daniel@ffwll.ch>
Cc: "intel-gfx@lists.freedesktop.org"
	<intel-gfx@lists.freedesktop.org>,
	DRI Development <dri-devel@lists.freedesktop.org>
Subject: Re: ✗ Fi.CI.BAT: failure for series starting with [1/6] dma-buf: add dynamic DMA-buf handling v13
Date: Thu, 8 Aug 2019 11:05:39 +0000	[thread overview]
Message-ID: <890117cc-21e6-dec1-7e42-ad0fea2a731f@amd.com> (raw)
In-Reply-To: <CAKMK7uG=7AqvJiS3Ooo2rE2WvRQpaRiRuPFdVgj=8A=7e1=VQw@mail.gmail.com>

Am 08.08.19 um 09:29 schrieb Daniel Vetter:
> On Thu, Aug 8, 2019 at 9:09 AM Koenig, Christian
> <Christian.Koenig@amd.com> wrote:
>> Am 07.08.19 um 23:19 schrieb Daniel Vetter:
>>> On Wed, Jul 31, 2019 at 10:55:02AM +0200, Daniel Vetter wrote:
>>>> On Thu, Jun 27, 2019 at 09:28:11AM +0200, Christian König wrote:
>>>>> Hi Daniel,
>>>>>
>>>>> those fails look like something random to me and not related to my patch
>>>>> set. Correct?
>>>> First one I looked at has the reservation_obj all over:
>>>>
>>>> https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_13438/fi-cml-u/igt@gem_exec_fence@basic-busy-default.html
>>>>
>>>> So 5 second guees is ... probably real?
>>>>
>>>> Note that with the entire lmem stuff going on right now there's massive
>>>> discussions about how we're doing resv_obj vs obj->mm.lock the wrong way
>>>> round in i915, so I'm not surprised at all that you managed to trip this.
>>>>
>>>> The way I see it right now is that obj->mm.lock needs to be limited to
>>>> dealing with the i915 shrinker interactions only, and only for i915 native
>>>> objects. And for dma-bufs we need to make sure it's not anywhere in the
>>>> callchain. Unfortunately that's a massive refactor I guess ...
>>> Thought about this some more, aside from just breaking i915 or waiting
>>> until it's refactored (Both not awesome) I think the only option is get
>>> back to the original caching. And figure out whether we really need to
>>> take the direction into account for that, or whether upgrading to
>>> bidirectional unconditionally won't be ok. I think there's only really two
>>> cases where this matters:
>>>
>>> - display drivers using the cma/dma_alloc helpers. Everything is allocated
>>>     fully coherent, cpu side wc, no flushing.
>>>
>>> - Everyone else (on platforms where there's actually some flushing going
>>>     on) is for rendering gpus, and those always map bidirectional and want
>>>     the mapping cached for as long as possible.
>>>
>>> With that we could go back to creating the cached mapping at attach time
>>> and avoid inflicting the reservation object lock to places that would keel
>>> over.
>>>
>>> Thoughts?
>> Actually we had a not so nice internal mail thread with our hardware
>> guys and it looks like we have tons of hardware bugs/exceptions that
>> sometimes PCIe BARs are only readable or only writable. So it turned out
>> that always caching with bidirectional won't work for us either.
>>
>> Additional to that I'm not sure how i915 actually triggered the issue,
>> cause with the current code that shouldn't be possible.
>>
>> But independent of that I came to the conclusion that we first need to
>> get to a common view of what the fences in the reservation mean or
>> otherwise the whole stuff here isn't going to work smooth either.
>>
>> So working on that for now and when that's finished I will come back to
>> this problem here again.
> Yeah makes sense. I think we also need to clarify a bit the existing
> rules around reservatrion_object, dma_fence signaling, and how that
> nests with everything else (like memory allocation/fs_reclaim critical
> sections, or mmap_sem).
>
> Ignore the drivers which just pin everything system memory (mostly
> just socs) I think we have a bunch of groups, and they're all somewhat
> incompatible with each another. Examples:
>
> - old ttm drivers (anything except amdgpu) nest the mmap_sem within
> the reservation_object. That allows you to do copy_*_user while
> holding reservations, simplifying command submission since you don't
> need fallback paths when you take a fault. But means you have this
> awkward trylock in the mmap path with no forward progress guarantee at
> all.
>
> amdgpu fixed that (but left ttm alone), i915 also works like that with
> mmap_sem being the outer lock.

By the way that is incorrect. Both amdgpu as well as readeon don't use 
copy_to/from_user while holding the reservation lock.

The last time I checked the only driver still doing that was nouveau.

Maybe time to add a might_lock() so that we will be informed about 
misuse by lockdep?

Christian.

>
> - other is reservation_object vs memory allocations. Currently all
> drivers assume you can allocate memory while holding a reservation,
> but i915 gem folks seem to have some plans to change that for i915.
> Which isn't going to work I think, so we need to clarify that before
> things get more inconsistent.
>
> Above two can at least be ensured by adding somme lockdep annotations
> and dependency priming, see i915_gem_shrinker_taints_mutex for what I
> have in mind for reservation_obj.
>
> The real pain/scary thing is dma_fence. All the
> shrinkers/mmu_notifiers/hmm_mirrors we have assume that you can wait
> for a fence from allocation contexts/direct reclaim. Which means
> nothing you do between publishing a fence somewhere (dma-buf, syncobj,
> syncpt fd) and signalling that fence is allowed to allocate memory or
> pull in any dependencies which might need memory allocations. I think
> we're mostly ok with this, but there's some i915 patches that break
> this.
>
> Much worse is that lockdep can't help us check this: dma_fence is
> essentially struct completion on steroids, and the cross-release
> lockdep support for struct completion looks like it's never going to
> get merged. So no debugging aids to make sure we get this right, all
> we have is review and testing and machines deadlocking in really
> complicated ways if we get it wrong.
>
> Cheers, Daniel

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

  reply	other threads:[~2019-08-08 11:05 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-06-26 12:23 [PATCH 1/6] dma-buf: add dynamic DMA-buf handling v13 Christian König
     [not found] ` <20190626122310.1498-1-christian.koenig-5C7GfCeVMHo@public.gmane.org>
2019-06-26 12:23   ` [PATCH 2/6] drm/ttm: remove the backing store if no placement is given Christian König
2019-06-26 12:23   ` [PATCH 3/6] drm/ttm: use the parent resv for ghost objects v2 Christian König
2019-06-26 12:23   ` [PATCH 4/6] drm/amdgpu: use allowed_domains for exported DMA-bufs Christian König
2019-06-26 12:23 ` [PATCH 5/6] drm/amdgpu: add independent DMA-buf export v6 Christian König
2019-06-26 12:23 ` [PATCH 6/6] drm/amdgpu: add independent DMA-buf import v7 Christian König
2019-06-26 12:29 ` [PATCH 1/6] dma-buf: add dynamic DMA-buf handling v13 Daniel Vetter
     [not found]   ` <20190626122933.GK12905-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2019-07-16 14:21     ` Christian König
     [not found]       ` <ef70981d-3d52-b339-b3f5-190635969621-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2019-07-19  8:57         ` Daniel Vetter
2019-07-19  9:14           ` Koenig, Christian
     [not found]             ` <4704d3c5-894d-6ac1-4afb-06c275700bac-5C7GfCeVMHo@public.gmane.org>
2019-07-19  9:31               ` Daniel Vetter
2019-07-19 12:05                 ` Koenig, Christian
     [not found]                   ` <ff12f6e0-f34b-5ea0-72d5-851ef6bb141f-5C7GfCeVMHo@public.gmane.org>
2019-07-19 13:30                     ` Daniel Vetter
2019-06-26 12:40 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/6] " Patchwork
2019-06-26 12:41 ` ✗ Fi.CI.SPARSE: " Patchwork
2019-06-26 13:32 ` ✗ Fi.CI.BAT: failure " Patchwork
2019-06-27  7:28   ` Christian König
2019-07-31  8:55     ` Daniel Vetter
2019-08-07 21:19       ` Daniel Vetter
2019-08-08  7:09         ` Koenig, Christian
2019-08-08  7:29           ` Daniel Vetter
2019-08-08 11:05             ` Koenig, Christian [this message]
2019-08-08 11:40               ` Daniel Vetter
2019-08-08  7:59           ` Daniel Vetter
2019-07-31  9:12 ` [PATCH 1/6] " Daniel Vetter
     [not found]   ` <20190731091231.GI7444-dv86pmgwkMBes7Z6vYuT8azUEOm+Xw19@public.gmane.org>
2019-07-31  9:44     ` Christian König
2019-07-31 10:39       ` Daniel Vetter
     [not found]         ` <CAKMK7uGp64yzpuW_QOJds_ZD=4+z9ymZtpwHT+u2zhD95z4Xnw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2019-07-31 11:19           ` Christian König
2019-07-31 12:41             ` Daniel Vetter

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=890117cc-21e6-dec1-7e42-ad0fea2a731f@amd.com \
    --to=christian.koenig@amd.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.