From: "Christian König" <christian.koenig@amd.com>
To: Daniel Vetter <daniel.vetter@ffwll.ch>,
DRI Development <dri-devel@lists.freedesktop.org>
Cc: "Rob Clark" <robdclark@chromium.org>,
"Daniel Stone" <daniels@collabora.com>,
"Daniel Vetter" <daniel.vetter@intel.com>,
"Intel Graphics Development" <intel-gfx@lists.freedesktop.org>,
"Kevin Wang" <kevin1.wang@amd.com>,
linaro-mm-sig@lists.linaro.org,
"Luben Tuikov" <luben.tuikov@amd.com>,
"Kristian H . Kristensen" <hoegsberg@google.com>,
"Chen Li" <chenli@uniontech.com>,
"Alex Deucher" <alexander.deucher@amd.com>,
mesa-dev@lists.freedesktop.org,
"Michel Dänzer" <michel@daenzer.net>,
"Dennis Li" <Dennis.Li@amd.com>,
"Deepak R Varma" <mh12gx2825@gmail.com>
Subject: Re: [PATCH 01/11] drm/amdgpu: Comply with implicit fencing rules
Date: Fri, 21 May 2021 13:22:33 +0200 [thread overview]
Message-ID: <70ca7b86-c5ac-79ad-89dd-03108e9936ed@amd.com> (raw)
In-Reply-To: <20210521090959.1663703-1-daniel.vetter@ffwll.ch>
On 21.05.21 at 11:09, Daniel Vetter wrote:
> Docs for struct dma_resv are fairly clear:
>
> "A reservation object can have attached one exclusive fence (normally
> associated with write operations) or N shared fences (read
> operations)."
>
> https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html#reservation-objects
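To make the quoted contract concrete, here is a minimal, self-contained C sketch of the slot model it describes. All names (toy_resv, toy_fence, and so on) are invented for illustration and are not the kernel's actual struct dma_resv API:

```c
#include <assert.h>

/* Toy model of the dma_resv contract quoted above: a buffer object
 * carries either one exclusive fence (writes) or N shared fences
 * (reads). All names here are hypothetical. */
#define TOY_MAX_SHARED 8

struct toy_fence { int seqno; };

struct toy_resv {
	struct toy_fence *excl;                   /* exclusive slot: last writer */
	struct toy_fence *shared[TOY_MAX_SHARED]; /* shared slots: readers */
	int num_shared;
};

/* A write attaches to the exclusive slot; in this simplified model it
 * supersedes the shared readers, since the writer must already have
 * synchronized against them. */
static void toy_resv_add_excl(struct toy_resv *r, struct toy_fence *f)
{
	r->excl = f;
	r->num_shared = 0;
}

/* A read attaches to one of the N shared slots. */
static void toy_resv_add_shared(struct toy_resv *r, struct toy_fence *f)
{
	if (r->num_shared < TOY_MAX_SHARED)
		r->shared[r->num_shared++] = f;
}
```

A subsequent writer would wait on all slots; a subsequent reader only on the exclusive one.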
>
> Furthermore, a review across all of upstream:
>
> First, the render drivers and how they set implicit fences:
>
> - nouveau follows this contract, see in validate_fini_no_ticket()
>
> nouveau_bo_fence(nvbo, fence, !!b->write_domains);
>
> and that last boolean controls whether the exclusive or shared fence
> slot is used.
>
> - radeon follows this contract by setting
>
> p->relocs[i].tv.num_shared = !r->write_domain;
>
> in radeon_cs_parser_relocs(), which ensures that the call to
> ttm_eu_fence_buffer_objects() in radeon_cs_parser_fini() will do the
> right thing.
>
> - vmwgfx seems to follow this contract with the shotgun approach of
> always setting ttm_val_buf->num_shared = 0, which means
> ttm_eu_fence_buffer_objects() will only use the exclusive slot.
>
> - etnaviv follows this contract, as can be trivially seen by looking
> at submit_attach_object_fences()
>
> - i915 is a bit of a convoluted maze with multiple paths leading to
> i915_vma_move_to_active(). Which sets the exclusive flag if
> EXEC_OBJECT_WRITE is set. This can either come as a buffer flag for
> softpin mode, or through the write_domain when using relocations. It
> follows this contract.
>
> - lima follows this contract, see lima_gem_submit() which sets the
> exclusive fence when the LIMA_SUBMIT_BO_WRITE flag is set for that
> bo
>
> - msm follows this contract, see msm_gpu_submit() which sets the
> exclusive flag when MSM_SUBMIT_BO_WRITE is set for that buffer
>
> - panfrost follows this contract with the shotgun approach of just
> always setting the exclusive fence, see
> panfrost_attach_object_fences(). Benefits of a single engine I guess
>
> - v3d follows this contract with the same shotgun approach in
> v3d_attach_fences_and_unlock_reservation(), but it has at least an
> XXX comment that maybe this should be improved
>
> - vc4 uses the same shotgun approach of always setting an exclusive
> fence, see vc4_update_bo_seqnos()
>
> - vgem also follows this contract, see vgem_fence_attach_ioctl() and
> the VGEM_FENCE_WRITE. This is used in some igts to validate prime
> sharing with i915.ko without the need of a 2nd gpu
>
> - virtio follows this contract, again with the shotgun approach of
> always setting an exclusive fence, see virtio_gpu_array_add_fence()
>
> This covers the setting of the exclusive fences when writing.
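The per-driver logic surveyed above reduces to one common predicate: whether the submission writes the buffer decides the fence slot. A hedged sketch (the function name is invented; e.g. radeon effectively computes num_shared = !write_domain, and nouveau passes !!b->write_domains):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative only: selects the fence slot from the write domain,
 * mirroring the per-driver logic surveyed above. Any nonzero write
 * domain means the job's fence must go into the exclusive slot. */
static bool use_exclusive_slot(unsigned int write_domain)
{
	return write_domain != 0;
}
```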
>
> Synchronizing against the exclusive fence is a lot more tricky, and I
> only spot checked a few:
>
> - i915 does it, with the optional EXEC_OBJECT_ASYNC to skip all
> implicit dependencies (which is used by vulkan)
>
> - etnaviv does this. Implicit dependencies are collected in
> submit_fence_sync(), again with an opt-out flag
> ETNA_SUBMIT_NO_IMPLICIT. These are then picked up in
> etnaviv_sched_dependency which is the
> drm_sched_backend_ops->dependency callback.
>
> - vc4 seems to not do much here; maybe it gets away with it by not having
> a scheduler and only a single engine. Since all newer broadcom chips than
> the OG vc4 use v3d for rendering, which follows this contract, the
> impact of this issue is fairly small.
>
> - v3d does this using the drm_gem_fence_array_add_implicit() helper,
> which its drm_sched_backend_ops->dependency callback
> v3d_job_dependency() then picks up.
>
> - panfrost is nice here and tracks the implicit fences in
> panfrost_job->implicit_fences, which again the
> drm_sched_backend_ops->dependency callback panfrost_job_dependency()
> picks up. It is mildly questionable though since it only picks up
> exclusive fences in panfrost_acquire_object_fences(), but not buggy
> in practice because it also always sets the exclusive fence. It
> should pick up both sets of fences, just in case there's ever going
> to be a 2nd gpu in a SoC with a mali gpu. Or maybe a mali SoC with a
> pcie port and a real gpu, which might actually happen eventually. A
> bug, but easy to fix. Should probably use the
> drm_gem_fence_array_add_implicit() helper.
>
> - lima is nice and easy: it uses drm_gem_fence_array_add_implicit() and
> the same schema as v3d.
>
> - msm is mildly entertaining. It also supports MSM_SUBMIT_NO_IMPLICIT,
> but because it doesn't use the drm/scheduler it handles fences from
> the wrong context with a synchronous dma_fence_wait. See
> submit_fence_sync() leading to msm_gem_sync_object(). Investing into
> a scheduler might be a good idea.
>
> - all the remaining drivers are ttm based, where I hope they
> appropriately obey implicit fences already. I didn't do the full
> audit there because a) not following the contract would confuse ttm
> quite badly and b) reading non-standard scheduler and submit code
> which isn't based on drm/scheduler is a pain.
>
> Onwards to the display side.
>
> - Any driver using the drm_gem_plane_helper_prepare_fb() helper will
> do this correctly. Overwhelmingly most drivers get this right, but a
> few totally don't. I'll follow up with a patch to make this the
> default and avoid a bunch of bugs.
>
> - I didn't audit the ttm drivers, but given that dma_resv started
> there I hope they get this right.
>
> In conclusion this IS the contract, both as documented and
> overwhelmingly implemented, specifically as implemented by all render
> drivers except amdgpu.
>
> Amdgpu tried to fix this already in
>
> commit 049aca4363d8af87cab8d53de5401602db3b9999
> Author: Christian König <christian.koenig@amd.com>
> Date: Wed Sep 19 16:54:35 2018 +0200
>
> drm/amdgpu: fix using shared fence for exported BOs v2
>
> but this fix falls short on a number of areas:
>
> - It's racy, by the time the buffer is shared it might be too late. To
> make sure there's definitely never a problem we need to set the
> fences correctly for any buffer that's potentially exportable.
>
> - It's breaking uapi: dma-buf fds support poll() and differentiate
> between read and write access, which was introduced in
>
> commit 9b495a5887994a6d74d5c261d012083a92b94738
> Author: Maarten Lankhorst <maarten.lankhorst@canonical.com>
> Date: Tue Jul 1 12:57:43 2014 +0200
>
> dma-buf: add poll support, v3
>
> - Christian König wants to nack new uapi building further on this
> dma_resv contract because it breaks amdgpu, quoting
>
> "Yeah, and that is exactly the reason why I will NAK this uAPI change.
>
> "This doesn't works for amdgpu at all for the reasons outlined above."
>
> https://lore.kernel.org/dri-devel/f2eb6751-2f82-9b23-f57e-548de5b729de@gmail.com/
>
> Rejecting new development because your own driver is broken and
> violates established cross driver contracts and uapi is really not
> how upstream works.
>
> Now this patch will have a severe performance impact on anything that
> runs on multiple engines. So we can't just merge it outright, but need
> a bit of a plan:
>
> - amdgpu needs a proper uapi for handling implicit fencing. The funny
> thing is that to do it correctly, implicit fencing must be treated
> as a very strange IPC mechanism for transporting fences, where both
> setting the fence and dependency intercepts must be handled
> explicitly. Current best practice is a per-bo flag to indicate
> writes, and a per-bo flag to skip implicit fencing in the CS
> ioctl as a new chunk.
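A hedged sketch of what such per-BO CS flags could look like. The flag names and bit values below are invented for illustration; amdgpu defines no such uapi yet:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-BO flags for a CS ioctl chunk, modeling the
 * "current best practice" described above. Not real amdgpu uapi. */
#define TOY_BO_WRITE       (1u << 0) /* job writes this BO: set exclusive fence */
#define TOY_BO_NO_IMPLICIT (1u << 1) /* opt out of implicit dependencies */

/* Whether the kernel should wait on the BO's implicit fences
 * before running this job. */
static int toy_needs_implicit_wait(uint32_t flags)
{
	return !(flags & TOY_BO_NO_IMPLICIT);
}

/* Whether the job's fence goes into the exclusive slot. */
static int toy_sets_exclusive(uint32_t flags)
{
	return !!(flags & TOY_BO_WRITE);
}
```

With this shape, both the fence-setting and the dependency-intercept sides of the "strange IPC mechanism" are under explicit userspace control.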
>
> - Since amdgpu has been shipping with broken behaviour we need an
> opt-out flag from the butchered implicit fencing model to enable the
> proper explicit implicit fencing model.
>
> - for kernel memory fences due to bo moves at least the i915 idea is
> to use ttm_bo->moving. amdgpu probably needs the same.
>
> - since the current p2p dma-buf interface assumes the kernel memory
> fence is in the exclusive dma_resv fence slot we need to add a new
> fence slot for kernel fences, which must never be ignored. Since
> currently only amdgpu supports this there's no real problem here
> yet, until amdgpu gains a NO_IMPLICIT CS flag.
>
> - New userspace needs to ship in enough desktop distros so that users
> won't notice the perf impact. I think we can ignore LTS distros who
> upgrade their kernels but not their mesa3d snapshot.
>
> - Then when this is all in place we can merge this patch here.
>
> What is not a solution to this problem here is trying to make the
> dma_resv rules in the kernel more clever. The fundamental issue here
> is that the amdgpu CS uapi is the least expressive one across all
> drivers (only equalled by panfrost, which has an actual excuse) by not
> allowing any userspace control over how implicit sync is conducted.
>
> Until this is fixed it's completely pointless to make the kernel more
> clever to improve amdgpu, because all we're doing is papering over
> this uapi design issue. amdgpu needs to attain the status quo
> established by other drivers first, once that's achieved we can tackle
> the remaining issues in a consistent way across drivers.
>
> Cc: mesa-dev@lists.freedesktop.org
> Cc: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> Cc: Dave Airlie <airlied@gmail.com>
> Cc: Rob Clark <robdclark@chromium.org>
> Cc: Kristian H. Kristensen <hoegsberg@google.com>
> Cc: Michel Dänzer <michel@daenzer.net>
> Cc: Daniel Stone <daniels@collabora.com>
> Cc: Sumit Semwal <sumit.semwal@linaro.org>
> Cc: "Christian König" <christian.koenig@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Deepak R Varma <mh12gx2825@gmail.com>
> Cc: Chen Li <chenli@uniontech.com>
> Cc: Kevin Wang <kevin1.wang@amd.com>
> Cc: Dennis Li <Dennis.Li@amd.com>
> Cc: Luben Tuikov <luben.tuikov@amd.com>
> Cc: linaro-mm-sig@lists.linaro.org
> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
And as explained before, this is a general NAK.

I'm not discussing this further until we have fixed the dma_resv rules
for implicit synchronization, since this patch will just result in
every command submission serializing all accesses to BOs, which is
certainly not what we want.
Regards,
Christian.
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> index 88a24a0b5691..cc8426e1e8a8 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> @@ -617,8 +617,8 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
> amdgpu_bo_list_for_each_entry(e, p->bo_list) {
> struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
>
> - /* Make sure we use the exclusive slot for shared BOs */
> - if (bo->prime_shared_count)
> + /* Make sure we use the exclusive slot for all potentially shared BOs */
> + if (!(bo->flags & AMDGPU_GEM_CREATE_VM_ALWAYS_VALID))
> e->tv.num_shared = 0;
> e->bo_va = amdgpu_vm_bo_find(vm, bo);
> }