Re: [RESEND PATCH v2 4/5] drm/msm: add DRM_MSM_GEM_SYNC_CACHE for non-coherent cache maintenance

From: Rob Clark <robdclark@gmail.com>
To: Jonathan Marek <jonathan@marek.ca>
Cc: Christoph Hellwig <hch@lst.de>,
	freedreno <freedreno@lists.freedesktop.org>,
	Sean Paul <sean@poorly.run>, David Airlie <airlied@linux.ie>,
	Daniel Vetter <daniel@ffwll.ch>,
	"open list:DRM DRIVER FOR MSM ADRENO GPU" 
	<linux-arm-msm@vger.kernel.org>,
	"open list:DRM DRIVER FOR MSM ADRENO GPU" 
	<dri-devel@lists.freedesktop.org>,
	open list <linux-kernel@vger.kernel.org>
Subject: Re: [RESEND PATCH v2 4/5] drm/msm: add DRM_MSM_GEM_SYNC_CACHE for non-coherent cache maintenance
Date: Sat, 14 Nov 2020 12:48:15 -0800	[thread overview]
Message-ID: <CAF6AEGv7fXGVVWcTcSXF6EGB2LOi_wvQP6h6hcX8yNvAZRDbVg@mail.gmail.com> (raw)
In-Reply-To: <b6e4f167-871a-5f26-46bd-d914476af519@marek.ca>

On Sat, Nov 14, 2020 at 12:10 PM Jonathan Marek <jonathan@marek.ca> wrote:
>
> On 11/14/20 2:39 PM, Rob Clark wrote:
> > On Sat, Nov 14, 2020 at 10:58 AM Jonathan Marek <jonathan@marek.ca> wrote:
> >>
> >> On 11/14/20 1:46 PM, Rob Clark wrote:
> >>> On Sat, Nov 14, 2020 at 8:24 AM Christoph Hellwig <hch@lst.de> wrote:
> >>>>
> >>>> On Sat, Nov 14, 2020 at 10:17:12AM -0500, Jonathan Marek wrote:
> >>>>> +void msm_gem_sync_cache(struct drm_gem_object *obj, uint32_t flags,
> >>>>> +             size_t range_start, size_t range_end)
> >>>>> +{
> >>>>> +     struct msm_gem_object *msm_obj = to_msm_bo(obj);
> >>>>> +     struct device *dev = msm_obj->base.dev->dev;
> >>>>> +
> >>>>> +     /* exit early if get_pages() hasn't been called yet */
> >>>>> +     if (!msm_obj->pages)
> >>>>> +             return;
> >>>>> +
> >>>>> +     /* TODO: sync only the specified range */
> >>>>> +
> >>>>> +     if (flags & MSM_GEM_SYNC_FOR_DEVICE) {
> >>>>> +             dma_sync_sg_for_device(dev, msm_obj->sgt->sgl,
> >>>>> +                             msm_obj->sgt->nents, DMA_TO_DEVICE);
> >>>>> +     }
> >>>>> +
> >>>>> +     if (flags & MSM_GEM_SYNC_FOR_CPU) {
> >>>>> +             dma_sync_sg_for_cpu(dev, msm_obj->sgt->sgl,
> >>>>> +                             msm_obj->sgt->nents, DMA_FROM_DEVICE);
> >>>>> +     }
> >>>>
> >>>> Splitting this helper from the only caller is rather strange, epecially
> >>>> with the two unused arguments.  And I think the way this is specified
> >>>> to take a range, but ignoring it is actively dangerous.  User space will
> >>>> rely on it syncing everything sooner or later and then you are stuck.
> >>>> So just define a sync all primitive for now, and if you really need a
> >>>> range sync and have actually implemented it add a new ioctl for that.
> >>>
> >>> We do already have a split of ioctl "layer" which enforces valid ioctl
> >>> params, etc, and gem (or other) module code which is called by the
> >>> ioctl func.  So I think it is fine to keep this split here.  (Also, I
> >>> think at some point there will be a uring type of ioctl alternative
> >>> which would re-use the same gem func.)
> >>>
> >>> But I do agree that the range should be respected or added later..
> >>> drm_ioctl() dispatch is well prepared for extending ioctls.
> >>>
> >>> And I assume there should be some validation that the range is aligned
> >>> to cache-line?  Or can we flush a partial cache line?
> >>>
> >>
> >> The range is intended to be "sync at least this range", so that
> >> userspace doesn't have to worry about details like that.
> >>
> >
> > I don't think userspace can *not* worry about details like that.
> > Consider a case where the cpu and gpu are simultaneously accessing
> > different parts of a buffer (for ex, sub-allocation).  There needs to
> > be cache-line separation between the two.
> >
>
> Right.. and it also seems like we can't get away with just
> flushing/invalidating the whole thing.
>
> qcom's vulkan driver has nonCoherentAtomSize=1, and it looks like
> dma_sync_single_for_cpu() does deal in some way with the partial cache
> line case, although I'm not sure that means we can have a
> nonCoherentAtomSize=1.
>

flush/inv the whole thing could be a useful first step, or at least I
can think of some uses for it.  But if it isn't useful for how vk sees
the world, then maybe we should just implement the range properly from
the get-go.  (And I *think* requiring the range to be aligned to
cacheline boundaries.. it is always easy from a kernel uabi PoV to
loosen restrictions later, than the other way around.)

BR,
-R