From: Chris Wilson <chris@chris-wilson.co.uk>
To: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>,
	intel-gfx@lists.freedesktop.org
Subject: Re: [RFC] drm/i915/tgl: Advanced preparser support for GPU relocs
Date: Fri, 23 Aug 2019 16:52:40 +0100
Message-ID: <156657556096.4019.5895875072663620308@skylake-alporthouse-com>
In-Reply-To: <f8bd967c-de26-730b-9871-ec918279e06b@intel.com>

Quoting Daniele Ceraolo Spurio (2019-08-23 16:39:14)
> 
> 
> On 8/23/19 8:28 AM, Chris Wilson wrote:
> > Quoting Chris Wilson (2019-08-23 16:10:48)
> >> Quoting Daniele Ceraolo Spurio (2019-08-23 16:05:45)
> >>>
> >>>
> >>> On 8/23/19 7:26 AM, Chris Wilson wrote:
> >>>> Quoting Chris Wilson (2019-08-23 08:27:25)
> >>>>> Quoting Daniele Ceraolo Spurio (2019-08-23 03:09:09)
> >>>>>> TGL has an improved CS pre-parser that can now pre-fetch commands across
> >>>>>> batch boundaries. This improves performance when lots of small batches
> >>>>>> are used, but has an impact on self-modifying code. If we want to modify
> >>>>>> the content of a batch from another ring/batch, we need to either
> >>>>>> guarantee that the memory location is updated before the pre-parser gets
> >>>>>> to it or we need to turn the pre-parser off around the modification.
> >>>>>> In i915, we use self-modifying code only for GPU relocations.
> >>>>>>
> >>>>>> The pre-parser fetches across memory synchronization commands as well,
> >>>>>> so the only way to guarantee that the writes land before the parser gets
> >>>>>> to it is to have more instructions between the sync and the destination
> >>>>>> than the parser FIFO depth, which is not an optimal solution.
> >>>>>
> >>>>> Well, our ABI is that memory is coherent before the breadcrumb of *each*
> >>>>> batch. That is a fundamental requirement for our signaling to userspace.
> >>>>> Please tell me that there is a context flag to turn this off, or else we
> >>>>> need to emit 32x flushes or whatever it takes.
> >>>>
> >>> Are you referring to the specific case where we have a request modifying
> >>> an object that is then used as a batch in the next request? Because
> >>> coherency of objects that are not executed as batches is not impacted.
> >>
> >> "Fetches across memory sync" sounds like a major ABI break. The batches
> >> are a hard serialisation barrier, with memory coherency guaranteed prior
> >> to the signaling at the end of one batch and clear caches guaranteed at
> >> the start of the next.
> > 
> > We have relocs, oa and sseu all using self-modifying code. I expect we
> > will have PTE modifications and much more done via the GPU in the near
> > future. All rely on the CS_STALL doing exactly what it says on the tin.
> > -Chris
> > 
> 
> I guess the easiest solution is then to keep the parser off outside of 
> user batches. We can default to off and then restore what the user has 
> programmed before the BBSTART. It's not a breach of contract if we say 
> that if you opt-in to the parser then you need to make sure your batches 
> are not self-modifying, right?

Is it just the MI_ARB_ONOFF bits, and is that still a privileged
command? I.e. can userspace change the mode by itself, or is it a
context-param?

> BTW, on pre-gen12 gens the CS_STALL does not guarantee that
> self-modifying code works within the same batch/ring, because the
> pre-parser already pre-fetches across memory sync points; it just
> stops at the next arb point.

Ok, we still uphold our contract if they can't execute any code in the
window where they would see someone else's data.
-Chris
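
For illustration only, the scheme proposed above (pre-parser off by
default, with the user's setting restored just before the BBSTART) could
be sketched roughly as below. This is not i915 code: the command tokens
are placeholders rather than real hardware encodings, every sketch_*
name is hypothetical, and the batch address is squeezed into a single
dword to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder tokens, NOT real hardware command encodings. */
enum sketch_cmd {
	SKETCH_PREPARSER_OFF = 0x1000,
	SKETCH_PREPARSER_ON  = 0x1001,
	SKETCH_BB_START      = 0x2000,
};

/* Hypothetical stand-in for a ring of command dwords. */
struct sketch_ring {
	uint32_t cmd[16];
	size_t len;
};

static void sketch_emit(struct sketch_ring *ring, uint32_t dw)
{
	assert(ring->len < sizeof(ring->cmd) / sizeof(ring->cmd[0]));
	ring->cmd[ring->len++] = dw;
}

/*
 * Emit one user batch. The ring is assumed to run with the pre-parser
 * off on entry, and it is left off again on exit, so any kernel-emitted
 * self-modifying code (e.g. GPU relocs) never runs with it enabled.
 */
static void sketch_emit_user_batch(struct sketch_ring *ring,
				   uint32_t batch_addr,
				   bool user_wants_preparser)
{
	/* Restore what the user opted in to, right before the batch. */
	if (user_wants_preparser)
		sketch_emit(ring, SKETCH_PREPARSER_ON);

	sketch_emit(ring, SKETCH_BB_START);
	sketch_emit(ring, batch_addr);

	/* Back to the default before any further ring commands. */
	if (user_wants_preparser)
		sketch_emit(ring, SKETCH_PREPARSER_OFF);
}
```

With this shape, a context that never opts in sees no extra commands
around its batch, and one that does opt in takes on the responsibility
of keeping its own batches free of self-modifying code.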