From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jason Ekstrand
Subject: Re: [PATCH 3/4] drm/i915: Flush all user surfaces prior to first use
Date: Fri, 19 Jul 2019 17:55:03 -0500
Message-ID:
References: <20190718145407.21352-1-chris@chris-wilson.co.uk>
 <20190718145407.21352-3-chris@chris-wilson.co.uk>
 <29f36349-0c1a-3af6-d707-632685f80929@intel.com>
 <156353167467.24728.15340645557688634881@skylake-alporthouse-com>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="===============0530147578=="
Return-path:
Received: from mail-ed1-x542.google.com (mail-ed1-x542.google.com [IPv6:2a00:1450:4864:20::542])
 by gabe.freedesktop.org (Postfix) with ESMTPS id 2B4AC6E876
 for ; Fri, 19 Jul 2019 22:55:17 +0000 (UTC)
Received: by mail-ed1-x542.google.com with SMTP id k21so35768682edq.3
 for ; Fri, 19 Jul 2019 15:55:17 -0700 (PDT)
In-Reply-To: <156353167467.24728.15340645557688634881@skylake-alporthouse-com>
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Errors-To: intel-gfx-bounces@lists.freedesktop.org
Sender: "Intel-gfx"
To: Chris Wilson
Cc: Intel GFX
List-Id: intel-gfx@lists.freedesktop.org

--===============0530147578==
Content-Type: multipart/alternative; boundary="000000000000a34b16058e109e9a"

--000000000000a34b16058e109e9a
Content-Type: text/plain; charset="UTF-8"

Just to be clear, is this just adding a CLFLUSH or is it actually changing
the default caching state of buffers from CACHED to NONE?  If it's actually
changing the default state, that's going to break userspace badly.

--Jason

On Fri, Jul 19, 2019 at 5:21 AM Chris Wilson wrote:

> Quoting Lionel Landwerlin (2019-07-19 11:18:42)
> > On 18/07/2019 17:54, Chris Wilson wrote:
> > > Since userspace has the ability to bypass the CPU cache from within its
> > > unprivileged command stream, we have to flush the CPU cache to memory
> > > in order to overwrite the previous contents on creation.
> > >
> > > Signed-off-by: Chris Wilson
> > > Cc: Joonas Lahtinen
> > > Cc: stablevger.kernel.org
> > > ---
> > >  drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 26 ++++++-----------------
> > >  1 file changed, 7 insertions(+), 19 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> > > index d2a1158868e7..f752b326d399 100644
> > > --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> > > +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c
> > > @@ -459,7 +459,6 @@ i915_gem_object_create_shmem(struct drm_i915_private *i915, u64 size)
> > >  {
> > >          struct drm_i915_gem_object *obj;
> > >          struct address_space *mapping;
> > > -        unsigned int cache_level;
> > >          gfp_t mask;
> > >          int ret;
> > >
> > > @@ -498,24 +497,13 @@ i915_gem_object_create_shmem(struct drm_i915_private *i915, u64 size)
> > >          obj->write_domain = I915_GEM_DOMAIN_CPU;
> > >          obj->read_domains = I915_GEM_DOMAIN_CPU;
> > >
> > > -        if (HAS_LLC(i915))
> > > -                /* On some devices, we can have the GPU use the LLC (the CPU
> > > -                 * cache) for about a 10% performance improvement
> > > -                 * compared to uncached.  Graphics requests other than
> > > -                 * display scanout are coherent with the CPU in
> > > -                 * accessing this cache.  This means in this mode we
> > > -                 * don't need to clflush on the CPU side, and on the
> > > -                 * GPU side we only need to flush internal caches to
> > > -                 * get data visible to the CPU.
> > > -                 *
> > > -                 * However, we maintain the display planes as UC, and so
> > > -                 * need to rebind when first used as such.
> > > -                 */
> > > -                cache_level = I915_CACHE_LLC;
> > > -        else
> > > -                cache_level = I915_CACHE_NONE;
> > > -
> > > -        i915_gem_object_set_cache_coherency(obj, cache_level);
> > > +        /*
> > > +         * Note that userspace has control over cache-bypass
> > > +         * via its command stream, so even on LLC architectures
> > > +         * we have to flush out the CPU cache to memory to
> > > +         * clear previous contents.
> > > +         */
> > > +        i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE);
> > >
> > >          trace_i915_gem_object_create(obj);
> > >
> >
> > Does i915_drm.h needs updating? :
> >
> >
> > /**
> >   * I915_CACHING_CACHED
> >   *
> >   * GPU access is coherent with cpu caches and furthermore the data is cached in
> >   * last-level caches shared between cpu cores and the gpu GT. Default on
> >   * machines with HAS_LLC.
> >   */
> > #define I915_CACHING_CACHED            1
>
> Sneaky. Thanks,
> -Chris
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

--000000000000a34b16058e109e9a--

--===============0530147578==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: base64
Content-Disposition: inline

X19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX19fX18KSW50ZWwtZ2Z4
IG1haWxpbmcgbGlzdApJbnRlbC1nZnhAbGlzdHMuZnJlZWRlc2t0b3Aub3JnCmh0dHBzOi8vbGlz
dHMuZnJlZWRlc2t0b3Aub3JnL21haWxtYW4vbGlzdGluZm8vaW50ZWwtZ2Z4

--===============0530147578==--