All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: [PATCH] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-11-24 10:05 [PATCH] drm/i915 : Avoid superfluous invalidation of CPU cache lines akash.goel
@ 2015-11-24 10:04 ` Ville Syrjälä
  2015-11-24 18:14   ` Daniel Vetter
  2015-11-24 10:10 ` [PATCH] " Chris Wilson
  1 sibling, 1 reply; 25+ messages in thread
From: Ville Syrjälä @ 2015-11-24 10:04 UTC (permalink / raw)
  To: akash.goel; +Cc: intel-gfx

On Tue, Nov 24, 2015 at 03:35:24PM +0530, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> When the object is moved out of CPU read domain, the cachelines
> are not invalidated immediately. The invalidation is deferred till
> next time the object is brought back into CPU read domain.
> But the invalidation is done unconditionally, i.e. even for the case
> where the cachelines were flushed previously, when the object moved out
> of CPU write domain. This is avoidable and would lead to some optimization.
> Though this is not a hypothetical case, but is unlikely to occur often.
> The aim is to detect changes to the backing storage whilst the
> data is potentially in the CPU cache, and only clflush in those case.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h | 1 +
>  drivers/gpu/drm/i915/i915_gem.c | 9 ++++++++-
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index df9316f..fedb71d 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2098,6 +2098,7 @@ struct drm_i915_gem_object {
>  	unsigned long gt_ro:1;
>  	unsigned int cache_level:3;
>  	unsigned int cache_dirty:1;
> +	unsigned int cache_clean:1;

So now we have cache_dirty and cache_clean which seems redundant,
except somehow cache_dirty != !cache_clean?

>  
>  	unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 19c282b..a13ffd4 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3552,6 +3552,7 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
>  	trace_i915_gem_object_clflush(obj);
>  	drm_clflush_sg(obj->pages);
>  	obj->cache_dirty = false;
> +	obj->cache_clean = true;
>  
>  	return true;
>  }
> @@ -3982,7 +3983,13 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
>  
>  	/* Flush the CPU cache if it's still invalid. */
>  	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
> -		i915_gem_clflush_object(obj, false);
> +		/* Invalidation not needed as there should not be any data in
> +		 * CPU cache lines for this object, since clflush would have
> +		 * happened when the object last moved out of CPU write domain.
> +		 */
> +		if (!obj->cache_clean)
> +			i915_gem_clflush_object(obj, false);
> +		obj->cache_clean = false;
>  
>  		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
>  	}
> -- 
> 1.9.2
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH] drm/i915 : Avoid superfluous invalidation of CPU cache lines
@ 2015-11-24 10:05 akash.goel
  2015-11-24 10:04 ` Ville Syrjälä
  2015-11-24 10:10 ` [PATCH] " Chris Wilson
  0 siblings, 2 replies; 25+ messages in thread
From: akash.goel @ 2015-11-24 10:05 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

From: Akash Goel <akash.goel@intel.com>

When the object is moved out of CPU read domain, the cachelines
are not invalidated immediately. The invalidation is deferred till
next time the object is brought back into CPU read domain.
But the invalidation is done unconditionally, i.e. even for the case
where the cachelines were flushed previously, when the object moved out
of CPU write domain. This is avoidable and would lead to some optimization.
Though this is not a hypothetical case, but is unlikely to occur often.
The aim is to detect changes to the backing storage whilst the
data is potentially in the CPU cache, and only clflush in those case.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h | 1 +
 drivers/gpu/drm/i915/i915_gem.c | 9 ++++++++-
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index df9316f..fedb71d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2098,6 +2098,7 @@ struct drm_i915_gem_object {
 	unsigned long gt_ro:1;
 	unsigned int cache_level:3;
 	unsigned int cache_dirty:1;
+	unsigned int cache_clean:1;
 
 	unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 19c282b..a13ffd4 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3552,6 +3552,7 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
 	trace_i915_gem_object_clflush(obj);
 	drm_clflush_sg(obj->pages);
 	obj->cache_dirty = false;
+	obj->cache_clean = true;
 
 	return true;
 }
@@ -3982,7 +3983,13 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 
 	/* Flush the CPU cache if it's still invalid. */
 	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
-		i915_gem_clflush_object(obj, false);
+		/* Invalidation not needed as there should not be any data in
+		 * CPU cache lines for this object, since clflush would have
+		 * happened when the object last moved out of CPU write domain.
+		 */
+		if (!obj->cache_clean)
+			i915_gem_clflush_object(obj, false);
+		obj->cache_clean = false;
 
 		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
 	}
-- 
1.9.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-11-24 10:05 [PATCH] drm/i915 : Avoid superfluous invalidation of CPU cache lines akash.goel
  2015-11-24 10:04 ` Ville Syrjälä
@ 2015-11-24 10:10 ` Chris Wilson
  1 sibling, 0 replies; 25+ messages in thread
From: Chris Wilson @ 2015-11-24 10:10 UTC (permalink / raw)
  To: akash.goel; +Cc: intel-gfx

On Tue, Nov 24, 2015 at 03:35:24PM +0530, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> When the object is moved out of CPU read domain, the cachelines
> are not invalidated immediately. The invalidation is deferred till
> next time the object is brought back into CPU read domain.
> But the invalidation is done unconditionally, i.e. even for the case
> where the cachelines were flushed previously, when the object moved out
> of CPU write domain. This is avoidable and would lead to some optimization.
> Though this is not a hypothetical case, but is unlikely to occur often.
> The aim is to detect changes to the backing storage whilst the
> data is potentially in the CPU cache, and only clflush in those case.
 
Testcase: igt/gem_concurrent_blit 
Testcase: igt/benchmarks/gem_set_domain
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h | 1 +
>  drivers/gpu/drm/i915/i915_gem.c | 9 ++++++++-
>  2 files changed, 9 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index df9316f..fedb71d 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2098,6 +2098,7 @@ struct drm_i915_gem_object {
>  	unsigned long gt_ro:1;
>  	unsigned int cache_level:3;
>  	unsigned int cache_dirty:1;
> +	unsigned int cache_clean:1;
>  
>  	unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 19c282b..a13ffd4 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3552,6 +3552,7 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
>  	trace_i915_gem_object_clflush(obj);
>  	drm_clflush_sg(obj->pages);
>  	obj->cache_dirty = false;
> +	obj->cache_clean = true;
>  
>  	return true;
>  }
> @@ -3982,7 +3983,13 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
>  
>  	/* Flush the CPU cache if it's still invalid. */
>  	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
> -		i915_gem_clflush_object(obj, false);
> +		/* Invalidation not needed as there should not be any data in
> +		 * CPU cache lines for this object, since clflush would have
> +		 * happened when the object last moved out of CPU write domain.
> +		 */

/* If an object is moved out of the CPU domain following a CPU write
 * and before a GPU or GTT write, we will clflush it out of the CPU cache,
 * and mark the cache as clean. As the object has not been accessed on the CPU
 * since (i.e. the CPU cache is still clean and it is out of the CPU domain),
 * we know that no CPU cache line contains stale data and so we can skip
 * invalidating the CPU cache in preparing to read from the object.
 */

Marginally more verbose in stating the sequence of events for which we
can ignore the clflush invalidate.

Please Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> as I trust his
criticisms here.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-11-24 10:04 ` Ville Syrjälä
@ 2015-11-24 18:14   ` Daniel Vetter
  2015-11-24 22:39     ` Chris Wilson
  0 siblings, 1 reply; 25+ messages in thread
From: Daniel Vetter @ 2015-11-24 18:14 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: akash.goel, intel-gfx

On Tue, Nov 24, 2015 at 12:04:06PM +0200, Ville Syrjälä wrote:
> On Tue, Nov 24, 2015 at 03:35:24PM +0530, akash.goel@intel.com wrote:
> > From: Akash Goel <akash.goel@intel.com>
> > 
> > When the object is moved out of CPU read domain, the cachelines
> > are not invalidated immediately. The invalidation is deferred till
> > next time the object is brought back into CPU read domain.
> > But the invalidation is done unconditionally, i.e. even for the case
> > where the cachelines were flushed previously, when the object moved out
> > of CPU write domain. This is avoidable and would lead to some optimization.
> > Though this is not a hypothetical case, but is unlikely to occur often.
> > The aim is to detect changes to the backing storage whilst the
> > data is potentially in the CPU cache, and only clflush in those case.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Akash Goel <akash.goel@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h | 1 +
> >  drivers/gpu/drm/i915/i915_gem.c | 9 ++++++++-
> >  2 files changed, 9 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index df9316f..fedb71d 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2098,6 +2098,7 @@ struct drm_i915_gem_object {
> >  	unsigned long gt_ro:1;
> >  	unsigned int cache_level:3;
> >  	unsigned int cache_dirty:1;
> > +	unsigned int cache_clean:1;
> 
> So now we have cache_dirty and cache_clean which seems redundant,
> except somehow cache_dirty != !cache_clean?

We also have read_domains & DOMAIN_CPU. Which is which?

Documentation for this stuff would be awesome, and probably should be
included in this patch. With kerneldoc in 4.4 we can do multi-paragraph
kerneldoc comments for individual struct members.
-Daniel

> 
> >  
> >  	unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
> >  
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 19c282b..a13ffd4 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -3552,6 +3552,7 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
> >  	trace_i915_gem_object_clflush(obj);
> >  	drm_clflush_sg(obj->pages);
> >  	obj->cache_dirty = false;
> > +	obj->cache_clean = true;
> >  
> >  	return true;
> >  }
> > @@ -3982,7 +3983,13 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
> >  
> >  	/* Flush the CPU cache if it's still invalid. */
> >  	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
> > -		i915_gem_clflush_object(obj, false);
> > +		/* Invalidation not needed as there should not be any data in
> > +		 * CPU cache lines for this object, since clflush would have
> > +		 * happened when the object last moved out of CPU write domain.
> > +		 */
> > +		if (!obj->cache_clean)
> > +			i915_gem_clflush_object(obj, false);
> > +		obj->cache_clean = false;
> >  
> >  		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
> >  	}
> > -- 
> > 1.9.2
> > 
> > _______________________________________________
> > Intel-gfx mailing list
> > Intel-gfx@lists.freedesktop.org
> > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
> 
> -- 
> Ville Syrjälä
> Intel OTC
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-11-24 18:14   ` Daniel Vetter
@ 2015-11-24 22:39     ` Chris Wilson
  2015-11-25  5:29       ` [PATCH v2] " akash.goel
                         ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Chris Wilson @ 2015-11-24 22:39 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: akash.goel, intel-gfx

On Tue, Nov 24, 2015 at 07:14:31PM +0100, Daniel Vetter wrote:
> On Tue, Nov 24, 2015 at 12:04:06PM +0200, Ville Syrjälä wrote:
> > On Tue, Nov 24, 2015 at 03:35:24PM +0530, akash.goel@intel.com wrote:
> > > From: Akash Goel <akash.goel@intel.com>
> > > 
> > > When the object is moved out of CPU read domain, the cachelines
> > > are not invalidated immediately. The invalidation is deferred till
> > > next time the object is brought back into CPU read domain.
> > > But the invalidation is done unconditionally, i.e. even for the case
> > > where the cachelines were flushed previously, when the object moved out
> > > of CPU write domain. This is avoidable and would lead to some optimization.
> > > Though this is not a hypothetical case, but is unlikely to occur often.
> > > The aim is to detect changes to the backing storage whilst the
> > > data is potentially in the CPU cache, and only clflush in those case.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > Signed-off-by: Akash Goel <akash.goel@intel.com>
> > > ---
> > >  drivers/gpu/drm/i915/i915_drv.h | 1 +
> > >  drivers/gpu/drm/i915/i915_gem.c | 9 ++++++++-
> > >  2 files changed, 9 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > > index df9316f..fedb71d 100644
> > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > @@ -2098,6 +2098,7 @@ struct drm_i915_gem_object {
> > >  	unsigned long gt_ro:1;
> > >  	unsigned int cache_level:3;
> > >  	unsigned int cache_dirty:1;
> > > +	unsigned int cache_clean:1;
> > 
> > So now we have cache_dirty and cache_clean which seems redundant,
> > except somehow cache_dirty != !cache_clean?

Exactly, not entirely redundant. I did think something along MESI lines
would be useful, but that didn't capture the different meanings we
employ.

cache_dirty tracks whether we have been eliding the clflush.

cache_clean tracks whether we know the cache has been completely
clflushed.

(cache_clean implies !cache_dirty, but
!cache_clean does not imply cache_dirty)

> We also have read_domains & DOMAIN_CPU. Which is which?

DOMAIN_CPU implies that the object may be in the cpu cache (modulo the
clflush elision above).

DOMAIN_CPU implies !cache_clean

and even

cache_clean implies !DOMAIN_CPU

but

!DOMAIN_CPU does not imply cache_clean
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v2] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-11-24 22:39     ` Chris Wilson
@ 2015-11-25  5:29       ` akash.goel
  2015-11-25  9:21       ` [PATCH] " Daniel Vetter
  2015-11-25 11:02       ` Ville Syrjälä
  2 siblings, 0 replies; 25+ messages in thread
From: akash.goel @ 2015-11-25  5:29 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

From: Akash Goel <akash.goel@intel.com>

When the object is moved out of CPU read domain, the cachelines
are not invalidated immediately. The invalidation is deferred till
next time the object is brought back into CPU read domain.
But the invalidation is done unconditionally, i.e. even for the case
where the cachelines were flushed previously, when the object moved out
of CPU write domain. This is avoidable and would lead to some optimization.
Though this is not a hypothetical case, but is unlikely to occur often.
The aim is to detect changes to the backing storage whilst the
data is potentially in the CPU cache, and only clflush in those case.

v2: Made the comment more verbose (Ville/Chris)
    Added doc for 'cache_clean' field (Daniel)

Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_set_domain
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h |  9 +++++++++
 drivers/gpu/drm/i915/i915_gem.c | 14 +++++++++++++-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index df9316f..b63d3ab 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2099,6 +2099,15 @@ struct drm_i915_gem_object {
 	unsigned int cache_level:3;
 	unsigned int cache_dirty:1;
 
+	/*
+	 * Tracks if the CPU cache has been completely clflushed.
+	 * !cache_clean does not imply cache_dirty (there is some data in the
+	 * CPU cachelines, but has not been dirtied), but cache_clean
+	 * does imply !cache_dirty (no data in cachelines, so not dirty also).
+	 * Actually cache_dirty tracks whether we have been omitting clflushes.
+	 */
+	unsigned int cache_clean:1;
+
 	unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
 
 	unsigned int pin_display;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 19c282b..2b6a38d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3552,6 +3552,7 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
 	trace_i915_gem_object_clflush(obj);
 	drm_clflush_sg(obj->pages);
 	obj->cache_dirty = false;
+	obj->cache_clean = true;
 
 	return true;
 }
@@ -3982,7 +3983,18 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 
 	/* Flush the CPU cache if it's still invalid. */
 	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
-		i915_gem_clflush_object(obj, false);
+		/* If an object is moved out of the CPU domain following a
+		 * CPU write and before a GPU or GTT write, we will clflush
+		 * it out of the CPU cache, and mark the cache as clean.
+		 * As the object has not been accessed on the CPU since
+		 * (i.e. the CPU cache is still clean and it is out of the CPU
+		 * domain), we know that no CPU cache line contains stale data
+		 * and so we can skip invalidating the CPU cache in preparing
+		 * to read from the object.
+		 */
+		if (!obj->cache_clean)
+			i915_gem_clflush_object(obj, false);
+		obj->cache_clean = false;
 
 		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
 	}
-- 
1.9.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-11-24 22:39     ` Chris Wilson
  2015-11-25  5:29       ` [PATCH v2] " akash.goel
@ 2015-11-25  9:21       ` Daniel Vetter
  2015-11-25  9:27         ` Goel, Akash
  2015-11-25 11:02       ` Ville Syrjälä
  2 siblings, 1 reply; 25+ messages in thread
From: Daniel Vetter @ 2015-11-25  9:21 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Ville Syrjälä,
	akash.goel, intel-gfx

On Tue, Nov 24, 2015 at 10:39:38PM +0000, Chris Wilson wrote:
> On Tue, Nov 24, 2015 at 07:14:31PM +0100, Daniel Vetter wrote:
> > On Tue, Nov 24, 2015 at 12:04:06PM +0200, Ville Syrjälä wrote:
> > > On Tue, Nov 24, 2015 at 03:35:24PM +0530, akash.goel@intel.com wrote:
> > > > From: Akash Goel <akash.goel@intel.com>
> > > > 
> > > > When the object is moved out of CPU read domain, the cachelines
> > > > are not invalidated immediately. The invalidation is deferred till
> > > > next time the object is brought back into CPU read domain.
> > > > But the invalidation is done unconditionally, i.e. even for the case
> > > > where the cachelines were flushed previously, when the object moved out
> > > > of CPU write domain. This is avoidable and would lead to some optimization.
> > > > Though this is not a hypothetical case, but is unlikely to occur often.
> > > > The aim is to detect changes to the backing storage whilst the
> > > > data is potentially in the CPU cache, and only clflush in those case.
> > > > 
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Signed-off-by: Akash Goel <akash.goel@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_drv.h | 1 +
> > > >  drivers/gpu/drm/i915/i915_gem.c | 9 ++++++++-
> > > >  2 files changed, 9 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > > > index df9316f..fedb71d 100644
> > > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > > @@ -2098,6 +2098,7 @@ struct drm_i915_gem_object {
> > > >  	unsigned long gt_ro:1;
> > > >  	unsigned int cache_level:3;
> > > >  	unsigned int cache_dirty:1;
> > > > +	unsigned int cache_clean:1;
> > > 
> > > So now we have cache_dirty and cache_clean which seems redundant,
> > > except somehow cache_dirty != !cache_clean?
> 
> Exactly, not entirely redundant. I did think something along MESI lines
> would be useful, but that didn't capture the different meanings we
> employ.
> 
> cache_dirty tracks whether we have been eliding the clflush.
> 
> cache_clean tracks whether we know the cache has been completely
> clflushed.
> 
> (cache_clean implies !cache_dirty, but
> !cache_clean does not imply cache_dirty)
> 
> > We also have read_domains & DOMAIN_CPU. Which is which?
> 
> DOMAIN_CPU implies that the object may be in the cpu cache (modulo the
> clflush elision above).
> 
> DOMAIN_CPU implies !cache_clean
> 
> and even
> 
> cache_clean implies !DOMAIN_CPU
> 
> but
> 
> !DOMAIN_CPU does not imply cache_clean

All the above should be in the kerneldoc (per-struct-member comments
please) of drm_i915_gem_object. Akash, can you please amend your patch?
And please make sure we do include that kerneldoc somewhere ... might need
an upfront patch to do that, for just drm_i915_gem_object.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-11-25  9:21       ` [PATCH] " Daniel Vetter
@ 2015-11-25  9:27         ` Goel, Akash
  2015-11-25 10:00           ` Daniel Vetter
  0 siblings, 1 reply; 25+ messages in thread
From: Goel, Akash @ 2015-11-25  9:27 UTC (permalink / raw)
  To: Daniel Vetter, Chris Wilson, Ville Syrjälä
  Cc: intel-gfx, akash.goel



On 11/25/2015 2:51 PM, Daniel Vetter wrote:
> On Tue, Nov 24, 2015 at 10:39:38PM +0000, Chris Wilson wrote:
>> On Tue, Nov 24, 2015 at 07:14:31PM +0100, Daniel Vetter wrote:
>>> On Tue, Nov 24, 2015 at 12:04:06PM +0200, Ville Syrjälä wrote:
>>>> On Tue, Nov 24, 2015 at 03:35:24PM +0530, akash.goel@intel.com wrote:
>>>>> From: Akash Goel <akash.goel@intel.com>
>>>>>
>>>>> When the object is moved out of CPU read domain, the cachelines
>>>>> are not invalidated immediately. The invalidation is deferred till
>>>>> next time the object is brought back into CPU read domain.
>>>>> But the invalidation is done unconditionally, i.e. even for the case
>>>>> where the cachelines were flushed previously, when the object moved out
>>>>> of CPU write domain. This is avoidable and would lead to some optimization.
>>>>> Though this is not a hypothetical case, but is unlikely to occur often.
>>>>> The aim is to detect changes to the backing storage whilst the
>>>>> data is potentially in the CPU cache, and only clflush in those case.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Signed-off-by: Akash Goel <akash.goel@intel.com>
>>>>> ---
>>>>>   drivers/gpu/drm/i915/i915_drv.h | 1 +
>>>>>   drivers/gpu/drm/i915/i915_gem.c | 9 ++++++++-
>>>>>   2 files changed, 9 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>>>> index df9316f..fedb71d 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>>> @@ -2098,6 +2098,7 @@ struct drm_i915_gem_object {
>>>>>   	unsigned long gt_ro:1;
>>>>>   	unsigned int cache_level:3;
>>>>>   	unsigned int cache_dirty:1;
>>>>> +	unsigned int cache_clean:1;
>>>>
>>>> So now we have cache_dirty and cache_clean which seems redundant,
>>>> except somehow cache_dirty != !cache_clean?
>>
>> Exactly, not entirely redundant. I did think something along MESI lines
>> would be useful, but that didn't capture the different meanings we
>> employ.
>>
>> cache_dirty tracks whether we have been eliding the clflush.
>>
>> cache_clean tracks whether we know the cache has been completely
>> clflushed.
>>
>> (cache_clean implies !cache_dirty, but
>> !cache_clean does not imply cache_dirty)
>>
>>> We also have read_domains & DOMAIN_CPU. Which is which?
>>
>> DOMAIN_CPU implies that the object may be in the cpu cache (modulo the
>> clflush elision above).
>>
>> DOMAIN_CPU implies !cache_clean
>>
>> and even
>>
>> cache_clean implies !DOMAIN_CPU
>>
>> but
>>
>> !DOMAIN_CPU does not imply cache_clean
>
> All the above should be in the kerneldoc (per-struct-member comments
> please) of drm_i915_gem_object. Akash, can you please amend your patch?
> And please make sure we do include that kerneldoc somewhere ... might need
> an upfront patch to do that, for just drm_i915_gem_object.

I floated the amended patch, earlier today,
http://lists.freedesktop.org/archives/intel-gfx/2015-November/081194.html.
Please kindly check that.

Best regards
Akash


> -Daniel
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-11-25  9:27         ` Goel, Akash
@ 2015-11-25 10:00           ` Daniel Vetter
  2015-11-30  6:24             ` Goel, Akash
  0 siblings, 1 reply; 25+ messages in thread
From: Daniel Vetter @ 2015-11-25 10:00 UTC (permalink / raw)
  To: Goel, Akash; +Cc: intel-gfx

On Wed, Nov 25, 2015 at 02:57:47PM +0530, Goel, Akash wrote:
> 
> 
> On 11/25/2015 2:51 PM, Daniel Vetter wrote:
> >On Tue, Nov 24, 2015 at 10:39:38PM +0000, Chris Wilson wrote:
> >>On Tue, Nov 24, 2015 at 07:14:31PM +0100, Daniel Vetter wrote:
> >>>On Tue, Nov 24, 2015 at 12:04:06PM +0200, Ville Syrjälä wrote:
> >>>>On Tue, Nov 24, 2015 at 03:35:24PM +0530, akash.goel@intel.com wrote:
> >>>>>From: Akash Goel <akash.goel@intel.com>
> >>>>>
> >>>>>When the object is moved out of CPU read domain, the cachelines
> >>>>>are not invalidated immediately. The invalidation is deferred till
> >>>>>next time the object is brought back into CPU read domain.
> >>>>>But the invalidation is done unconditionally, i.e. even for the case
> >>>>>where the cachelines were flushed previously, when the object moved out
> >>>>>of CPU write domain. This is avoidable and would lead to some optimization.
> >>>>>Though this is not a hypothetical case, but is unlikely to occur often.
> >>>>>The aim is to detect changes to the backing storage whilst the
> >>>>>data is potentially in the CPU cache, and only clflush in those case.
> >>>>>
> >>>>>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>>Signed-off-by: Akash Goel <akash.goel@intel.com>
> >>>>>---
> >>>>>  drivers/gpu/drm/i915/i915_drv.h | 1 +
> >>>>>  drivers/gpu/drm/i915/i915_gem.c | 9 ++++++++-
> >>>>>  2 files changed, 9 insertions(+), 1 deletion(-)
> >>>>>
> >>>>>diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >>>>>index df9316f..fedb71d 100644
> >>>>>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>>>>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>>>>@@ -2098,6 +2098,7 @@ struct drm_i915_gem_object {
> >>>>>  	unsigned long gt_ro:1;
> >>>>>  	unsigned int cache_level:3;
> >>>>>  	unsigned int cache_dirty:1;
> >>>>>+	unsigned int cache_clean:1;
> >>>>
> >>>>So now we have cache_dirty and cache_clean which seems redundant,
> >>>>except somehow cache_dirty != !cache_clean?
> >>
> >>Exactly, not entirely redundant. I did think something along MESI lines
> >>would be useful, but that didn't capture the different meanings we
> >>employ.
> >>
> >>cache_dirty tracks whether we have been eliding the clflush.
> >>
> >>cache_clean tracks whether we know the cache has been completely
> >>clflushed.
> >>
> >>(cache_clean implies !cache_dirty, but
> >>!cache_clean does not imply cache_dirty)
> >>
> >>>We also have read_domains & DOMAIN_CPU. Which is which?
> >>
> >>DOMAIN_CPU implies that the object may be in the cpu cache (modulo the
> >>clflush elision above).
> >>
> >>DOMAIN_CPU implies !cache_clean
> >>
> >>and even
> >>
> >>cache_clean implies !DOMAIN_CPU
> >>
> >>but
> >>
> >>!DOMAIN_CPU does not imply cache_clean
> >
> >All the above should be in the kerneldoc (per-struct-member comments
> >please) of drm_i915_gem_object. Akash, can you please amend your patch?
> >And please make sure we do include that kerneldoc somewhere ... might need
> >an upfront patch to do that, for just drm_i915_gem_object.
> 
> I floated the amended patch, earlier today,
> http://lists.freedesktop.org/archives/intel-gfx/2015-November/081194.html.
> Please kindly check that.

Already done and replied here because I think this should be lifted to
kerneldoc for the structure itself. That's why I replied here ;-)
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-11-24 22:39     ` Chris Wilson
  2015-11-25  5:29       ` [PATCH v2] " akash.goel
  2015-11-25  9:21       ` [PATCH] " Daniel Vetter
@ 2015-11-25 11:02       ` Ville Syrjälä
  2015-11-25 17:28         ` Chris Wilson
  2 siblings, 1 reply; 25+ messages in thread
From: Ville Syrjälä @ 2015-11-25 11:02 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, akash.goel, intel-gfx

On Tue, Nov 24, 2015 at 10:39:38PM +0000, Chris Wilson wrote:
> On Tue, Nov 24, 2015 at 07:14:31PM +0100, Daniel Vetter wrote:
> > On Tue, Nov 24, 2015 at 12:04:06PM +0200, Ville Syrjälä wrote:
> > > On Tue, Nov 24, 2015 at 03:35:24PM +0530, akash.goel@intel.com wrote:
> > > > From: Akash Goel <akash.goel@intel.com>
> > > > 
> > > > When the object is moved out of CPU read domain, the cachelines
> > > > are not invalidated immediately. The invalidation is deferred till
> > > > next time the object is brought back into CPU read domain.
> > > > But the invalidation is done unconditionally, i.e. even for the case
> > > > where the cachelines were flushed previously, when the object moved out
> > > > of CPU write domain. This is avoidable and would lead to some optimization.
> > > > Though this is not a hypothetical case, but is unlikely to occur often.
> > > > The aim is to detect changes to the backing storage whilst the
> > > > data is potentially in the CPU cache, and only clflush in those case.
> > > > 
> > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > Signed-off-by: Akash Goel <akash.goel@intel.com>
> > > > ---
> > > >  drivers/gpu/drm/i915/i915_drv.h | 1 +
> > > >  drivers/gpu/drm/i915/i915_gem.c | 9 ++++++++-
> > > >  2 files changed, 9 insertions(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > > > index df9316f..fedb71d 100644
> > > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > > @@ -2098,6 +2098,7 @@ struct drm_i915_gem_object {
> > > >  	unsigned long gt_ro:1;
> > > >  	unsigned int cache_level:3;
> > > >  	unsigned int cache_dirty:1;
> > > > +	unsigned int cache_clean:1;
> > > 
> > > So now we have cache_dirty and cache_clean which seems redundant,
> > > except somehow cache_dirty != !cache_clean?
> 
> Exactly, not entirely redundant. I did think something along MESI lines
> would be useful, but that didn't capture the different meanings we
> employ.
> 
> cache_dirty tracks whether we have been eliding the clflush.
> 
> cache_clean tracks whether we know the cache has been completely
> clflushed.

Can we know that with speculative prefetching and whatnot?

> 
> (cache_clean implies !cache_dirty, but
> !cache_clean does not imply cache_dirty)
> 
> > We also have read_domains & DOMAIN_CPU. Which is which?
> 
> DOMAIN_CPU implies that the object may be in the cpu cache (modulo the
> clflush elision above).
> 
> DOMAIN_CPU implies !cache_clean
> 
> and even
> 
> cache_clean implies !DOMAIN_CPU
> 
> but
> 
> !DOMAIN_CPU does not imply cache_clean
> -Chris
> 
> -- 
> Chris Wilson, Intel Open Source Technology Centre

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-11-25 11:02       ` Ville Syrjälä
@ 2015-11-25 17:28         ` Chris Wilson
  2015-11-26  3:39           ` Goel, Akash
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Wilson @ 2015-11-25 17:28 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: akash.goel, intel-gfx

On Wed, Nov 25, 2015 at 01:02:20PM +0200, Ville Syrjälä wrote:
> On Tue, Nov 24, 2015 at 10:39:38PM +0000, Chris Wilson wrote:
> > On Tue, Nov 24, 2015 at 07:14:31PM +0100, Daniel Vetter wrote:
> > > On Tue, Nov 24, 2015 at 12:04:06PM +0200, Ville Syrjälä wrote:
> > > > On Tue, Nov 24, 2015 at 03:35:24PM +0530, akash.goel@intel.com wrote:
> > > > > From: Akash Goel <akash.goel@intel.com>
> > > > > 
> > > > > When the object is moved out of CPU read domain, the cachelines
> > > > > are not invalidated immediately. The invalidation is deferred till
> > > > > next time the object is brought back into CPU read domain.
> > > > > But the invalidation is done unconditionally, i.e. even for the case
> > > > > where the cachelines were flushed previously, when the object moved out
> > > > > of CPU write domain. This is avoidable and would lead to some optimization.
> > > > > Though this is not a hypothetical case, but is unlikely to occur often.
> > > > > The aim is to detect changes to the backing storage whilst the
> > > > > data is potentially in the CPU cache, and only clflush in those case.
> > > > > 
> > > > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > > > Signed-off-by: Akash Goel <akash.goel@intel.com>
> > > > > ---
> > > > >  drivers/gpu/drm/i915/i915_drv.h | 1 +
> > > > >  drivers/gpu/drm/i915/i915_gem.c | 9 ++++++++-
> > > > >  2 files changed, 9 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > > > > index df9316f..fedb71d 100644
> > > > > --- a/drivers/gpu/drm/i915/i915_drv.h
> > > > > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > > > > @@ -2098,6 +2098,7 @@ struct drm_i915_gem_object {
> > > > >  	unsigned long gt_ro:1;
> > > > >  	unsigned int cache_level:3;
> > > > >  	unsigned int cache_dirty:1;
> > > > > +	unsigned int cache_clean:1;
> > > > 
> > > > So now we have cache_dirty and cache_clean which seems redundant,
> > > > except somehow cache_dirty != !cache_clean?
> > 
> > Exactly, not entirely redundant. I did think something along MESI lines
> > would be useful, but that didn't capture the different meanings we
> > employ.
> > 
> > cache_dirty tracks whether we have been eliding the clflush.
> > 
> > cache_clean tracks whether we know the cache has been completely
> > clflushed.
> 
> Can we know that with speculative prefetching and whatnot?

"The memory attribute of the page containing the affected line has no
effect on the behavior of this instruction. It should be noted that
processors are free to speculative fetch and cache data from system
memory regions assigned a memory-type allowing for speculative reads
(i.e. WB, WC, WT memory types). The Streaming SIMD Extensions PREFETCHh
instruction is considered a hint to this speculative behavior. Because
this speculative fetching can occur at any time and is not tied to
instruction execution, CLFLUSH is not ordered with respect to PREFETCHh
or any of the speculative fetching mechanisms (that is, data could be
speculative loaded into the cache just before, during, or after the
execution of a CLFLUSH to that cache line)."

which taken to the extreme means that we can't get away with this trick.

If we can at least guarantee that such speculation can't extend beyond
a page boundary that will be enough to assert that the patch is valid.

Hopefully someone knows a CPU guru or two.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-11-25 17:28         ` Chris Wilson
@ 2015-11-26  3:39           ` Goel, Akash
  2015-11-26 10:57             ` Chris Wilson
  0 siblings, 1 reply; 25+ messages in thread
From: Goel, Akash @ 2015-11-26  3:39 UTC (permalink / raw)
  To: Chris Wilson, Ville Syrjälä, Daniel Vetter
  Cc: intel-gfx, akash.goel



On 11/25/2015 10:58 PM, Chris Wilson wrote:
> On Wed, Nov 25, 2015 at 01:02:20PM +0200, Ville Syrjälä wrote:
>> On Tue, Nov 24, 2015 at 10:39:38PM +0000, Chris Wilson wrote:
>>> On Tue, Nov 24, 2015 at 07:14:31PM +0100, Daniel Vetter wrote:
>>>> On Tue, Nov 24, 2015 at 12:04:06PM +0200, Ville Syrjälä wrote:
>>>>> On Tue, Nov 24, 2015 at 03:35:24PM +0530, akash.goel@intel.com wrote:
>>>>>> From: Akash Goel <akash.goel@intel.com>
>>>>>>
>>>>>> When the object is moved out of CPU read domain, the cachelines
>>>>>> are not invalidated immediately. The invalidation is deferred till
>>>>>> next time the object is brought back into CPU read domain.
>>>>>> But the invalidation is done unconditionally, i.e. even for the case
>>>>>> where the cachelines were flushed previously, when the object moved out
>>>>>> of CPU write domain. This is avoidable and would lead to some optimization.
>>>>>> Though this is not a hypothetical case, but is unlikely to occur often.
>>>>>> The aim is to detect changes to the backing storage whilst the
>>>>>> data is potentially in the CPU cache, and only clflush in those case.
>>>>>>
>>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>>> Signed-off-by: Akash Goel <akash.goel@intel.com>
>>>>>> ---
>>>>>>   drivers/gpu/drm/i915/i915_drv.h | 1 +
>>>>>>   drivers/gpu/drm/i915/i915_gem.c | 9 ++++++++-
>>>>>>   2 files changed, 9 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>>>>> index df9316f..fedb71d 100644
>>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>>>> @@ -2098,6 +2098,7 @@ struct drm_i915_gem_object {
>>>>>>   	unsigned long gt_ro:1;
>>>>>>   	unsigned int cache_level:3;
>>>>>>   	unsigned int cache_dirty:1;
>>>>>> +	unsigned int cache_clean:1;
>>>>>
>>>>> So now we have cache_dirty and cache_clean which seems redundant,
>>>>> except somehow cache_dirty != !cache_clean?
>>>
>>> Exactly, not entirely redundant. I did think something along MESI lines
>>> would be useful, but that didn't capture the different meanings we
>>> employ.
>>>
>>> cache_dirty tracks whether we have been eliding the clflush.
>>>
>>> cache_clean tracks whether we know the cache has been completely
>>> clflushed.
>>
>> Can we know that with speculative prefetching and whatnot?
>
> "The memory attribute of the page containing the affected line has no
> effect on the behavior of this instruction. It should be noted that
> processors are free to speculative fetch and cache data from system
> memory regions assigned a memory-type allowing for speculative reads
> (i.e. WB, WC, WT memory types). The Streaming SIMD Extensions PREFETCHh
> instruction is considered a hint to this speculative behavior. Because
> this speculative fetching can occur at any time and is not tied to
> instruction execution, CLFLUSH is not ordered with respect to PREFETCHh
> or any of the speculative fetching mechanisms (that is, data could be
> speculative loaded into the cache just before, during, or after the
> execution of a CLFLUSH to that cache line)."
>
> which taken to the extreme means that we can't get away with this trick.
>
> If we can at least guarantee that such speculation can't extend beyond
> a page boundary that will be enough to assert that the patch is valid.
>
> Hopefully someone knows a CPU guru or two.

Found some relevant info at the link https://lwn.net/Articles/255364/

An excerpt from the same link
Hardware Prefetching
"Prefetching has one big weakness: it cannot cross page boundaries. The 
reason should be obvious when one realizes that the CPUs support demand 
paging. If the prefetcher were allowed to cross page boundaries, the 
access might trigger an OS event to make the page available. This by 
itself can be bad, especially for performance. What is worse is that the 
prefetcher does not know about the semantics of the program or the OS 
itself. It might therefore prefetch pages which, in real life, never 
would be requested. That means the prefetcher would run past the end of 
the memory region the processor accessed in a recognizable pattern 
before. This is not only possible, it is very likely. If the processor, 
as a side effect of a prefetch, triggered a request for such a page the 
OS might even be completely thrown off its tracks if such a request 
could never otherwise happen.

It is therefore important to realize that, regardless of how good the 
prefetcher is at predicting the pattern, the program will experience 
cache misses at page boundaries unless it explicitly prefetches or reads 
from the new page. This is another reason to optimize the layout of data 
as described in Section 6.2 to minimize cache pollution by keeping 
unrelated data out.

Because of this page limitation the processors do not have terribly 
sophisticated logic to recognize prefetch patterns. With the still 
predominant 4k page size there is only so much which makes sense. The 
address range in which strides are recognized has been increased over 
the years, but it probably does not make much sense to go beyond the 512 
byte window which is often used today. Currently prefetch units do not 
recognize non-linear access pattern"


Best regards
Akash

> -Chris
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-11-26  3:39           ` Goel, Akash
@ 2015-11-26 10:57             ` Chris Wilson
  2015-11-30  7:11               ` [PATCH v3] " akash.goel
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Wilson @ 2015-11-26 10:57 UTC (permalink / raw)
  To: Goel, Akash; +Cc: intel-gfx

On Thu, Nov 26, 2015 at 09:09:37AM +0530, Goel, Akash wrote:
> 
> 
> On 11/25/2015 10:58 PM, Chris Wilson wrote:
> >On Wed, Nov 25, 2015 at 01:02:20PM +0200, Ville Syrjälä wrote:
> >>On Tue, Nov 24, 2015 at 10:39:38PM +0000, Chris Wilson wrote:
> >>>On Tue, Nov 24, 2015 at 07:14:31PM +0100, Daniel Vetter wrote:
> >>>>On Tue, Nov 24, 2015 at 12:04:06PM +0200, Ville Syrjälä wrote:
> >>>>>On Tue, Nov 24, 2015 at 03:35:24PM +0530, akash.goel@intel.com wrote:
> >>>>>>From: Akash Goel <akash.goel@intel.com>
> >>>>>>
> >>>>>>When the object is moved out of CPU read domain, the cachelines
> >>>>>>are not invalidated immediately. The invalidation is deferred till
> >>>>>>next time the object is brought back into CPU read domain.
> >>>>>>But the invalidation is done unconditionally, i.e. even for the case
> >>>>>>where the cachelines were flushed previously, when the object moved out
> >>>>>>of CPU write domain. This is avoidable and would lead to some optimization.
> >>>>>>Though this is not a hypothetical case, but is unlikely to occur often.
> >>>>>>The aim is to detect changes to the backing storage whilst the
> >>>>>>data is potentially in the CPU cache, and only clflush in those case.
> >>>>>>
> >>>>>>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>>>Signed-off-by: Akash Goel <akash.goel@intel.com>
> >>>>>>---
> >>>>>>  drivers/gpu/drm/i915/i915_drv.h | 1 +
> >>>>>>  drivers/gpu/drm/i915/i915_gem.c | 9 ++++++++-
> >>>>>>  2 files changed, 9 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>>diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >>>>>>index df9316f..fedb71d 100644
> >>>>>>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>>>>>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>>>>>@@ -2098,6 +2098,7 @@ struct drm_i915_gem_object {
> >>>>>>  	unsigned long gt_ro:1;
> >>>>>>  	unsigned int cache_level:3;
> >>>>>>  	unsigned int cache_dirty:1;
> >>>>>>+	unsigned int cache_clean:1;
> >>>>>
> >>>>>So now we have cache_dirty and cache_clean which seems redundant,
> >>>>>except somehow cache_dirty != !cache_clean?
> >>>
> >>>Exactly, not entirely redundant. I did think something along MESI lines
> >>>would be useful, but that didn't capture the different meanings we
> >>>employ.
> >>>
> >>>cache_dirty tracks whether we have been eliding the clflush.
> >>>
> >>>cache_clean tracks whether we know the cache has been completely
> >>>clflushed.
> >>
> >>Can we know that with speculative prefetching and whatnot?
> >
> >"The memory attribute of the page containing the affected line has no
> >effect on the behavior of this instruction. It should be noted that
> >processors are free to speculative fetch and cache data from system
> >memory regions assigned a memory-type allowing for speculative reads
> >(i.e. WB, WC, WT memory types). The Streaming SIMD Extensions PREFETCHh
> >instruction is considered a hint to this speculative behavior. Because
> >this speculative fetching can occur at any time and is not tied to
> >instruction execution, CLFLUSH is not ordered with respect to PREFETCHh
> >or any of the speculative fetching mechanisms (that is, data could be
> >speculative loaded into the cache just before, during, or after the
> >execution of a CLFLUSH to that cache line)."
> >
> >which taken to the extreme means that we can't get away with this trick.
> >
> >If we can at least guarantee that such speculation can't extend beyond
> >a page boundary that will be enough to assert that the patch is valid.
> >
> >Hopefully someone knows a CPU guru or two.
> 
> Found some relevant info at the link https://lwn.net/Articles/255364/
> 
> An excerpt from the same link
> Hardware Prefetching
> "Prefetching has one big weakness: it cannot cross page boundaries.
> The reason should be obvious when one realizes that the CPUs support
> demand paging. If the prefetcher were allowed to cross page
> boundaries, the access might trigger an OS event to make the page
> available. This by itself can be bad, especially for performance.
> What is worse is that the prefetcher does not know about the
> semantics of the program or the OS itself. It might therefore
> prefetch pages which, in real life, never would be requested. That
> means the prefetcher would run past the end of the memory region the
> processor accessed in a recognizable pattern before. This is not
> only possible, it is very likely. If the processor, as a side effect
> of a prefetch, triggered a request for such a page the OS might even
> be completely thrown off its tracks if such a request could never
> otherwise happen.
> 
> It is therefore important to realize that, regardless of how good
> the prefetcher is at predicting the pattern, the program will
> experience cache misses at page boundaries unless it explicitly
> prefetches or reads from the new page. This is another reason to
> optimize the layout of data as described in Section 6.2 to minimize
> cache pollution by keeping unrelated data out.
> 
> Because of this page limitation the processors do not have terribly
> sophisticated logic to recognize prefetch patterns. With the still
> predominant 4k page size there is only so much which makes sense.
> The address range in which strides are recognized has been increased
> over the years, but it probably does not make much sense to go
> beyond the 512 byte window which is often used today. Currently
> prefetch units do not recognize non-linear access pattern"

How best to summarise? Add something like
 * ...
 * After clflushing we know that this object cannot be in the
 * CPU cache, nor can it be speculatively loaded into the CPU
 * cache as our objects are page-aligned (and speculation cannot
 * cross page boundaries). Whilst this flag is set, we know that
 * any future access to the object's pages will miss the stale
 * cache and have to be serviced from main memory, i.e. we do
 * not need another clflush to invalidate the CPU cache.
?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-11-25 10:00           ` Daniel Vetter
@ 2015-11-30  6:24             ` Goel, Akash
  2015-11-30  8:15               ` Daniel Vetter
  0 siblings, 1 reply; 25+ messages in thread
From: Goel, Akash @ 2015-11-30  6:24 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, akash.goel



On 11/25/2015 3:30 PM, Daniel Vetter wrote:
> On Wed, Nov 25, 2015 at 02:57:47PM +0530, Goel, Akash wrote:
>>
>>
>> On 11/25/2015 2:51 PM, Daniel Vetter wrote:
>>> On Tue, Nov 24, 2015 at 10:39:38PM +0000, Chris Wilson wrote:
>>>> On Tue, Nov 24, 2015 at 07:14:31PM +0100, Daniel Vetter wrote:
>>>>> On Tue, Nov 24, 2015 at 12:04:06PM +0200, Ville Syrjälä wrote:
>>>>>> On Tue, Nov 24, 2015 at 03:35:24PM +0530, akash.goel@intel.com wrote:
>>>>>>> From: Akash Goel <akash.goel@intel.com>
>>>>>>>
>>>>>>> When the object is moved out of CPU read domain, the cachelines
>>>>>>> are not invalidated immediately. The invalidation is deferred till
>>>>>>> next time the object is brought back into CPU read domain.
>>>>>>> But the invalidation is done unconditionally, i.e. even for the case
>>>>>>> where the cachelines were flushed previously, when the object moved out
>>>>>>> of CPU write domain. This is avoidable and would lead to some optimization.
>>>>>>> Though this is not a hypothetical case, but is unlikely to occur often.
>>>>>>> The aim is to detect changes to the backing storage whilst the
>>>>>>> data is potentially in the CPU cache, and only clflush in those case.
>>>>>>>
>>>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>>>> Signed-off-by: Akash Goel <akash.goel@intel.com>
>>>>>>> ---
>>>>>>>   drivers/gpu/drm/i915/i915_drv.h | 1 +
>>>>>>>   drivers/gpu/drm/i915/i915_gem.c | 9 ++++++++-
>>>>>>>   2 files changed, 9 insertions(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>>>>>> index df9316f..fedb71d 100644
>>>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>>>>> @@ -2098,6 +2098,7 @@ struct drm_i915_gem_object {
>>>>>>>   	unsigned long gt_ro:1;
>>>>>>>   	unsigned int cache_level:3;
>>>>>>>   	unsigned int cache_dirty:1;
>>>>>>> +	unsigned int cache_clean:1;
>>>>>>
>>>>>> So now we have cache_dirty and cache_clean which seems redundant,
>>>>>> except somehow cache_dirty != !cache_clean?
>>>>
>>>> Exactly, not entirely redundant. I did think something along MESI lines
>>>> would be useful, but that didn't capture the different meanings we
>>>> employ.
>>>>
>>>> cache_dirty tracks whether we have been eliding the clflush.
>>>>
>>>> cache_clean tracks whether we know the cache has been completely
>>>> clflushed.
>>>>
>>>> (cache_clean implies !cache_dirty, but
>>>> !cache_clean does not imply cache_dirty)
>>>>
>>>>> We also have read_domains & DOMAIN_CPU. Which is which?
>>>>
>>>> DOMAIN_CPU implies that the object may be in the cpu cache (modulo the
>>>> clflush elision above).
>>>>
>>>> DOMAIN_CPU implies !cache_clean
>>>>
>>>> and even
>>>>
>>>> cache_clean implies !DOMAIN_CPU
>>>>
>>>> but
>>>>
>>>> !DOMAIN_CPU does not imply cache_clean
>>>
>>> All the above should be in the kerneldoc (per-struct-member comments
>>> please) of drm_i915_gem_object. Akash, can you please amend your patch?
>>> And please make sure we do include that kerneldoc somewhere ... might need
>>> an upfront patch to do that, for just drm_i915_gem_object.
>>
>> I floated the amended patch, earlier today,
>> http://lists.freedesktop.org/archives/intel-gfx/2015-November/081194.html.
>> Please kindly check that.
>
> Already done and replied here because I think this should be lifted to
> kerneldoc for the structure itself. That's why I replied here ;-)
> -Daniel
Hi Daniel,

I think the patch to provide a kernel-doc for just the 
drm_i915_gem_object structure can be submitted independently of this 
patch. The kernel-doc part can be done as a follow up patch.

For the current patch, have added the per-struct-member comments for the 
'cache_clean' field.

Best regards
Akash

>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v3] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-11-26 10:57             ` Chris Wilson
@ 2015-11-30  7:11               ` akash.goel
  2015-12-01 12:34                 ` Ville Syrjälä
  0 siblings, 1 reply; 25+ messages in thread
From: akash.goel @ 2015-11-30  7:11 UTC (permalink / raw)
  To: intel-gfx; +Cc: Akash Goel

From: Akash Goel <akash.goel@intel.com>

When the object is moved out of CPU read domain, the cachelines
are not invalidated immediately. The invalidation is deferred till
next time the object is brought back into CPU read domain.
But the invalidation is done unconditionally, i.e. even for the case
where the cachelines were flushed previously, when the object moved out
of CPU write domain. This is avoidable and would lead to some optimization.
Though this is not a hypothetical case, but is unlikely to occur often.
The aim is to detect changes to the backing storage whilst the
data is potentially in the CPU cache, and only clflush in those case.

v2: Made the comment more verbose (Ville/Chris)
    Added doc for 'cache_clean' field (Daniel)

v3: Updated the comment to assuage an apprehension regarding the
    speculative-prefetching behavior of HW (Ville/Chris)

Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_set_domain
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h |  9 +++++++++
 drivers/gpu/drm/i915/i915_gem.c | 17 ++++++++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 11ae5a5..f97795e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2100,6 +2100,15 @@ struct drm_i915_gem_object {
 	unsigned int cache_level:3;
 	unsigned int cache_dirty:1;
 
+	/*
+	 * Tracks if the CPU cache has been completely clflushed.
+	 * !cache_clean does not imply cache_dirty (there is some data in the
+	 * CPU cachelines, but has not been dirtied), but cache_clean
+	 * does imply !cache_dirty (no data in cachelines, so not dirty also).
+	 * Actually cache_dirty tracks whether we have been omitting clflushes.
+	 */
+	unsigned int cache_clean:1;
+
 	unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
 
 	unsigned int pin_display;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 33adc8f..7376be8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3552,6 +3552,7 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
 	trace_i915_gem_object_clflush(obj);
 	drm_clflush_sg(obj->pages);
 	obj->cache_dirty = false;
+	obj->cache_clean = true;
 
 	return true;
 }
@@ -3982,7 +3983,21 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 
 	/* Flush the CPU cache if it's still invalid. */
 	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
-		i915_gem_clflush_object(obj, false);
+		/* If an object is moved out of the CPU domain following a
+		 * CPU write and before a GPU or GTT write, we will clflush
+		 * it out of the CPU cache, and mark the cache as clean.
+		 * After clflushing we know that this object cannot be in the
+		 * CPU cache, nor can it be speculatively loaded into the CPU
+		 * cache as our objects are page-aligned (& speculation cannot
+		 * cross page boundaries). Whilst this flag is set, we know
+		 * that any future access to the object's pages will miss the
+		 * stale cache and have to be serviced from main memory, i.e.
+		 * we do not need another clflush to invalidate the CPU cache
+		 * in preparing to read from the object.
+		 */
+		if (!obj->cache_clean)
+			i915_gem_clflush_object(obj, false);
+		obj->cache_clean = false;
 
 		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
 	}
-- 
1.9.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-11-30  6:24             ` Goel, Akash
@ 2015-11-30  8:15               ` Daniel Vetter
  2015-12-01 12:07                 ` Goel, Akash
  0 siblings, 1 reply; 25+ messages in thread
From: Daniel Vetter @ 2015-11-30  8:15 UTC (permalink / raw)
  To: Goel, Akash; +Cc: intel-gfx

On Mon, Nov 30, 2015 at 11:54:14AM +0530, Goel, Akash wrote:
> 
> 
> On 11/25/2015 3:30 PM, Daniel Vetter wrote:
> >On Wed, Nov 25, 2015 at 02:57:47PM +0530, Goel, Akash wrote:
> >>
> >>
> >>On 11/25/2015 2:51 PM, Daniel Vetter wrote:
> >>>On Tue, Nov 24, 2015 at 10:39:38PM +0000, Chris Wilson wrote:
> >>>>On Tue, Nov 24, 2015 at 07:14:31PM +0100, Daniel Vetter wrote:
> >>>>>On Tue, Nov 24, 2015 at 12:04:06PM +0200, Ville Syrjälä wrote:
> >>>>>>On Tue, Nov 24, 2015 at 03:35:24PM +0530, akash.goel@intel.com wrote:
> >>>>>>>From: Akash Goel <akash.goel@intel.com>
> >>>>>>>
> >>>>>>>When the object is moved out of CPU read domain, the cachelines
> >>>>>>>are not invalidated immediately. The invalidation is deferred till
> >>>>>>>next time the object is brought back into CPU read domain.
> >>>>>>>But the invalidation is done unconditionally, i.e. even for the case
> >>>>>>>where the cachelines were flushed previously, when the object moved out
> >>>>>>>of CPU write domain. This is avoidable and would lead to some optimization.
> >>>>>>>Though this is not a hypothetical case, but is unlikely to occur often.
> >>>>>>>The aim is to detect changes to the backing storage whilst the
> >>>>>>>data is potentially in the CPU cache, and only clflush in those case.
> >>>>>>>
> >>>>>>>Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>>>>Signed-off-by: Akash Goel <akash.goel@intel.com>
> >>>>>>>---
> >>>>>>>  drivers/gpu/drm/i915/i915_drv.h | 1 +
> >>>>>>>  drivers/gpu/drm/i915/i915_gem.c | 9 ++++++++-
> >>>>>>>  2 files changed, 9 insertions(+), 1 deletion(-)
> >>>>>>>
> >>>>>>>diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >>>>>>>index df9316f..fedb71d 100644
> >>>>>>>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>>>>>>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>>>>>>@@ -2098,6 +2098,7 @@ struct drm_i915_gem_object {
> >>>>>>>  	unsigned long gt_ro:1;
> >>>>>>>  	unsigned int cache_level:3;
> >>>>>>>  	unsigned int cache_dirty:1;
> >>>>>>>+	unsigned int cache_clean:1;
> >>>>>>
> >>>>>>So now we have cache_dirty and cache_clean which seems redundant,
> >>>>>>except somehow cache_dirty != !cache_clean?
> >>>>
> >>>>Exactly, not entirely redundant. I did think something along MESI lines
> >>>>would be useful, but that didn't capture the different meanings we
> >>>>employ.
> >>>>
> >>>>cache_dirty tracks whether we have been eliding the clflush.
> >>>>
> >>>>cache_clean tracks whether we know the cache has been completely
> >>>>clflushed.
> >>>>
> >>>>(cache_clean implies !cache_dirty, but
> >>>>!cache_clean does not imply cache_dirty)
> >>>>
> >>>>>We also have read_domains & DOMAIN_CPU. Which is which?
> >>>>
> >>>>DOMAIN_CPU implies that the object may be in the cpu cache (modulo the
> >>>>clflush elision above).
> >>>>
> >>>>DOMAIN_CPU implies !cache_clean
> >>>>
> >>>>and even
> >>>>
> >>>>cache_clean implies !DOMAIN_CPU
> >>>>
> >>>>but
> >>>>
> >>>>!DOMAIN_CPU does not imply cache_clean
> >>>
> >>>All the above should be in the kerneldoc (per-struct-member comments
> >>>please) of drm_i915_gem_object. Akash, can you please amend your patch?
> >>>And please make sure we do include that kerneldoc somewhere ... might need
> >>>an upfront patch to do that, for just drm_i915_gem_object.
> >>
> >>I floated the amended patch, earlier today,
> >>http://lists.freedesktop.org/archives/intel-gfx/2015-November/081194.html.
> >>Please kindly check that.
> >
> >Already done and replied here because I think this should be lifted to
> >kerneldoc for the structure itself. That's why I replied here ;-)
> >-Daniel
> Hi Daniel,
> 
> I think the patch to provide a kernel-doc for just the drm_i915_gem_object
> structure can be submitted independently of this patch. The kernel-doc part
> can be done as a follow up patch.

Imo it should be done first, so that your cache optimization can also
correctly update the documentation.
-Daniel

> 
> For the current patch, have added the per-struct-member comments for the
> 'cache_clean' field.
> 
> Best regards
> Akash
> 
> >

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-11-30  8:15               ` Daniel Vetter
@ 2015-12-01 12:07                 ` Goel, Akash
  0 siblings, 0 replies; 25+ messages in thread
From: Goel, Akash @ 2015-12-01 12:07 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, akash.goel



On 11/30/2015 1:45 PM, Daniel Vetter wrote:
> On Mon, Nov 30, 2015 at 11:54:14AM +0530, Goel, Akash wrote:
>>
>>
>> On 11/25/2015 3:30 PM, Daniel Vetter wrote:
>>> On Wed, Nov 25, 2015 at 02:57:47PM +0530, Goel, Akash wrote:
>>>>
>>>>
>>>> On 11/25/2015 2:51 PM, Daniel Vetter wrote:
>>>>> On Tue, Nov 24, 2015 at 10:39:38PM +0000, Chris Wilson wrote:
>>>>>> On Tue, Nov 24, 2015 at 07:14:31PM +0100, Daniel Vetter wrote:
>>>>>>> On Tue, Nov 24, 2015 at 12:04:06PM +0200, Ville Syrjälä wrote:
>>>>>>>> On Tue, Nov 24, 2015 at 03:35:24PM +0530, akash.goel@intel.com wrote:
>>>>>>>>> From: Akash Goel <akash.goel@intel.com>
>>>>>>>>>
>>>>>>>>> When the object is moved out of CPU read domain, the cachelines
>>>>>>>>> are not invalidated immediately. The invalidation is deferred till
>>>>>>>>> next time the object is brought back into CPU read domain.
>>>>>>>>> But the invalidation is done unconditionally, i.e. even for the case
>>>>>>>>> where the cachelines were flushed previously, when the object moved out
>>>>>>>>> of CPU write domain. This is avoidable and would lead to some optimization.
>>>>>>>>> Though this is not a hypothetical case, but is unlikely to occur often.
>>>>>>>>> The aim is to detect changes to the backing storage whilst the
>>>>>>>>> data is potentially in the CPU cache, and only clflush in those case.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>>>>>> Signed-off-by: Akash Goel <akash.goel@intel.com>
>>>>>>>>> ---
>>>>>>>>>   drivers/gpu/drm/i915/i915_drv.h | 1 +
>>>>>>>>>   drivers/gpu/drm/i915/i915_gem.c | 9 ++++++++-
>>>>>>>>>   2 files changed, 9 insertions(+), 1 deletion(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>>>>>>>> index df9316f..fedb71d 100644
>>>>>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>>>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>>>>>>> @@ -2098,6 +2098,7 @@ struct drm_i915_gem_object {
>>>>>>>>>   	unsigned long gt_ro:1;
>>>>>>>>>   	unsigned int cache_level:3;
>>>>>>>>>   	unsigned int cache_dirty:1;
>>>>>>>>> +	unsigned int cache_clean:1;
>>>>>>>>
>>>>>>>> So now we have cache_dirty and cache_clean which seems redundant,
>>>>>>>> except somehow cache_dirty != !cache_clean?
>>>>>>
>>>>>> Exactly, not entirely redundant. I did think something along MESI lines
>>>>>> would be useful, but that didn't capture the different meanings we
>>>>>> employ.
>>>>>>
>>>>>> cache_dirty tracks whether we have been eliding the clflush.
>>>>>>
>>>>>> cache_clean tracks whether we know the cache has been completely
>>>>>> clflushed.
>>>>>>
>>>>>> (cache_clean implies !cache_dirty, but
>>>>>> !cache_clean does not imply cache_dirty)
>>>>>>
>>>>>>> We also have read_domains & DOMAIN_CPU. Which is which?
>>>>>>
>>>>>> DOMAIN_CPU implies that the object may be in the cpu cache (modulo the
>>>>>> clflush elision above).
>>>>>>
>>>>>> DOMAIN_CPU implies !cache_clean
>>>>>>
>>>>>> and even
>>>>>>
>>>>>> cache_clean implies !DOMAIN_CPU
>>>>>>
>>>>>> but
>>>>>>
>>>>>> !DOMAIN_CPU does not imply cache_clean
>>>>>
>>>>> All the above should be in the kerneldoc (per-struct-member comments
>>>>> please) of drm_i915_gem_object. Akash, can you please amend your patch?
>>>>> And please make sure we do include that kerneldoc somewhere ... might need
>>>>> an upfront patch to do that, for just drm_i915_gem_object.
>>>>
>>>> I floated the amended patch, earlier today,
>>>> http://lists.freedesktop.org/archives/intel-gfx/2015-November/081194.html.
>>>> Please kindly check that.
>>>
>>> Already done and replied here because I think this should be lifted to
>>> kerneldoc for the structure itself. That's why I replied here ;-)
>>> -Daniel
>> Hi Daniel,
>>
>> I think the patch to provide a kernel-doc for just the drm_i915_gem_object
>> structure can be submitted independently of this patch. The kernel-doc part
>> can be done as a follow up patch.
>
> Imo it should be done first, so that your cache optimization can also
> correctly update the documentation.
> -Daniel

Hi Daniel,

I take an AR to later float a kernel-doc patch for the 
'drm_i915_gem_object' structure, on this pretext can you please consider 
this patch,
http://lists.freedesktop.org/archives/intel-gfx/2015-November/081515.html

Best regards
Akash

>

>>
>> For the current patch, have added the per-struct-member comments for the
>> 'cache_clean' field.
>>
>> Best regards
>> Akash
>>
>>>
>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-11-30  7:11               ` [PATCH v3] " akash.goel
@ 2015-12-01 12:34                 ` Ville Syrjälä
  2015-12-01 13:09                   ` Chris Wilson
  0 siblings, 1 reply; 25+ messages in thread
From: Ville Syrjälä @ 2015-12-01 12:34 UTC (permalink / raw)
  To: akash.goel; +Cc: intel-gfx

On Mon, Nov 30, 2015 at 12:41:05PM +0530, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> When the object is moved out of CPU read domain, the cachelines
> are not invalidated immediately. The invalidation is deferred till
> next time the object is brought back into CPU read domain.
> But the invalidation is done unconditionally, i.e. even for the case
> where the cachelines were flushed previously, when the object moved out
> of CPU write domain. This is avoidable and would lead to some optimization.
> Though this is not a hypothetical case, but is unlikely to occur often.
> The aim is to detect changes to the backing storage whilst the
> data is potentially in the CPU cache, and only clflush in those case.
> 
> v2: Made the comment more verbose (Ville/Chris)
>     Added doc for 'cache_clean' field (Daniel)
> 
> v3: Updated the comment to assuage an apprehension regarding the
>     speculative-prefetching behavior of HW (Ville/Chris)
> 
> Testcase: igt/gem_concurrent_blit
> Testcase: igt/benchmarks/gem_set_domain
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Akash Goel <akash.goel@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h |  9 +++++++++
>  drivers/gpu/drm/i915/i915_gem.c | 17 ++++++++++++++++-
>  2 files changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 11ae5a5..f97795e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2100,6 +2100,15 @@ struct drm_i915_gem_object {
>  	unsigned int cache_level:3;
>  	unsigned int cache_dirty:1;
>  
> +	/*
> +	 * Tracks if the CPU cache has been completely clflushed.
> +	 * !cache_clean does not imply cache_dirty (there is some data in the
> +	 * CPU cachelines, but has not been dirtied), but cache_clean
> +	 * does imply !cache_dirty (no data in cachelines, so not dirty also).
> +	 * Actually cache_dirty tracks whether we have been omitting clflushes.
> +	 */
> +	unsigned int cache_clean:1;

Maybe it should be cache_flushed or something? clean really makes me
think !dirty.

> +
>  	unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
>  
>  	unsigned int pin_display;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 33adc8f..7376be8 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3552,6 +3552,7 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
>  	trace_i915_gem_object_clflush(obj);
>  	drm_clflush_sg(obj->pages);
>  	obj->cache_dirty = false;
> +	obj->cache_clean = true;
>  
>  	return true;
>  }
> @@ -3982,7 +3983,21 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
>  
>  	/* Flush the CPU cache if it's still invalid. */
>  	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
> -		i915_gem_clflush_object(obj, false);
> +		/* If an object is moved out of the CPU domain following a
> +		 * CPU write and before a GPU or GTT write, we will clflush
> +		 * it out of the CPU cache, and mark the cache as clean.
> +		 * After clflushing we know that this object cannot be in the
> +		 * CPU cache, nor can it be speculatively loaded into the CPU
> +		 * cache as our objects are page-aligned (& speculation cannot
> +		 * cross page boundaries). Whilst this flag is set, we know
> +		 * that any future access to the object's pages will miss the
> +		 * stale cache and have to be serviced from main memory, i.e.
> +		 * we do not need another clflush to invalidate the CPU cache
> +		 * in preparing to read from the object.
> +		 */
> +		if (!obj->cache_clean)
> +			i915_gem_clflush_object(obj, false);
> +		obj->cache_clean = false;

Having the comment here talk about moving stuff out of the cpu domain
made me think there's a bug here (false vs. true). But actually this
code moves it into the cpu domain so it's actually fine, I wonder if
there's a better place for the comment (eg. where we do set
cache_clean=true)?

>  
>  		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
>  	}
> -- 
> 1.9.2

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-12-01 12:34                 ` Ville Syrjälä
@ 2015-12-01 13:09                   ` Chris Wilson
  2015-12-01 13:28                     ` Ville Syrjälä
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Wilson @ 2015-12-01 13:09 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: akash.goel, intel-gfx

On Tue, Dec 01, 2015 at 02:34:41PM +0200, Ville Syrjälä wrote:
> On Mon, Nov 30, 2015 at 12:41:05PM +0530, akash.goel@intel.com wrote:
> > @@ -3982,7 +3983,21 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
> >  
> >  	/* Flush the CPU cache if it's still invalid. */
> >  	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
> > -		i915_gem_clflush_object(obj, false);
> > +		/* If an object is moved out of the CPU domain following a
> > +		 * CPU write and before a GPU or GTT write, we will clflush
> > +		 * it out of the CPU cache, and mark the cache as clean.
> > +		 * After clflushing we know that this object cannot be in the
> > +		 * CPU cache, nor can it be speculatively loaded into the CPU
> > +		 * cache as our objects are page-aligned (& speculation cannot
> > +		 * cross page boundaries). Whilst this flag is set, we know
> > +		 * that any future access to the object's pages will miss the
> > +		 * stale cache and have to be serviced from main memory, i.e.
> > +		 * we do not need another clflush to invalidate the CPU cache
> > +		 * in preparing to read from the object.
> > +		 */
> > +		if (!obj->cache_clean)
> > +			i915_gem_clflush_object(obj, false);
> > +		obj->cache_clean = false;
> 
> Having the comment here talk about moving stuff out of the cpu domain
> made me think there's a bug here (false vs. true). But actually this
> code moves it into the cpu domain so it's actually fine, I wonder if
> there's a better place for the comment (eg. where we do set
> cache_clean=true)?

I thought it made more sense here because this is where we playing the
trick to avoid the clflush.

Hmm, would s/If an object/When the object/ and
s/cache_clean/cache_flushed/ suffice?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-12-01 13:09                   ` Chris Wilson
@ 2015-12-01 13:28                     ` Ville Syrjälä
  2015-12-01 13:49                       ` Chris Wilson
  0 siblings, 1 reply; 25+ messages in thread
From: Ville Syrjälä @ 2015-12-01 13:28 UTC (permalink / raw)
  To: Chris Wilson, akash.goel, intel-gfx

On Tue, Dec 01, 2015 at 01:09:33PM +0000, Chris Wilson wrote:
> On Tue, Dec 01, 2015 at 02:34:41PM +0200, Ville Syrjälä wrote:
> > On Mon, Nov 30, 2015 at 12:41:05PM +0530, akash.goel@intel.com wrote:
> > > @@ -3982,7 +3983,21 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
> > >  
> > >  	/* Flush the CPU cache if it's still invalid. */
> > >  	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
> > > -		i915_gem_clflush_object(obj, false);
> > > +		/* If an object is moved out of the CPU domain following a
> > > +		 * CPU write and before a GPU or GTT write, we will clflush
> > > +		 * it out of the CPU cache, and mark the cache as clean.
> > > +		 * After clflushing we know that this object cannot be in the
> > > +		 * CPU cache, nor can it be speculatively loaded into the CPU
> > > +		 * cache as our objects are page-aligned (& speculation cannot
> > > +		 * cross page boundaries). Whilst this flag is set, we know
> > > +		 * that any future access to the object's pages will miss the
> > > +		 * stale cache and have to be serviced from main memory, i.e.
> > > +		 * we do not need another clflush to invalidate the CPU cache
> > > +		 * in preparing to read from the object.
> > > +		 */
> > > +		if (!obj->cache_clean)
> > > +			i915_gem_clflush_object(obj, false);
> > > +		obj->cache_clean = false;
> > 
> > Having the comment here talk about moving stuff out of the cpu domain
> > made me think there's a bug here (false vs. true). But actually this
> > code moves it into the cpu domain so it's actually fine, I wonder if
> > there's a better place for the comment (eg. where we do set
> > cache_clean=true)?
> 
> I thought it made more sense here because this is where we playing the
> trick to avoid the clflush.
> 
> Hmm, would s/If an object/When the object/ and
> s/cache_clean/cache_flushed/ suffice?

Maybe 'When the object is eventually moved out...' ?

That extra word might convey more clearly that's it's not talking
about moving it out right now.

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-12-01 13:28                     ` Ville Syrjälä
@ 2015-12-01 13:49                       ` Chris Wilson
  2015-12-01 14:00                         ` Ville Syrjälä
  0 siblings, 1 reply; 25+ messages in thread
From: Chris Wilson @ 2015-12-01 13:49 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: akash.goel, intel-gfx

On Tue, Dec 01, 2015 at 03:28:28PM +0200, Ville Syrjälä wrote:
> On Tue, Dec 01, 2015 at 01:09:33PM +0000, Chris Wilson wrote:
> > On Tue, Dec 01, 2015 at 02:34:41PM +0200, Ville Syrjälä wrote:
> > > On Mon, Nov 30, 2015 at 12:41:05PM +0530, akash.goel@intel.com wrote:
> > > > @@ -3982,7 +3983,21 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
> > > >  
> > > >  	/* Flush the CPU cache if it's still invalid. */
> > > >  	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
> > > > -		i915_gem_clflush_object(obj, false);
> > > > +		/* If an object is moved out of the CPU domain following a
> > > > +		 * CPU write and before a GPU or GTT write, we will clflush
> > > > +		 * it out of the CPU cache, and mark the cache as clean.
> > > > +		 * After clflushing we know that this object cannot be in the
> > > > +		 * CPU cache, nor can it be speculatively loaded into the CPU
> > > > +		 * cache as our objects are page-aligned (& speculation cannot
> > > > +		 * cross page boundaries). Whilst this flag is set, we know
> > > > +		 * that any future access to the object's pages will miss the
> > > > +		 * stale cache and have to be serviced from main memory, i.e.
> > > > +		 * we do not need another clflush to invalidate the CPU cache
> > > > +		 * in preparing to read from the object.
> > > > +		 */
> > > > +		if (!obj->cache_clean)
> > > > +			i915_gem_clflush_object(obj, false);
> > > > +		obj->cache_clean = false;
> > > 
> > > Having the comment here talk about moving stuff out of the cpu domain
> > > made me think there's a bug here (false vs. true). But actually this
> > > code moves it into the cpu domain so it's actually fine, I wonder if
> > > there's a better place for the comment (eg. where we do set
> > > cache_clean=true)?
> > 
> > I thought it made more sense here because this is where we playing the
> > trick to avoid the clflush.
> > 
> > Hmm, would s/If an object/When the object/ and
> > s/cache_clean/cache_flushed/ suffice?
> 
> Maybe 'When the object is eventually moved out...' ?
> 
> That extra word might convey more clearly that's it's not talking
> about moving it out right now.

Hmm, the change of tense is good. Let's try that again:

When the object was moved out the CPU domain following a CPU write, we
will have flushed it out of the CPU cache (and marked the object as
cache_flushed). After the clflush, we know that this object cannot be in
the CPU cache, nor can it be speculatively loaded into the CPU cache as
our objects are page-aligned and speculation cannot cross page
boundaries. So whilst the cache_flushed flag is set, we know that any
future access to the object's pages will miss the GPU cache and have to
be serviced from main memory (where they will pick up any writes
through the GTT or by the GPU) i.e. we do not need another clflush here
and now to invalidate the CPU cache as we prepare to read from the object.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-12-01 13:49                       ` Chris Wilson
@ 2015-12-01 14:00                         ` Ville Syrjälä
  2015-12-01 15:00                           ` Goel, Akash
  0 siblings, 1 reply; 25+ messages in thread
From: Ville Syrjälä @ 2015-12-01 14:00 UTC (permalink / raw)
  To: Chris Wilson, akash.goel, intel-gfx

On Tue, Dec 01, 2015 at 01:49:10PM +0000, Chris Wilson wrote:
> On Tue, Dec 01, 2015 at 03:28:28PM +0200, Ville Syrjälä wrote:
> > On Tue, Dec 01, 2015 at 01:09:33PM +0000, Chris Wilson wrote:
> > > On Tue, Dec 01, 2015 at 02:34:41PM +0200, Ville Syrjälä wrote:
> > > > On Mon, Nov 30, 2015 at 12:41:05PM +0530, akash.goel@intel.com wrote:
> > > > > @@ -3982,7 +3983,21 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
> > > > >  
> > > > >  	/* Flush the CPU cache if it's still invalid. */
> > > > >  	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
> > > > > -		i915_gem_clflush_object(obj, false);
> > > > > +		/* If an object is moved out of the CPU domain following a
> > > > > +		 * CPU write and before a GPU or GTT write, we will clflush
> > > > > +		 * it out of the CPU cache, and mark the cache as clean.
> > > > > +		 * After clflushing we know that this object cannot be in the
> > > > > +		 * CPU cache, nor can it be speculatively loaded into the CPU
> > > > > +		 * cache as our objects are page-aligned (& speculation cannot
> > > > > +		 * cross page boundaries). Whilst this flag is set, we know
> > > > > +		 * that any future access to the object's pages will miss the
> > > > > +		 * stale cache and have to be serviced from main memory, i.e.
> > > > > +		 * we do not need another clflush to invalidate the CPU cache
> > > > > +		 * in preparing to read from the object.
> > > > > +		 */
> > > > > +		if (!obj->cache_clean)
> > > > > +			i915_gem_clflush_object(obj, false);
> > > > > +		obj->cache_clean = false;
> > > > 
> > > > Having the comment here talk about moving stuff out of the cpu domain
> > > > made me think there's a bug here (false vs. true). But actually this
> > > > code moves it into the cpu domain so it's actually fine, I wonder if
> > > > there's a better place for the comment (eg. where we do set
> > > > cache_clean=true)?
> > > 
> > > I thought it made more sense here because this is where we playing the
> > > trick to avoid the clflush.
> > > 
> > > Hmm, would s/If an object/When the object/ and
> > > s/cache_clean/cache_flushed/ suffice?
> > 
> > Maybe 'When the object is eventually moved out...' ?
> > 
> > That extra word might convey more clearly that's it's not talking
> > about moving it out right now.
> 
> Hmm, the change of tense is good. Let's try that again:
> 
> When the object was moved out the CPU domain following a CPU write, we
> will have flushed it out of the CPU cache (and marked the object as
> cache_flushed). After the clflush, we know that this object cannot be in
> the CPU cache, nor can it be speculatively loaded into the CPU cache as
> our objects are page-aligned and speculation cannot cross page
> boundaries. So whilst the cache_flushed flag is set, we know that any
> future access to the object's pages will miss the GPU cache and have to
> be serviced from main memory (where they will pick up any writes
> through the GTT or by the GPU) i.e. we do not need another clflush here
> and now to invalidate the CPU cache as we prepare to read from the object.

Hmm, yeah referring to past events clearly now. That does make more
sense that referring to future events. lgtm

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH v3] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-12-01 14:00                         ` Ville Syrjälä
@ 2015-12-01 15:00                           ` Goel, Akash
  2015-12-02  8:07                             ` [PATCH v4] " akash.goel
  0 siblings, 1 reply; 25+ messages in thread
From: Goel, Akash @ 2015-12-01 15:00 UTC (permalink / raw)
  To: Ville Syrjälä, Chris Wilson; +Cc: intel-gfx, akash.goel



On 12/1/2015 7:30 PM, Ville Syrjälä wrote:
> On Tue, Dec 01, 2015 at 01:49:10PM +0000, Chris Wilson wrote:
>> On Tue, Dec 01, 2015 at 03:28:28PM +0200, Ville Syrjälä wrote:
>>> On Tue, Dec 01, 2015 at 01:09:33PM +0000, Chris Wilson wrote:
>>>> On Tue, Dec 01, 2015 at 02:34:41PM +0200, Ville Syrjälä wrote:
>>>>> On Mon, Nov 30, 2015 at 12:41:05PM +0530, akash.goel@intel.com wrote:
>>>>>> @@ -3982,7 +3983,21 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
>>>>>>
>>>>>>   	/* Flush the CPU cache if it's still invalid. */
>>>>>>   	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
>>>>>> -		i915_gem_clflush_object(obj, false);
>>>>>> +		/* If an object is moved out of the CPU domain following a
>>>>>> +		 * CPU write and before a GPU or GTT write, we will clflush
>>>>>> +		 * it out of the CPU cache, and mark the cache as clean.
>>>>>> +		 * After clflushing we know that this object cannot be in the
>>>>>> +		 * CPU cache, nor can it be speculatively loaded into the CPU
>>>>>> +		 * cache as our objects are page-aligned (& speculation cannot
>>>>>> +		 * cross page boundaries). Whilst this flag is set, we know
>>>>>> +		 * that any future access to the object's pages will miss the
>>>>>> +		 * stale cache and have to be serviced from main memory, i.e.
>>>>>> +		 * we do not need another clflush to invalidate the CPU cache
>>>>>> +		 * in preparing to read from the object.
>>>>>> +		 */
>>>>>> +		if (!obj->cache_clean)
>>>>>> +			i915_gem_clflush_object(obj, false);
>>>>>> +		obj->cache_clean = false;
>>>>>
>>>>> Having the comment here talk about moving stuff out of the cpu domain
>>>>> made me think there's a bug here (false vs. true). But actually this
>>>>> code moves it into the cpu domain so it's actually fine, I wonder if
>>>>> there's a better place for the comment (eg. where we do set
>>>>> cache_clean=true)?
>>>>
>>>> I thought it made more sense here because this is where we playing the
>>>> trick to avoid the clflush.
>>>>
>>>> Hmm, would s/If an object/When the object/ and
>>>> s/cache_clean/cache_flushed/ suffice?
>>>
>>> Maybe 'When the object is eventually moved out...' ?
>>>
>>> That extra word might convey more clearly that's it's not talking
>>> about moving it out right now.
>>
>> Hmm, the change of tense is good. Let's try that again:
>>
>> When the object was moved out the CPU domain following a CPU write, we
>> will have flushed it out of the CPU cache (and marked the object as
>> cache_flushed). After the clflush, we know that this object cannot be in
>> the CPU cache, nor can it be speculatively loaded into the CPU cache as
>> our objects are page-aligned and speculation cannot cross page
>> boundaries. So whilst the cache_flushed flag is set, we know that any
>> future access to the object's pages will miss the GPU cache and have to
>> be serviced from main memory (where they will pick up any writes
>> through the GTT or by the GPU) i.e. we do not need another clflush here
>> and now to invalidate the CPU cache as we prepare to read from the object.
>
> Hmm, yeah referring to past events clearly now. That does make more
> sense that referring to future events. lgtm

Thanks, will update the 'comments' in the next version.

And will also rename 'cache_clean' flag to 'cache_flushed'.

Best Regards
Akash


>
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH v4] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-12-01 15:00                           ` Goel, Akash
@ 2015-12-02  8:07                             ` akash.goel
  2015-12-06 17:03                               ` Chris Wilson
  0 siblings, 1 reply; 25+ messages in thread
From: akash.goel @ 2015-12-02  8:07 UTC (permalink / raw)
  To: intel-gfx; +Cc: =ville.syrjala, Akash Goel

From: Akash Goel <akash.goel@intel.com>

When the object is moved out of CPU read domain, the cachelines
are not invalidated immediately. The invalidation is deferred till
next time the object is brought back into CPU read domain.
But the invalidation is done unconditionally, i.e. even for the case
where the cachelines were flushed previously, when the object moved out
of CPU write domain. This is avoidable and would lead to some optimization.
Though this is not a hypothetical case, but is unlikely to occur often.
The aim is to detect changes to the backing storage whilst the
data is potentially in the CPU cache, and only clflush in those case.

v2: Made the comment more verbose (Ville/Chris)
    Added doc for 'cache_clean' field (Daniel)

v3: Updated the comment to assuage an apprehension regarding the
    speculative-prefetching behavior of HW (Ville/Chris)

v4: Renamed 'cache_clean' to 'cache_flushed' as its more appropriate (Ville)
    Made minor update in the comments for more clarity (Chris)

Testcase: igt/gem_concurrent_blit
Testcase: igt/benchmarks/gem_set_domain
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Akash Goel <akash.goel@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h |  9 +++++++++
 drivers/gpu/drm/i915/i915_gem.c | 19 ++++++++++++++++++-
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 11ae5a5..e6e4bb0 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2100,6 +2100,15 @@ struct drm_i915_gem_object {
 	unsigned int cache_level:3;
 	unsigned int cache_dirty:1;
 
+	/*
+	 * Tracks if the CPU cache has been completely flushed, on which
+	 * there should be no data in CPU cachelines for the object.
+	 * cache_flushed would also imply !cache_dirty (no data in
+	 * cachelines, so not dirty also).
+	 * cache_dirty just tracks whether we have been omitting clflushes.
+	 */
+	unsigned int cache_flushed:1;
+
 	unsigned int frontbuffer_bits:INTEL_FRONTBUFFER_BITS;
 
 	unsigned int pin_display;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 33adc8f..cdc50d8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3552,6 +3552,7 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
 	trace_i915_gem_object_clflush(obj);
 	drm_clflush_sg(obj->pages);
 	obj->cache_dirty = false;
+	obj->cache_flushed = true;
 
 	return true;
 }
@@ -3982,7 +3983,23 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 
 	/* Flush the CPU cache if it's still invalid. */
 	if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
-		i915_gem_clflush_object(obj, false);
+		/* When the object was moved out of the CPU domain following a
+		 * CPU write, we will have flushed it out of the CPU cache (and
+		 * marked the object as cache_flushed).
+		 * After the clflush we know that this object cannot be in the
+		 * CPU cache, nor can it be speculatively loaded into the CPU
+		 * cache as our objects are page-aligned and speculation cannot
+		 * cross page boundaries. So whilst the cache_flushed flag is
+		 * set, we know that any future access to the object's pages
+		 * will miss the CPU cache and have to be serviced from main
+		 * memory (where they will pick up any writes through the GTT or
+		 * by the GPU) i.e. we do not need another clflush here and now
+		 * to invalidate the CPU cache as we prepare to read from the
+		 * object.
+		 */
+		if (!obj->cache_flushed)
+			i915_gem_clflush_object(obj, false);
+		obj->cache_flushed = false;
 
 		obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
 	}
-- 
1.9.2

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH v4] drm/i915 : Avoid superfluous invalidation of CPU cache lines
  2015-12-02  8:07                             ` [PATCH v4] " akash.goel
@ 2015-12-06 17:03                               ` Chris Wilson
  0 siblings, 0 replies; 25+ messages in thread
From: Chris Wilson @ 2015-12-06 17:03 UTC (permalink / raw)
  To: akash.goel; +Cc: intel-gfx, =ville.syrjala

On Wed, Dec 02, 2015 at 01:37:44PM +0530, akash.goel@intel.com wrote:
> From: Akash Goel <akash.goel@intel.com>
> 
> When the object is moved out of CPU read domain, the cachelines
> are not invalidated immediately. The invalidation is deferred till
> next time the object is brought back into CPU read domain.
> But the invalidation is done unconditionally, i.e. even for the case
> where the cachelines were flushed previously, when the object moved out
> of CPU write domain. This is avoidable and would lead to some optimization.
> Though this is not a hypothetical case, but is unlikely to occur often.
> The aim is to detect changes to the backing storage whilst the
> data is potentially in the CPU cache, and only clflush in those case.
> 
> v2: Made the comment more verbose (Ville/Chris)
>     Added doc for 'cache_clean' field (Daniel)
> 
> v3: Updated the comment to assuage an apprehension regarding the
>     speculative-prefetching behavior of HW (Ville/Chris)
> 
> v4: Renamed 'cache_clean' to 'cache_flushed' as its more appropriate (Ville)
>     Made minor update in the comments for more clarity (Chris)

Ah, spotted one nuisance we have in i915_gem_obj_prepare_shmem_read()
that needs to clear the cache_flushed flag as after that call we will
pollute the CPU cache (whilst pretending the object is not in the CPU
cache domain).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2015-12-06 17:03 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-11-24 10:05 [PATCH] drm/i915 : Avoid superfluous invalidation of CPU cache lines akash.goel
2015-11-24 10:04 ` Ville Syrjälä
2015-11-24 18:14   ` Daniel Vetter
2015-11-24 22:39     ` Chris Wilson
2015-11-25  5:29       ` [PATCH v2] " akash.goel
2015-11-25  9:21       ` [PATCH] " Daniel Vetter
2015-11-25  9:27         ` Goel, Akash
2015-11-25 10:00           ` Daniel Vetter
2015-11-30  6:24             ` Goel, Akash
2015-11-30  8:15               ` Daniel Vetter
2015-12-01 12:07                 ` Goel, Akash
2015-11-25 11:02       ` Ville Syrjälä
2015-11-25 17:28         ` Chris Wilson
2015-11-26  3:39           ` Goel, Akash
2015-11-26 10:57             ` Chris Wilson
2015-11-30  7:11               ` [PATCH v3] " akash.goel
2015-12-01 12:34                 ` Ville Syrjälä
2015-12-01 13:09                   ` Chris Wilson
2015-12-01 13:28                     ` Ville Syrjälä
2015-12-01 13:49                       ` Chris Wilson
2015-12-01 14:00                         ` Ville Syrjälä
2015-12-01 15:00                           ` Goel, Akash
2015-12-02  8:07                             ` [PATCH v4] " akash.goel
2015-12-06 17:03                               ` Chris Wilson
2015-11-24 10:10 ` [PATCH] " Chris Wilson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.