linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] drm/i915: use udelay for very short delays
@ 2016-12-15  4:29 Nicholas Mc Guire
  2016-12-15  9:08 ` Jani Nikula
  0 siblings, 1 reply; 13+ messages in thread
From: Nicholas Mc Guire @ 2016-12-15  4:29 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Shashank Sharma, Jani Nikula, David Airlie, intel-gfx, dri-devel,
	linux-kernel, Nicholas Mc Guire

Even on fast systems a 2 microsecond delay is most likely more efficient
as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
change this to a udelay(2).

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
---

Problem found by coccinelle:

Patch was compile tested with: x86_64_defconfig (implies CONFIG_DRM_I915)

Patch is against 4.9.0 (localversion-next is next-20161214)

 drivers/gpu/drm/i915/intel_dsi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c
index 5b72c50..19fe86b 100644
--- a/drivers/gpu/drm/i915/intel_dsi.c
+++ b/drivers/gpu/drm/i915/intel_dsi.c
@@ -379,7 +379,7 @@ static void bxt_dsi_device_ready(struct intel_encoder *encoder)
 		val &= ~ULPS_STATE_MASK;
 		val |= (ULPS_STATE_ENTER | DEVICE_READY);
 		I915_WRITE(MIPI_DEVICE_READY(port), val);
-		usleep_range(2, 3);
+		udelay(2);
 
 		/* 3. Exit ULPS */
 		val = I915_READ(MIPI_DEVICE_READY(port));
-- 
2.1.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/i915: use udelay for very short delays
  2016-12-15  4:29 [PATCH] drm/i915: use udelay for very short delays Nicholas Mc Guire
@ 2016-12-15  9:08 ` Jani Nikula
  2016-12-15  9:25   ` [Intel-gfx] " Daniel Vetter
  2016-12-15  9:28   ` Nicholas Mc Guire
  0 siblings, 2 replies; 13+ messages in thread
From: Jani Nikula @ 2016-12-15  9:08 UTC (permalink / raw)
  To: Nicholas Mc Guire, Daniel Vetter
  Cc: Shashank Sharma, David Airlie, intel-gfx, dri-devel,
	linux-kernel, Nicholas Mc Guire

On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
> Even on fast systems a 2 microsecond delay is most likely more efficient
> as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> change this to a udelay(2).

Similar concerns as in [1]. We don't need the accuracy of udelay() here,
so this boils down to which is the better use of CPU. We could probably
relax the max delay more if that was helpful. But I'm not immediately
sold on "is most likely more efficient" which sounds like a gut feeling.

I'm sorry it's not clear in my other reply that I do appreciate
addressing incorrect/silly use of usleep_range(); I'm just not (yet)
convinced udelay() is the answer.

BR,
Jani.


[1] http://lkml.kernel.org/r/8737hpr32a.fsf@intel.com


>
> Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
> ---
>
> Problem found by coccinelle:
>
> Patch was compile tested with: x86_64_defconfig (implies CONFIG_DRM_I915)
>
> Patch is against 4.9.0 (localversion-next is next-20161214)
>
>  drivers/gpu/drm/i915/intel_dsi.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c
> index 5b72c50..19fe86b 100644
> --- a/drivers/gpu/drm/i915/intel_dsi.c
> +++ b/drivers/gpu/drm/i915/intel_dsi.c
> @@ -379,7 +379,7 @@ static void bxt_dsi_device_ready(struct intel_encoder *encoder)
>  		val &= ~ULPS_STATE_MASK;
>  		val |= (ULPS_STATE_ENTER | DEVICE_READY);
>  		I915_WRITE(MIPI_DEVICE_READY(port), val);
> -		usleep_range(2, 3);
> +		udelay(2);
>  
>  		/* 3. Exit ULPS */
>  		val = I915_READ(MIPI_DEVICE_READY(port));

-- 
Jani Nikula, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: use udelay for very short delays
  2016-12-15  9:08 ` Jani Nikula
@ 2016-12-15  9:25   ` Daniel Vetter
  2016-12-15  9:27     ` Daniel Vetter
  2016-12-15  9:28   ` Nicholas Mc Guire
  1 sibling, 1 reply; 13+ messages in thread
From: Daniel Vetter @ 2016-12-15  9:25 UTC (permalink / raw)
  To: Jani Nikula
  Cc: Nicholas Mc Guire, Daniel Vetter, David Airlie, intel-gfx,
	linux-kernel, dri-devel

On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
> > Even on fast systems a 2 microsecond delay is most likely more efficient
> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> > change this to a udelay(2).
> 
> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> so this boils down to which is the better use of CPU. We could probably
> relax the max delay more if that was helpful. But I'm not immediately
> sold on "is most likely more efficient" which sounds like a gut feeling.
> 
> I'm sorry it's not clear in my other reply that I do appreciate
> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> convinced udelay() is the answer.

So one reason why we unconditionally use *sleep variants is the
might_sleep check. Because in the past people have used udelay and mdelay,
those delays had to be increased a lot because hw, and at the same time
someone added users of these functions to our irq helper, resulting in irq
handling times measures in multiple ms. That's not good.

So until someone can demonstrate that there's a real benefit (which let's
be honest, for modeset code, will never be the case) I very highly prefer
to use *sleep* functions. They prevent some silly things from happening by
accident. Micro-optimizing modeset code and hampering maitainability in
the process is not good.
-Daniel

> 
> BR,
> Jani.
> 
> 
> [1] http://lkml.kernel.org/r/8737hpr32a.fsf@intel.com
> 
> 
> >
> > Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
> > ---
> >
> > Problem found by coccinelle:
> >
> > Patch was compile tested with: x86_64_defconfig (implies CONFIG_DRM_I915)
> >
> > Patch is against 4.9.0 (localversion-next is next-20161214)
> >
> >  drivers/gpu/drm/i915/intel_dsi.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/i915/intel_dsi.c b/drivers/gpu/drm/i915/intel_dsi.c
> > index 5b72c50..19fe86b 100644
> > --- a/drivers/gpu/drm/i915/intel_dsi.c
> > +++ b/drivers/gpu/drm/i915/intel_dsi.c
> > @@ -379,7 +379,7 @@ static void bxt_dsi_device_ready(struct intel_encoder *encoder)
> >  		val &= ~ULPS_STATE_MASK;
> >  		val |= (ULPS_STATE_ENTER | DEVICE_READY);
> >  		I915_WRITE(MIPI_DEVICE_READY(port), val);
> > -		usleep_range(2, 3);
> > +		udelay(2);
> >  
> >  		/* 3. Exit ULPS */
> >  		val = I915_READ(MIPI_DEVICE_READY(port));
> 
> -- 
> Jani Nikula, Intel Open Source Technology Center
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: use udelay for very short delays
  2016-12-15  9:25   ` [Intel-gfx] " Daniel Vetter
@ 2016-12-15  9:27     ` Daniel Vetter
  2016-12-15 10:51       ` Nicholas Mc Guire
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Vetter @ 2016-12-15  9:27 UTC (permalink / raw)
  To: Jani Nikula
  Cc: Nicholas Mc Guire, Daniel Vetter, David Airlie, intel-gfx,
	linux-kernel, dri-devel

On Thu, Dec 15, 2016 at 10:25:19AM +0100, Daniel Vetter wrote:
> On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> > On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
> > > Even on fast systems a 2 microsecond delay is most likely more efficient
> > > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> > > change this to a udelay(2).
> > 
> > Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> > so this boils down to which is the better use of CPU. We could probably
> > relax the max delay more if that was helpful. But I'm not immediately
> > sold on "is most likely more efficient" which sounds like a gut feeling.
> > 
> > I'm sorry it's not clear in my other reply that I do appreciate
> > addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> > convinced udelay() is the answer.
> 
> So one reason why we unconditionally use *sleep variants is the
> might_sleep check. Because in the past people have used udelay and mdelay,
> those delays had to be increased a lot because hw, and at the same time
> someone added users of these functions to our irq helper, resulting in irq
> handling times measures in multiple ms. That's not good.
> 
> So until someone can demonstrate that there's a real benefit (which let's
> be honest, for modeset code, will never be the case) I very highly prefer
> to use *sleep* functions. They prevent some silly things from happening by
> accident. Micro-optimizing modeset code and hampering maitainability in
> the process is not good.

Also, the entire premise seems backwards: usleep_range is inefficient for
certain parameter ranges and better replaced with udelay. That makes
sense.

But why exactly do we not fix udelay_range then, but instead do a cocci
job crawling through all the thousands of callers? Just fix usleep(_range)
to use udelay for very small values (and still keep the might_sleep check
ofc) if that's more efficient!
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/i915: use udelay for very short delays
  2016-12-15  9:08 ` Jani Nikula
  2016-12-15  9:25   ` [Intel-gfx] " Daniel Vetter
@ 2016-12-15  9:28   ` Nicholas Mc Guire
  2016-12-15  9:52     ` Jani Nikula
  1 sibling, 1 reply; 13+ messages in thread
From: Nicholas Mc Guire @ 2016-12-15  9:28 UTC (permalink / raw)
  To: Jani Nikula
  Cc: Nicholas Mc Guire, Daniel Vetter, Shashank Sharma, David Airlie,
	intel-gfx, dri-devel, linux-kernel

On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
> > Even on fast systems a 2 microsecond delay is most likely more efficient
> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> > change this to a udelay(2).
> 
> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> so this boils down to which is the better use of CPU. We could probably
> relax the max delay more if that was helpful. But I'm not immediately
> sold on "is most likely more efficient" which sounds like a gut feeling.
> 
> I'm sorry it's not clear in my other reply that I do appreciate
> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> convinced udelay() is the answer.

if the delay is not critical and all that is needed 
is an assurance that it is greater than X us then 
usleep_range is fine with a relaxed limit. 
So from what you wrote my patch proposal is wrong - the
udelay() is not the way to got.
My intent is to get rid of very small usleep_range() cases
so if usleep_range(20,50) causes no issues with this driver
and does not induce any performance penalty then that would 
be the way to go I think.

thx!
hofrat

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/i915: use udelay for very short delays
  2016-12-15  9:28   ` Nicholas Mc Guire
@ 2016-12-15  9:52     ` Jani Nikula
  2016-12-15 10:10       ` Ville Syrjälä
  0 siblings, 1 reply; 13+ messages in thread
From: Jani Nikula @ 2016-12-15  9:52 UTC (permalink / raw)
  To: Nicholas Mc Guire
  Cc: Nicholas Mc Guire, Daniel Vetter, Shashank Sharma, David Airlie,
	intel-gfx, dri-devel, linux-kernel

On Thu, 15 Dec 2016, Nicholas Mc Guire <der.herr@hofr.at> wrote:
> On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
>> On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
>> > Even on fast systems a 2 microsecond delay is most likely more efficient
>> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
>> > change this to a udelay(2).
>> 
>> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
>> so this boils down to which is the better use of CPU. We could probably
>> relax the max delay more if that was helpful. But I'm not immediately
>> sold on "is most likely more efficient" which sounds like a gut feeling.
>> 
>> I'm sorry it's not clear in my other reply that I do appreciate
>> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
>> convinced udelay() is the answer.
>
> if the delay is not critical and all that is needed 
> is an assurance that it is greater than X us then 
> usleep_range is fine with a relaxed limit. 
> So from what you wrote my patch proposal is wrong - the
> udelay() is not the way to got.
> My intent is to get rid of very small usleep_range() cases
> so if usleep_range(20,50) causes no issues with this driver
> and does not induce any performance penalty then that would 
> be the way to go I think.

Okay, so I looked at the code, and I looked at our spec, and I looked at
the MIPI D-PHY spec, and I cried a little.

Long story short, I think usleep_range(10, 50) will be fine.


BR,
Jani.


-- 
Jani Nikula, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/i915: use udelay for very short delays
  2016-12-15  9:52     ` Jani Nikula
@ 2016-12-15 10:10       ` Ville Syrjälä
  2016-12-15 10:20         ` Jani Nikula
  0 siblings, 1 reply; 13+ messages in thread
From: Ville Syrjälä @ 2016-12-15 10:10 UTC (permalink / raw)
  To: Jani Nikula
  Cc: Nicholas Mc Guire, intel-gfx, linux-kernel, dri-devel,
	Nicholas Mc Guire, Daniel Vetter

On Thu, Dec 15, 2016 at 11:52:34AM +0200, Jani Nikula wrote:
> On Thu, 15 Dec 2016, Nicholas Mc Guire <der.herr@hofr.at> wrote:
> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
> >> > Even on fast systems a 2 microsecond delay is most likely more efficient
> >> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> >> > change this to a udelay(2).
> >> 
> >> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> >> so this boils down to which is the better use of CPU. We could probably
> >> relax the max delay more if that was helpful. But I'm not immediately
> >> sold on "is most likely more efficient" which sounds like a gut feeling.
> >> 
> >> I'm sorry it's not clear in my other reply that I do appreciate
> >> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> >> convinced udelay() is the answer.
> >
> > if the delay is not critical and all that is needed 
> > is an assurance that it is greater than X us then 
> > usleep_range is fine with a relaxed limit. 
> > So from what you wrote my patch proposal is wrong - the
> > udelay() is not the way to got.
> > My intent is to get rid of very small usleep_range() cases
> > so if usleep_range(20,50) causes no issues with this driver
> > and does not induce any performance penalty then that would 
> > be the way to go I think.
> 
> Okay, so I looked at the code, and I looked at our spec, and I looked at
> the MIPI D-PHY spec, and I cried a little.
> 
> Long story short, I think usleep_range(10, 50) will be fine.

Note that I really want to see a comment next to every delay like this
documenting the actual hardware requirement, if the delay used by the
code doesn't 100% match it.

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/i915: use udelay for very short delays
  2016-12-15 10:10       ` Ville Syrjälä
@ 2016-12-15 10:20         ` Jani Nikula
  2016-12-15 10:34           ` Nicholas Mc Guire
  0 siblings, 1 reply; 13+ messages in thread
From: Jani Nikula @ 2016-12-15 10:20 UTC (permalink / raw)
  To: Ville Syrjälä
  Cc: Nicholas Mc Guire, intel-gfx, linux-kernel, dri-devel,
	Nicholas Mc Guire, Daniel Vetter

On Thu, 15 Dec 2016, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> On Thu, Dec 15, 2016 at 11:52:34AM +0200, Jani Nikula wrote:
>> On Thu, 15 Dec 2016, Nicholas Mc Guire <der.herr@hofr.at> wrote:
>> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
>> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
>> >> > Even on fast systems a 2 microsecond delay is most likely more efficient
>> >> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
>> >> > change this to a udelay(2).
>> >> 
>> >> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
>> >> so this boils down to which is the better use of CPU. We could probably
>> >> relax the max delay more if that was helpful. But I'm not immediately
>> >> sold on "is most likely more efficient" which sounds like a gut feeling.
>> >> 
>> >> I'm sorry it's not clear in my other reply that I do appreciate
>> >> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
>> >> convinced udelay() is the answer.
>> >
>> > if the delay is not critical and all that is needed 
>> > is an assurance that it is greater than X us then 
>> > usleep_range is fine with a relaxed limit. 
>> > So from what you wrote my patch proposal is wrong - the
>> > udelay() is not the way to got.
>> > My intent is to get rid of very small usleep_range() cases
>> > so if usleep_range(20,50) causes no issues with this driver
>> > and does not induce any performance penalty then that would 
>> > be the way to go I think.
>> 
>> Okay, so I looked at the code, and I looked at our spec, and I looked at
>> the MIPI D-PHY spec, and I cried a little.
>> 
>> Long story short, I think usleep_range(10, 50) will be fine.
>
> Note that I really want to see a comment next to every delay like this
> documenting the actual hardware requirement, if the delay used by the
> code doesn't 100% match it.

Our spec says, "Wait for 2us for ULPS to complete". That's a simplistic
view wrt D-PHY, and our code doesn't even match the spec. Hence the
tears. Want to propose a wording for the comment so we can apply this
change, without going for a full rewrite of the sequence?


BR,
Jani.

-- 
Jani Nikula, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/i915: use udelay for very short delays
  2016-12-15 10:20         ` Jani Nikula
@ 2016-12-15 10:34           ` Nicholas Mc Guire
  2016-12-15 11:48             ` Jani Nikula
  2016-12-15 11:51             ` Ville Syrjälä
  0 siblings, 2 replies; 13+ messages in thread
From: Nicholas Mc Guire @ 2016-12-15 10:34 UTC (permalink / raw)
  To: Jani Nikula
  Cc: Ville Syrjälä,
	intel-gfx, linux-kernel, dri-devel, Nicholas Mc Guire,
	Daniel Vetter

On Thu, Dec 15, 2016 at 12:20:01PM +0200, Jani Nikula wrote:
> On Thu, 15 Dec 2016, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> > On Thu, Dec 15, 2016 at 11:52:34AM +0200, Jani Nikula wrote:
> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <der.herr@hofr.at> wrote:
> >> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> >> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
> >> >> > Even on fast systems a 2 microsecond delay is most likely more efficient
> >> >> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> >> >> > change this to a udelay(2).
> >> >> 
> >> >> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> >> >> so this boils down to which is the better use of CPU. We could probably
> >> >> relax the max delay more if that was helpful. But I'm not immediately
> >> >> sold on "is most likely more efficient" which sounds like a gut feeling.
> >> >> 
> >> >> I'm sorry it's not clear in my other reply that I do appreciate
> >> >> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> >> >> convinced udelay() is the answer.
> >> >
> >> > if the delay is not critical and all that is needed 
> >> > is an assurance that it is greater than X us then 
> >> > usleep_range is fine with a relaxed limit. 
> >> > So from what you wrote my patch proposal is wrong - the
> >> > udelay() is not the way to got.
> >> > My intent is to get rid of very small usleep_range() cases
> >> > so if usleep_range(20,50) causes no issues with this driver
> >> > and does not induce any performance penalty then that would 
> >> > be the way to go I think.
> >> 
> >> Okay, so I looked at the code, and I looked at our spec, and I looked at
> >> the MIPI D-PHY spec, and I cried a little.
> >> 
> >> Long story short, I think usleep_range(10, 50) will be fine.
> >
> > Note that I really want to see a comment next to every delay like this
> > documenting the actual hardware requirement, if the delay used by the
> > code doesn't 100% match it.
> 
> Our spec says, "Wait for 2us for ULPS to complete". That's a simplistic
> view wrt D-PHY, and our code doesn't even match the spec. Hence the
> tears. Want to propose a wording for the comment so we can apply this
> change, without going for a full rewrite of the sequence?
>
is that suitable or am I overdoing it ?

-               usleep_range(2, 3);
+               /* delay for at least 2us - relaxed to 10-50 to allow
+                * hrtimer subsystem to optimize uncritical timer handling
+                */
+               usleep_range(10, 50);

thx!
hofrat 

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: use udelay for very short delays
  2016-12-15  9:27     ` Daniel Vetter
@ 2016-12-15 10:51       ` Nicholas Mc Guire
  2016-12-15 11:39         ` Daniel Vetter
  0 siblings, 1 reply; 13+ messages in thread
From: Nicholas Mc Guire @ 2016-12-15 10:51 UTC (permalink / raw)
  To: Daniel Vetter
  Cc: Jani Nikula, Nicholas Mc Guire, Daniel Vetter, David Airlie,
	intel-gfx, linux-kernel, dri-devel

On Thu, Dec 15, 2016 at 10:27:59AM +0100, Daniel Vetter wrote:
> On Thu, Dec 15, 2016 at 10:25:19AM +0100, Daniel Vetter wrote:
> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> > > On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
> > > > Even on fast systems a 2 microsecond delay is most likely more efficient
> > > > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> > > > change this to a udelay(2).
> > > 
> > > Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> > > so this boils down to which is the better use of CPU. We could probably
> > > relax the max delay more if that was helpful. But I'm not immediately
> > > sold on "is most likely more efficient" which sounds like a gut feeling.
> > > 
> > > I'm sorry it's not clear in my other reply that I do appreciate
> > > addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> > > convinced udelay() is the answer.
> > 
> > So one reason why we unconditionally use *sleep variants is the
> > might_sleep check. Because in the past people have used udelay and mdelay,
> > those delays had to be increased a lot because hw, and at the same time
> > someone added users of these functions to our irq helper, resulting in irq
> > handling times measures in multiple ms. That's not good.
> > 
> > So until someone can demonstrate that there's a real benefit (which let's
> > be honest, for modeset code, will never be the case) I very highly prefer
> > to use *sleep* functions. They prevent some silly things from happening by
> > accident. Micro-optimizing modeset code and hampering maitainability in
> > the process is not good.
> 
> Also, the entire premise seems backwards: usleep_range is inefficient for
> certain parameter ranges and better replaced with udelay. That makes
> sense.
> 
> But why exactly do we not fix udelay_range then, but instead do a cocci
> job crawling through all the thousands of callers? Just fix usleep(_range)
> to use udelay for very small values (and still keep the might_sleep check
> ofc) if that's more efficient!

its actually not thousands more like a few dozen:

usleep_range(min,max) in linux-stable 4.9.0

1648 calls total
1488 pass numeric values only (90.29%)
  27 min below 10us (1.81%)
  40 min above 10ms (2.68%)
     min out of spec 4.50%
  76 preprocessor constants (4.61%)
   1 min below 10us (1.31%)
   8 min above 10ms (10.52%)
     min out of spec 11.84%
  85 expressions (5.15%)
1(0) min below 10us (1.50%)*
6(2) min above 10ms (7.50%)*
     min out of spec 9.0%
Errors:
  23 where min==max  (1.39%)
   0 where max < min (0.00%)

Total:
  Bugs: 6.48%-10.70%*
  Crit: 3.09%-3.15%* (min < 10 and min==max and max < min)
  Detectable by coccinelle:
  Bugs: 74/103 (71.8%)
  Crit: 50/52 (96.1%)

*based on estimats as runtime values not known.


there is no usleep() as noted in Documentation/timers/timers-howto.txt
                - Why not usleep?
                        On slower systems, (embedded, OR perhaps a speed-
                        stepped PC!) the overhead of setting up the hrtimers
                        for usleep *may* not be worth it. Such an evaluation
                        will obviously depend on your specific situation, but
                        it is something to be aware of.

and it suggests to use different API for different ranges - sounds sane
to me and seems to cover the needs of drivers.

thx!
hofrat

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [Intel-gfx] [PATCH] drm/i915: use udelay for very short delays
  2016-12-15 10:51       ` Nicholas Mc Guire
@ 2016-12-15 11:39         ` Daniel Vetter
  0 siblings, 0 replies; 13+ messages in thread
From: Daniel Vetter @ 2016-12-15 11:39 UTC (permalink / raw)
  To: Nicholas Mc Guire
  Cc: Jani Nikula, Nicholas Mc Guire, Daniel Vetter, David Airlie,
	intel-gfx, Linux Kernel Mailing List, dri-devel

On Thu, Dec 15, 2016 at 11:51 AM, Nicholas Mc Guire <der.herr@hofr.at> wrote:
> On Thu, Dec 15, 2016 at 10:27:59AM +0100, Daniel Vetter wrote:
>> On Thu, Dec 15, 2016 at 10:25:19AM +0100, Daniel Vetter wrote:
>> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
>> > > On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
>> > > > Even on fast systems a 2 microsecond delay is most likely more efficient
>> > > > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
>> > > > change this to a udelay(2).
>> > >
>> > > Similar concerns as in [1]. We don't need the accuracy of udelay() here,
>> > > so this boils down to which is the better use of CPU. We could probably
>> > > relax the max delay more if that was helpful. But I'm not immediately
>> > > sold on "is most likely more efficient" which sounds like a gut feeling.
>> > >
>> > > I'm sorry it's not clear in my other reply that I do appreciate
>> > > addressing incorrect/silly use of usleep_range(); I'm just not (yet)
>> > > convinced udelay() is the answer.
>> >
>> > So one reason why we unconditionally use *sleep variants is the
>> > might_sleep check. Because in the past people have used udelay and mdelay,
>> > those delays had to be increased a lot because hw, and at the same time
>> > someone added users of these functions to our irq helper, resulting in irq
>> > handling times measures in multiple ms. That's not good.
>> >
>> > So until someone can demonstrate that there's a real benefit (which let's
>> > be honest, for modeset code, will never be the case) I very highly prefer
>> > to use *sleep* functions. They prevent some silly things from happening by
>> > accident. Micro-optimizing modeset code and hampering maitainability in
>> > the process is not good.
>>
>> Also, the entire premise seems backwards: usleep_range is inefficient for
>> certain parameter ranges and better replaced with udelay. That makes
>> sense.
>>
>> But why exactly do we not fix udelay_range then, but instead do a cocci
>> job crawling through all the thousands of callers? Just fix usleep(_range)
>> to use udelay for very small values (and still keep the might_sleep check
>> ofc) if that's more efficient!
>
> its actually not thousands more like a few dozen:

There's 1k + usleep* calls you need to audit and teach people how to
exactly use it.

> usleep_range(min,max) in linux-stable 4.9.0
>
> 1648 calls total
> 1488 pass numeric values only (90.29%)
>   27 min below 10us (1.81%)
>   40 min above 10ms (2.68%)
>      min out of spec 4.50%
>   76 preprocessor constants (4.61%)
>    1 min below 10us (1.31%)
>    8 min above 10ms (10.52%)
>      min out of spec 11.84%
>   85 expressions (5.15%)
> 1(0) min below 10us (1.50%)*
> 6(2) min above 10ms (7.50%)*
>      min out of spec 9.0%
> Errors:
>   23 where min==max  (1.39%)
>    0 where max < min (0.00%)
>
> Total:
>   Bugs: 6.48%-10.70%*
>   Crit: 3.09%-3.15%* (min < 10 and min==max and max < min)
>   Detectable by coccinelle:
>   Bugs: 74/103 (71.8%)
>   Crit: 50/52 (96.1%)
>
> *based on estimats as runtime values not known.
>
>
> there is no usleep() as noted in Documentation/timers/timers-howto.txt
>                 - Why not usleep?
>                         On slower systems, (embedded, OR perhaps a speed-
>                         stepped PC!) the overhead of setting up the hrtimers
>                         for usleep *may* not be worth it. Such an evaluation
>                         will obviously depend on your specific situation, but
>                         it is something to be aware of.
>
> and it suggests to use different API for different ranges - sounds sane
> to me and seems to cover the needs of drivers.

git grep usleep seems to disagree, at least I see a bunch of usleep()
calls. And per Rusty's api usability scale the ultimate fucntion is
dwim(). It just feels to me like pushing such trade-off decisions to
drivers is bad design because a) the tradeoffs depend upon the cpu
you're running on b) driver writers are pretty good at getting such
tradeoffs wrong in general. Aiming for a more dwim()-like api for this
here makes sense, instead of an eternal fight to correct drivers that
get it wrong all the time.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/i915: use udelay for very short delays
  2016-12-15 10:34           ` Nicholas Mc Guire
@ 2016-12-15 11:48             ` Jani Nikula
  2016-12-15 11:51             ` Ville Syrjälä
  1 sibling, 0 replies; 13+ messages in thread
From: Jani Nikula @ 2016-12-15 11:48 UTC (permalink / raw)
  To: Nicholas Mc Guire
  Cc: Ville Syrjälä,
	intel-gfx, linux-kernel, dri-devel, Nicholas Mc Guire,
	Daniel Vetter

On Thu, 15 Dec 2016, Nicholas Mc Guire <der.herr@hofr.at> wrote:
> On Thu, Dec 15, 2016 at 12:20:01PM +0200, Jani Nikula wrote:
>> On Thu, 15 Dec 2016, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
>> > On Thu, Dec 15, 2016 at 11:52:34AM +0200, Jani Nikula wrote:
>> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <der.herr@hofr.at> wrote:
>> >> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
>> >> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
>> >> >> > Even on fast systems a 2 microsecond delay is most likely more efficient
>> >> >> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
>> >> >> > change this to a udelay(2).
>> >> >> 
>> >> >> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
>> >> >> so this boils down to which is the better use of CPU. We could probably
>> >> >> relax the max delay more if that was helpful. But I'm not immediately
>> >> >> sold on "is most likely more efficient" which sounds like a gut feeling.
>> >> >> 
>> >> >> I'm sorry it's not clear in my other reply that I do appreciate
>> >> >> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
>> >> >> convinced udelay() is the answer.
>> >> >
>> >> > if the delay is not critical and all that is needed 
>> >> > is an assurance that it is greater than X us then 
>> >> > usleep_range is fine with a relaxed limit. 
>> >> > So from what you wrote my patch proposal is wrong - the
>> >> > udelay() is not the way to got.
>> >> > My intent is to get rid of very small usleep_range() cases
>> >> > so if usleep_range(20,50) causes no issues with this driver
>> >> > and does not induce any performance penalty then that would 
>> >> > be the way to go I think.
>> >> 
>> >> Okay, so I looked at the code, and I looked at our spec, and I looked at
>> >> the MIPI D-PHY spec, and I cried a little.
>> >> 
>> >> Long story short, I think usleep_range(10, 50) will be fine.
>> >
>> > Note that I really want to see a comment next to every delay like this
>> > documenting the actual hardware requirement, if the delay used by the
>> > code doesn't 100% match it.
>> 
>> Our spec says, "Wait for 2us for ULPS to complete". That's a simplistic
>> view wrt D-PHY, and our code doesn't even match the spec. Hence the
>> tears. Want to propose a wording for the comment so we can apply this
>> change, without going for a full rewrite of the sequence?
>>
> is that suitable or am I overdoing it ?
>
> -               usleep_range(2, 3);
> +               /* delay for at least 2us - relaxed to 10-50 to allow
> +                * hrtimer subsystem to optimize uncritical timer handling
> +                */
> +               usleep_range(10, 50);

I'm fine with that. Or maybe just make it "relaxed to allow" without the
values.

Jani.


>
> thx!
> hofrat 

-- 
Jani Nikula, Intel Open Source Technology Center

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] drm/i915: use udelay for very short delays
  2016-12-15 10:34           ` Nicholas Mc Guire
  2016-12-15 11:48             ` Jani Nikula
@ 2016-12-15 11:51             ` Ville Syrjälä
  1 sibling, 0 replies; 13+ messages in thread
From: Ville Syrjälä @ 2016-12-15 11:51 UTC (permalink / raw)
  To: Nicholas Mc Guire
  Cc: Jani Nikula, intel-gfx, linux-kernel, dri-devel,
	Nicholas Mc Guire, Daniel Vetter

On Thu, Dec 15, 2016 at 10:34:00AM +0000, Nicholas Mc Guire wrote:
> On Thu, Dec 15, 2016 at 12:20:01PM +0200, Jani Nikula wrote:
> > On Thu, 15 Dec 2016, Ville Syrjälä <ville.syrjala@linux.intel.com> wrote:
> > > On Thu, Dec 15, 2016 at 11:52:34AM +0200, Jani Nikula wrote:
> > >> On Thu, 15 Dec 2016, Nicholas Mc Guire <der.herr@hofr.at> wrote:
> > >> > On Thu, Dec 15, 2016 at 11:08:49AM +0200, Jani Nikula wrote:
> > >> >> On Thu, 15 Dec 2016, Nicholas Mc Guire <hofrat@osadl.org> wrote:
> > >> >> > Even on fast systems a 2 microsecond delay is most likely more efficient
> > >> >> > as a busy-wait loop. The overhead of a hrtimer does not seem warranted -
> > >> >> > change this to a udelay(2).
> > >> >> 
> > >> >> Similar concerns as in [1]. We don't need the accuracy of udelay() here,
> > >> >> so this boils down to which is the better use of CPU. We could probably
> > >> >> relax the max delay more if that was helpful. But I'm not immediately
> > >> >> sold on "is most likely more efficient" which sounds like a gut feeling.
> > >> >> 
> > >> >> I'm sorry it's not clear in my other reply that I do appreciate
> > >> >> addressing incorrect/silly use of usleep_range(); I'm just not (yet)
> > >> >> convinced udelay() is the answer.
> > >> >
> > >> > if the delay is not critical and all that is needed 
> > >> > is an assurance that it is greater than X us then 
> > >> > usleep_range is fine with a relaxed limit. 
> > >> > So from what you wrote my patch proposal is wrong - the
> > >> > udelay() is not the way to got.
> > >> > My intent is to get rid of very small usleep_range() cases
> > >> > so if usleep_range(20,50) causes no issues with this driver
> > >> > and does not induce any performance penalty then that would 
> > >> > be the way to go I think.
> > >> 
> > >> Okay, so I looked at the code, and I looked at our spec, and I looked at
> > >> the MIPI D-PHY spec, and I cried a little.
> > >> 
> > >> Long story short, I think usleep_range(10, 50) will be fine.
> > >
> > > Note that I really want to see a comment next to every delay like this
> > > documenting the actual hardware requirement, if the delay used by the
> > > code doesn't 100% match it.
> > 
> > Our spec says, "Wait for 2us for ULPS to complete". That's a simplistic
> > view wrt D-PHY, and our code doesn't even match the spec. Hence the
> > tears. Want to propose a wording for the comment so we can apply this
> > change, without going for a full rewrite of the sequence?
> >
> is that suitable or am I overdoing it ?
> 
> -               usleep_range(2, 3);
> +               /* delay for at least 2us - relaxed to 10-50 to allow
> +                * hrtimer subsystem to optimize uncritical timer handling
> +                */

That's entirely too verbose IMO, and the reason for using usleep_range()
is pretty obvious without spelling it out.

All we really want to know is what the spec says is the minimum
acceptable delay.

> +               usleep_range(10, 50);
> 
> thx!
> hofrat 

-- 
Ville Syrjälä
Intel OTC

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-12-15 12:00 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-15  4:29 [PATCH] drm/i915: use udelay for very short delays Nicholas Mc Guire
2016-12-15  9:08 ` Jani Nikula
2016-12-15  9:25   ` [Intel-gfx] " Daniel Vetter
2016-12-15  9:27     ` Daniel Vetter
2016-12-15 10:51       ` Nicholas Mc Guire
2016-12-15 11:39         ` Daniel Vetter
2016-12-15  9:28   ` Nicholas Mc Guire
2016-12-15  9:52     ` Jani Nikula
2016-12-15 10:10       ` Ville Syrjälä
2016-12-15 10:20         ` Jani Nikula
2016-12-15 10:34           ` Nicholas Mc Guire
2016-12-15 11:48             ` Jani Nikula
2016-12-15 11:51             ` Ville Syrjälä

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).