All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] pwm: tegra: Optimize period calculation
@ 2022-04-25 13:22 Uwe Kleine-König
  2022-05-20 14:20 ` Thierry Reding
                   ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Uwe Kleine-König @ 2022-04-25 13:22 UTC (permalink / raw)
  To: Thierry Reding, Lee Jones, Jonathan Hunter; +Cc: linux-tegra, linux-pwm, kernel

Dividing by the result of a division looses precision because the result is
rounded twice. E.g. with clk_rate = 48000000 and period = 32760033 the
following numbers result:

	rate = pc->clk_rate >> PWM_DUTY_WIDTH = 187500
	hz = DIV_ROUND_CLOSEST_ULL(100ULL * NSEC_PER_SEC, period_ns) = 3052
	rate = DIV_ROUND_CLOSEST_ULL(100ULL * rate, hz) = 6144

The exact result would be 6142.5061875 and (apart from rounding) this is
found by using a single division. As a side effect is also a tad
cheaper to calculate.

Also using clk_rate >> PWM_DUTY_WIDTH looses precision. Consider for
example clk_rate = 47999999 and period = 106667:

	mul_u64_u64_div_u64(pc->clk_rate >> PWM_DUTY_WIDTH, period_ns,
			    NSEC_PER_SEC) = 19

	mul_u64_u64_div_u64(pc->clk_rate, period_ns,
			    NSEC_PER_SEC << PWM_DUTY_WIDTH) = 20

(The exact result is 20.000062083332033.)

With this optimizations also switch from round-closest to round-down for
the period calculation. Given that the calculations were non-optimal for
quite some time now with variations in both directions which nobody
reported as a problem, this is the opportunity to align the driver's
behavior to the requirements of new drivers. This has several upsides:

 - Implementation is easier as there are no round-nearest variants of
   mul_u64_u64_div_u64().
 - Requests for too small periods are now consistently refused. This was
   kind of arbitrary before, where period_ns < min_period_ns was
   refused, but in some cases min_period_ns isn't actually implementable
   and then values between min_period_ns and the actual minimum were
   rounded up to the actual minimum.

Note that the duty_cycle calculation isn't using the usual round-down
approach yet.

Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
---
Hello,

changes since (implicit) v1: Updated changelog to explain why rate = 0
is refused now.

Best regards
Uwe

 drivers/pwm/pwm-tegra.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/pwm/pwm-tegra.c b/drivers/pwm/pwm-tegra.c
index e5a9ffef4a71..7fc03a9ec154 100644
--- a/drivers/pwm/pwm-tegra.c
+++ b/drivers/pwm/pwm-tegra.c
@@ -99,7 +99,7 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
 			    int duty_ns, int period_ns)
 {
 	struct tegra_pwm_chip *pc = to_tegra_pwm_chip(chip);
-	unsigned long long c = duty_ns, hz;
+	unsigned long long c = duty_ns;
 	unsigned long rate, required_clk_rate;
 	u32 val = 0;
 	int err;
@@ -156,11 +156,9 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
 		pc->clk_rate = clk_get_rate(pc->clk);
 	}
 
-	rate = pc->clk_rate >> PWM_DUTY_WIDTH;
-
 	/* Consider precision in PWM_SCALE_WIDTH rate calculation */
-	hz = DIV_ROUND_CLOSEST_ULL(100ULL * NSEC_PER_SEC, period_ns);
-	rate = DIV_ROUND_CLOSEST_ULL(100ULL * rate, hz);
+	rate = mul_u64_u64_div_u64(pc->clk_rate, period_ns,
+				   (u64)NSEC_PER_SEC << PWM_DUTY_WIDTH);
 
 	/*
 	 * Since the actual PWM divider is the register's frequency divider
@@ -169,6 +167,8 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
 	 */
 	if (rate > 0)
 		rate--;
+	else
+		return -EINVAL;
 
 	/*
 	 * Make sure that the rate will fit in the register's frequency

base-commit: 2bf8ee0faa988b5cec3503ebf2f970a0e84d24ee
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] pwm: tegra: Optimize period calculation
  2022-04-25 13:22 [PATCH v2] pwm: tegra: Optimize period calculation Uwe Kleine-König
@ 2022-05-20 14:20 ` Thierry Reding
  2022-08-15  0:28 ` Dmitry Osipenko
  2022-10-25 14:22 ` Jon Hunter
  2 siblings, 0 replies; 16+ messages in thread
From: Thierry Reding @ 2022-05-20 14:20 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Lee Jones, Jonathan Hunter, linux-tegra, linux-pwm, kernel

[-- Attachment #1: Type: text/plain, Size: 2236 bytes --]

On Mon, Apr 25, 2022 at 03:22:44PM +0200, Uwe Kleine-König wrote:
> Dividing by the result of a division looses precision because the result is
> rounded twice. E.g. with clk_rate = 48000000 and period = 32760033 the
> following numbers result:
> 
> 	rate = pc->clk_rate >> PWM_DUTY_WIDTH = 187500
> 	hz = DIV_ROUND_CLOSEST_ULL(100ULL * NSEC_PER_SEC, period_ns) = 3052
> 	rate = DIV_ROUND_CLOSEST_ULL(100ULL * rate, hz) = 6144
> 
> The exact result would be 6142.5061875 and (apart from rounding) this is
> found by using a single division. As a side effect is also a tad
> cheaper to calculate.
> 
> Also using clk_rate >> PWM_DUTY_WIDTH looses precision. Consider for
> example clk_rate = 47999999 and period = 106667:
> 
> 	mul_u64_u64_div_u64(pc->clk_rate >> PWM_DUTY_WIDTH, period_ns,
> 			    NSEC_PER_SEC) = 19
> 
> 	mul_u64_u64_div_u64(pc->clk_rate, period_ns,
> 			    NSEC_PER_SEC << PWM_DUTY_WIDTH) = 20
> 
> (The exact result is 20.000062083332033.)
> 
> With this optimizations also switch from round-closest to round-down for
> the period calculation. Given that the calculations were non-optimal for
> quite some time now with variations in both directions which nobody
> reported as a problem, this is the opportunity to align the driver's
> behavior to the requirements of new drivers. This has several upsides:
> 
>  - Implementation is easier as there are no round-nearest variants of
>    mul_u64_u64_div_u64().
>  - Requests for too small periods are now consistently refused. This was
>    kind of arbitrary before, where period_ns < min_period_ns was
>    refused, but in some cases min_period_ns isn't actually implementable
>    and then values between min_period_ns and the actual minimum were
>    rounded up to the actual minimum.
> 
> Note that the duty_cycle calculation isn't using the usual round-down
> approach yet.
> 
> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> ---
> Hello,
> 
> changes since (implicit) v1: Updated changelog to explain why rate = 0
> is refused now.
> 
> Best regards
> Uwe
> 
>  drivers/pwm/pwm-tegra.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)

Applied, thanks.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] pwm: tegra: Optimize period calculation
  2022-04-25 13:22 [PATCH v2] pwm: tegra: Optimize period calculation Uwe Kleine-König
  2022-05-20 14:20 ` Thierry Reding
@ 2022-08-15  0:28 ` Dmitry Osipenko
  2022-08-15  7:09   ` Uwe Kleine-König
       [not found]   ` <CAPVz0n1Xy=feSqw7_bvNw17=xVGnk2yhAMFmyfddU128dU+5qQ@mail.gmail.com>
  2022-10-25 14:22 ` Jon Hunter
  2 siblings, 2 replies; 16+ messages in thread
From: Dmitry Osipenko @ 2022-08-15  0:28 UTC (permalink / raw)
  To: Uwe Kleine-König, Thierry Reding, Lee Jones,
	Jonathan Hunter, Svyatoslav Ryhel, Maxim Schwalm
  Cc: linux-tegra, linux-pwm, kernel

Hi,

25.04.2022 16:22, Uwe Kleine-König пишет:
> Dividing by the result of a division looses precision because the result is
> rounded twice. E.g. with clk_rate = 48000000 and period = 32760033 the
> following numbers result:
> 
> 	rate = pc->clk_rate >> PWM_DUTY_WIDTH = 187500
> 	hz = DIV_ROUND_CLOSEST_ULL(100ULL * NSEC_PER_SEC, period_ns) = 3052
> 	rate = DIV_ROUND_CLOSEST_ULL(100ULL * rate, hz) = 6144
> 
> The exact result would be 6142.5061875 and (apart from rounding) this is
> found by using a single division. As a side effect is also a tad
> cheaper to calculate.
> 
> Also using clk_rate >> PWM_DUTY_WIDTH looses precision. Consider for
> example clk_rate = 47999999 and period = 106667:
> 
> 	mul_u64_u64_div_u64(pc->clk_rate >> PWM_DUTY_WIDTH, period_ns,
> 			    NSEC_PER_SEC) = 19
> 
> 	mul_u64_u64_div_u64(pc->clk_rate, period_ns,
> 			    NSEC_PER_SEC << PWM_DUTY_WIDTH) = 20
> 
> (The exact result is 20.000062083332033.)
> 
> With this optimizations also switch from round-closest to round-down for
> the period calculation. Given that the calculations were non-optimal for
> quite some time now with variations in both directions which nobody
> reported as a problem, this is the opportunity to align the driver's
> behavior to the requirements of new drivers. This has several upsides:
> 
>  - Implementation is easier as there are no round-nearest variants of
>    mul_u64_u64_div_u64().
>  - Requests for too small periods are now consistently refused. This was
>    kind of arbitrary before, where period_ns < min_period_ns was
>    refused, but in some cases min_period_ns isn't actually implementable
>    and then values between min_period_ns and the actual minimum were
>    rounded up to the actual minimum.
> 
> Note that the duty_cycle calculation isn't using the usual round-down
> approach yet.
> 
> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> ---
> Hello,
> 
> changes since (implicit) v1: Updated changelog to explain why rate = 0
> is refused now.
> 
> Best regards
> Uwe
> 
>  drivers/pwm/pwm-tegra.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pwm/pwm-tegra.c b/drivers/pwm/pwm-tegra.c
> index e5a9ffef4a71..7fc03a9ec154 100644
> --- a/drivers/pwm/pwm-tegra.c
> +++ b/drivers/pwm/pwm-tegra.c
> @@ -99,7 +99,7 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
>  			    int duty_ns, int period_ns)
>  {
>  	struct tegra_pwm_chip *pc = to_tegra_pwm_chip(chip);
> -	unsigned long long c = duty_ns, hz;
> +	unsigned long long c = duty_ns;
>  	unsigned long rate, required_clk_rate;
>  	u32 val = 0;
>  	int err;
> @@ -156,11 +156,9 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
>  		pc->clk_rate = clk_get_rate(pc->clk);
>  	}
>  
> -	rate = pc->clk_rate >> PWM_DUTY_WIDTH;
> -
>  	/* Consider precision in PWM_SCALE_WIDTH rate calculation */
> -	hz = DIV_ROUND_CLOSEST_ULL(100ULL * NSEC_PER_SEC, period_ns);
> -	rate = DIV_ROUND_CLOSEST_ULL(100ULL * rate, hz);
> +	rate = mul_u64_u64_div_u64(pc->clk_rate, period_ns,
> +				   (u64)NSEC_PER_SEC << PWM_DUTY_WIDTH);
>  
>  	/*
>  	 * Since the actual PWM divider is the register's frequency divider
> @@ -169,6 +167,8 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
>  	 */
>  	if (rate > 0)
>  		rate--;
> +	else
> +		return -EINVAL;

This patch broke backlight on Asus Transformer tablets, they are now
getting this -EINVAL. The root of the problem is under investigation.

Should we revert this patch meantime or maybe you (Uwe/Thierry) have an
idea about what actually has gone wrong here? Thanks in advance.

>  	/*
>  	 * Make sure that the rate will fit in the register's frequency
> 
> base-commit: 2bf8ee0faa988b5cec3503ebf2f970a0e84d24ee


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] pwm: tegra: Optimize period calculation
  2022-08-15  0:28 ` Dmitry Osipenko
@ 2022-08-15  7:09   ` Uwe Kleine-König
  2022-08-18  7:54     ` Uwe Kleine-König
       [not found]   ` <CAPVz0n1Xy=feSqw7_bvNw17=xVGnk2yhAMFmyfddU128dU+5qQ@mail.gmail.com>
  1 sibling, 1 reply; 16+ messages in thread
From: Uwe Kleine-König @ 2022-08-15  7:09 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: Thierry Reding, Lee Jones, Jonathan Hunter, Svyatoslav Ryhel,
	Maxim Schwalm, linux-tegra, linux-pwm, kernel

[-- Attachment #1: Type: text/plain, Size: 5340 bytes --]

On Mon, Aug 15, 2022 at 03:28:25AM +0300, Dmitry Osipenko wrote:
> Hi,
> 
> 25.04.2022 16:22, Uwe Kleine-König пишет:
> > Dividing by the result of a division looses precision because the result is
> > rounded twice. E.g. with clk_rate = 48000000 and period = 32760033 the
> > following numbers result:
> > 
> > 	rate = pc->clk_rate >> PWM_DUTY_WIDTH = 187500
> > 	hz = DIV_ROUND_CLOSEST_ULL(100ULL * NSEC_PER_SEC, period_ns) = 3052
> > 	rate = DIV_ROUND_CLOSEST_ULL(100ULL * rate, hz) = 6144
> > 
> > The exact result would be 6142.5061875 and (apart from rounding) this is
> > found by using a single division. As a side effect is also a tad
> > cheaper to calculate.
> > 
> > Also using clk_rate >> PWM_DUTY_WIDTH looses precision. Consider for
> > example clk_rate = 47999999 and period = 106667:
> > 
> > 	mul_u64_u64_div_u64(pc->clk_rate >> PWM_DUTY_WIDTH, period_ns,
> > 			    NSEC_PER_SEC) = 19
> > 
> > 	mul_u64_u64_div_u64(pc->clk_rate, period_ns,
> > 			    NSEC_PER_SEC << PWM_DUTY_WIDTH) = 20
> > 
> > (The exact result is 20.000062083332033.)
> > 
> > With this optimizations also switch from round-closest to round-down for
> > the period calculation. Given that the calculations were non-optimal for
> > quite some time now with variations in both directions which nobody
> > reported as a problem, this is the opportunity to align the driver's
> > behavior to the requirements of new drivers. This has several upsides:
> > 
> >  - Implementation is easier as there are no round-nearest variants of
> >    mul_u64_u64_div_u64().
> >  - Requests for too small periods are now consistently refused. This was
> >    kind of arbitrary before, where period_ns < min_period_ns was
> >    refused, but in some cases min_period_ns isn't actually implementable
> >    and then values between min_period_ns and the actual minimum were
> >    rounded up to the actual minimum.
> > 
> > Note that the duty_cycle calculation isn't using the usual round-down
> > approach yet.
> > 
> > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> > ---
> > Hello,
> > 
> > changes since (implicit) v1: Updated changelog to explain why rate = 0
> > is refused now.
> > 
> > Best regards
> > Uwe
> > 
> >  drivers/pwm/pwm-tegra.c | 10 +++++-----
> >  1 file changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/pwm/pwm-tegra.c b/drivers/pwm/pwm-tegra.c
> > index e5a9ffef4a71..7fc03a9ec154 100644
> > --- a/drivers/pwm/pwm-tegra.c
> > +++ b/drivers/pwm/pwm-tegra.c
> > @@ -99,7 +99,7 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
> >  			    int duty_ns, int period_ns)
> >  {
> >  	struct tegra_pwm_chip *pc = to_tegra_pwm_chip(chip);
> > -	unsigned long long c = duty_ns, hz;
> > +	unsigned long long c = duty_ns;
> >  	unsigned long rate, required_clk_rate;
> >  	u32 val = 0;
> >  	int err;
> > @@ -156,11 +156,9 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
> >  		pc->clk_rate = clk_get_rate(pc->clk);
> >  	}
> >  
> > -	rate = pc->clk_rate >> PWM_DUTY_WIDTH;
> > -
> >  	/* Consider precision in PWM_SCALE_WIDTH rate calculation */
> > -	hz = DIV_ROUND_CLOSEST_ULL(100ULL * NSEC_PER_SEC, period_ns);
> > -	rate = DIV_ROUND_CLOSEST_ULL(100ULL * rate, hz);
> > +	rate = mul_u64_u64_div_u64(pc->clk_rate, period_ns,
> > +				   (u64)NSEC_PER_SEC << PWM_DUTY_WIDTH);
> >  
> >  	/*
> >  	 * Since the actual PWM divider is the register's frequency divider
> > @@ -169,6 +167,8 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
> >  	 */
> >  	if (rate > 0)
> >  		rate--;
> > +	else
> > +		return -EINVAL;
> 
> This patch broke backlight on Asus Transformer tablets, they are now
> getting this -EINVAL. The root of the problem is under investigation.

This means that you requested a period that is smaller than the minimal
period the hardware can implement.

What is the clk rate of the PWM clk (i.e. pc->clk_rate?). Looking at
arch/arm/boot/dts/tegra30-asus-transformer-common.dtsi I guess period is
4000000. That in turn would mean that

	mul_u64_u64_div_u64(pc->clk_rate, period_ns, (u64)NSEC_PER_SEC << PWM_DUTY_WIDTH)

returned 0 which (with the assumption period_ns = 4000000) would imply
the clk rate is less than 64000.

I don't know the machine, but some more information would be good: What
is the actual clock rate? Can you please enable PWM_DEBUG (at compile
time) and tracing (at runtime) (i.e.

	echo 1 > /sys/kernel/debug/tracing/events/pwm/enable

), reproduce the problem and provide the trace (i.e.

	cat /sys/kernel/debug/tracing/trace

)?

> Should we revert this patch meantime or maybe you (Uwe/Thierry) have an
> idea about what actually has gone wrong here? Thanks in advance.

I'd like to understand if the problem is indeed that the backlight
driver requests a too small period. In this case I'd prefer to adapt the
backlight device to use better pwm settings. If there is a problem in
my change, this needs to be fixed. If you provide the above data, I can
check the details.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] pwm: tegra: Optimize period calculation
  2022-08-15  7:09   ` Uwe Kleine-König
@ 2022-08-18  7:54     ` Uwe Kleine-König
  2022-08-18 20:34       ` Dmitry Osipenko
  0 siblings, 1 reply; 16+ messages in thread
From: Uwe Kleine-König @ 2022-08-18  7:54 UTC (permalink / raw)
  To: Dmitry Osipenko
  Cc: linux-pwm, Maxim Schwalm, Svyatoslav Ryhel, Jonathan Hunter,
	Thierry Reding, kernel, linux-tegra, Lee Jones

[-- Attachment #1: Type: text/plain, Size: 5961 bytes --]

Hello,

On Mon, Aug 15, 2022 at 09:09:35AM +0200, Uwe Kleine-König wrote:
> On Mon, Aug 15, 2022 at 03:28:25AM +0300, Dmitry Osipenko wrote:
> > 25.04.2022 16:22, Uwe Kleine-König пишет:
> > > Dividing by the result of a division looses precision because the result is
> > > rounded twice. E.g. with clk_rate = 48000000 and period = 32760033 the
> > > following numbers result:
> > > 
> > > 	rate = pc->clk_rate >> PWM_DUTY_WIDTH = 187500
> > > 	hz = DIV_ROUND_CLOSEST_ULL(100ULL * NSEC_PER_SEC, period_ns) = 3052
> > > 	rate = DIV_ROUND_CLOSEST_ULL(100ULL * rate, hz) = 6144
> > > 
> > > The exact result would be 6142.5061875 and (apart from rounding) this is
> > > found by using a single division. As a side effect is also a tad
> > > cheaper to calculate.
> > > 
> > > Also using clk_rate >> PWM_DUTY_WIDTH looses precision. Consider for
> > > example clk_rate = 47999999 and period = 106667:
> > > 
> > > 	mul_u64_u64_div_u64(pc->clk_rate >> PWM_DUTY_WIDTH, period_ns,
> > > 			    NSEC_PER_SEC) = 19
> > > 
> > > 	mul_u64_u64_div_u64(pc->clk_rate, period_ns,
> > > 			    NSEC_PER_SEC << PWM_DUTY_WIDTH) = 20
> > > 
> > > (The exact result is 20.000062083332033.)
> > > 
> > > With this optimizations also switch from round-closest to round-down for
> > > the period calculation. Given that the calculations were non-optimal for
> > > quite some time now with variations in both directions which nobody
> > > reported as a problem, this is the opportunity to align the driver's
> > > behavior to the requirements of new drivers. This has several upsides:
> > > 
> > >  - Implementation is easier as there are no round-nearest variants of
> > >    mul_u64_u64_div_u64().
> > >  - Requests for too small periods are now consistently refused. This was
> > >    kind of arbitrary before, where period_ns < min_period_ns was
> > >    refused, but in some cases min_period_ns isn't actually implementable
> > >    and then values between min_period_ns and the actual minimum were
> > >    rounded up to the actual minimum.
> > > 
> > > Note that the duty_cycle calculation isn't using the usual round-down
> > > approach yet.
> > > 
> > > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> > > ---
> > > Hello,
> > > 
> > > changes since (implicit) v1: Updated changelog to explain why rate = 0
> > > is refused now.
> > > 
> > > Best regards
> > > Uwe
> > > 
> > >  drivers/pwm/pwm-tegra.c | 10 +++++-----
> > >  1 file changed, 5 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/drivers/pwm/pwm-tegra.c b/drivers/pwm/pwm-tegra.c
> > > index e5a9ffef4a71..7fc03a9ec154 100644
> > > --- a/drivers/pwm/pwm-tegra.c
> > > +++ b/drivers/pwm/pwm-tegra.c
> > > @@ -99,7 +99,7 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
> > >  			    int duty_ns, int period_ns)
> > >  {
> > >  	struct tegra_pwm_chip *pc = to_tegra_pwm_chip(chip);
> > > -	unsigned long long c = duty_ns, hz;
> > > +	unsigned long long c = duty_ns;
> > >  	unsigned long rate, required_clk_rate;
> > >  	u32 val = 0;
> > >  	int err;
> > > @@ -156,11 +156,9 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
> > >  		pc->clk_rate = clk_get_rate(pc->clk);
> > >  	}
> > >  
> > > -	rate = pc->clk_rate >> PWM_DUTY_WIDTH;
> > > -
> > >  	/* Consider precision in PWM_SCALE_WIDTH rate calculation */
> > > -	hz = DIV_ROUND_CLOSEST_ULL(100ULL * NSEC_PER_SEC, period_ns);
> > > -	rate = DIV_ROUND_CLOSEST_ULL(100ULL * rate, hz);
> > > +	rate = mul_u64_u64_div_u64(pc->clk_rate, period_ns,
> > > +				   (u64)NSEC_PER_SEC << PWM_DUTY_WIDTH);
> > >  
> > >  	/*
> > >  	 * Since the actual PWM divider is the register's frequency divider
> > > @@ -169,6 +167,8 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
> > >  	 */
> > >  	if (rate > 0)
> > >  		rate--;
> > > +	else
> > > +		return -EINVAL;
> > 
> > This patch broke backlight on Asus Transformer tablets, they are now
> > getting this -EINVAL. The root of the problem is under investigation.
> 
> This means that you requested a period that is smaller than the minimal
> period the hardware can implement.
> 
> What is the clk rate of the PWM clk (i.e. pc->clk_rate?). Looking at
> arch/arm/boot/dts/tegra30-asus-transformer-common.dtsi I guess period is
> 4000000. That in turn would mean that
> 
> 	mul_u64_u64_div_u64(pc->clk_rate, period_ns, (u64)NSEC_PER_SEC << PWM_DUTY_WIDTH)
> 
> returned 0 which (with the assumption period_ns = 4000000) would imply
> the clk rate is less than 64000.
> 
> I don't know the machine, but some more information would be good: What
> is the actual clock rate? Can you please enable PWM_DEBUG (at compile
> time) and tracing (at runtime) (i.e.
> 
> 	echo 1 > /sys/kernel/debug/tracing/events/pwm/enable
> 
> ), reproduce the problem and provide the trace (i.e.
> 
> 	cat /sys/kernel/debug/tracing/trace
> 
> )?
> 
> > Should we revert this patch meantime or maybe you (Uwe/Thierry) have an
> > idea about what actually has gone wrong here? Thanks in advance.
> 
> I'd like to understand if the problem is indeed that the backlight
> driver requests a too small period. In this case I'd prefer to adapt the
> backlight device to use better pwm settings. If there is a problem in
> my change, this needs to be fixed. If you provide the above data, I can
> check the details.

I'd like to get this regression fixed and so a feedback from your side
would be highly appreciated. Without further input I'm unable to debug
this and a revert would be a sad outcome. Can you at least work out the
clk rate, which might be possible by looking into
/sys/kernel/debug/clk/clk_summary.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] pwm: tegra: Optimize period calculation
  2022-08-18  7:54     ` Uwe Kleine-König
@ 2022-08-18 20:34       ` Dmitry Osipenko
  2022-09-21  8:17         ` Uwe Kleine-König
  0 siblings, 1 reply; 16+ messages in thread
From: Dmitry Osipenko @ 2022-08-18 20:34 UTC (permalink / raw)
  To: Uwe Kleine-König, Dmitry Osipenko
  Cc: linux-pwm, Maxim Schwalm, Svyatoslav Ryhel, Jonathan Hunter,
	Thierry Reding, kernel, linux-tegra, Lee Jones, Thomas Graichen

On 8/18/22 10:54, Uwe Kleine-König wrote:
> Hello,
> 
> On Mon, Aug 15, 2022 at 09:09:35AM +0200, Uwe Kleine-König wrote:
>> On Mon, Aug 15, 2022 at 03:28:25AM +0300, Dmitry Osipenko wrote:
>>> 25.04.2022 16:22, Uwe Kleine-König пишет:
>>>> Dividing by the result of a division looses precision because the result is
>>>> rounded twice. E.g. with clk_rate = 48000000 and period = 32760033 the
>>>> following numbers result:
>>>>
>>>> 	rate = pc->clk_rate >> PWM_DUTY_WIDTH = 187500
>>>> 	hz = DIV_ROUND_CLOSEST_ULL(100ULL * NSEC_PER_SEC, period_ns) = 3052
>>>> 	rate = DIV_ROUND_CLOSEST_ULL(100ULL * rate, hz) = 6144
>>>>
>>>> The exact result would be 6142.5061875 and (apart from rounding) this is
>>>> found by using a single division. As a side effect is also a tad
>>>> cheaper to calculate.
>>>>
>>>> Also using clk_rate >> PWM_DUTY_WIDTH looses precision. Consider for
>>>> example clk_rate = 47999999 and period = 106667:
>>>>
>>>> 	mul_u64_u64_div_u64(pc->clk_rate >> PWM_DUTY_WIDTH, period_ns,
>>>> 			    NSEC_PER_SEC) = 19
>>>>
>>>> 	mul_u64_u64_div_u64(pc->clk_rate, period_ns,
>>>> 			    NSEC_PER_SEC << PWM_DUTY_WIDTH) = 20
>>>>
>>>> (The exact result is 20.000062083332033.)
>>>>
>>>> With this optimizations also switch from round-closest to round-down for
>>>> the period calculation. Given that the calculations were non-optimal for
>>>> quite some time now with variations in both directions which nobody
>>>> reported as a problem, this is the opportunity to align the driver's
>>>> behavior to the requirements of new drivers. This has several upsides:
>>>>
>>>>  - Implementation is easier as there are no round-nearest variants of
>>>>    mul_u64_u64_div_u64().
>>>>  - Requests for too small periods are now consistently refused. This was
>>>>    kind of arbitrary before, where period_ns < min_period_ns was
>>>>    refused, but in some cases min_period_ns isn't actually implementable
>>>>    and then values between min_period_ns and the actual minimum were
>>>>    rounded up to the actual minimum.
>>>>
>>>> Note that the duty_cycle calculation isn't using the usual round-down
>>>> approach yet.
>>>>
>>>> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
>>>> ---
>>>> Hello,
>>>>
>>>> changes since (implicit) v1: Updated changelog to explain why rate = 0
>>>> is refused now.
>>>>
>>>> Best regards
>>>> Uwe
>>>>
>>>>  drivers/pwm/pwm-tegra.c | 10 +++++-----
>>>>  1 file changed, 5 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/drivers/pwm/pwm-tegra.c b/drivers/pwm/pwm-tegra.c
>>>> index e5a9ffef4a71..7fc03a9ec154 100644
>>>> --- a/drivers/pwm/pwm-tegra.c
>>>> +++ b/drivers/pwm/pwm-tegra.c
>>>> @@ -99,7 +99,7 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
>>>>  			    int duty_ns, int period_ns)
>>>>  {
>>>>  	struct tegra_pwm_chip *pc = to_tegra_pwm_chip(chip);
>>>> -	unsigned long long c = duty_ns, hz;
>>>> +	unsigned long long c = duty_ns;
>>>>  	unsigned long rate, required_clk_rate;
>>>>  	u32 val = 0;
>>>>  	int err;
>>>> @@ -156,11 +156,9 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
>>>>  		pc->clk_rate = clk_get_rate(pc->clk);
>>>>  	}
>>>>  
>>>> -	rate = pc->clk_rate >> PWM_DUTY_WIDTH;
>>>> -
>>>>  	/* Consider precision in PWM_SCALE_WIDTH rate calculation */
>>>> -	hz = DIV_ROUND_CLOSEST_ULL(100ULL * NSEC_PER_SEC, period_ns);
>>>> -	rate = DIV_ROUND_CLOSEST_ULL(100ULL * rate, hz);
>>>> +	rate = mul_u64_u64_div_u64(pc->clk_rate, period_ns,
>>>> +				   (u64)NSEC_PER_SEC << PWM_DUTY_WIDTH);
>>>>  
>>>>  	/*
>>>>  	 * Since the actual PWM divider is the register's frequency divider
>>>> @@ -169,6 +167,8 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
>>>>  	 */
>>>>  	if (rate > 0)
>>>>  		rate--;
>>>> +	else
>>>> +		return -EINVAL;
>>>
>>> This patch broke backlight on Asus Transformer tablets, they are now
>>> getting this -EINVAL. The root of the problem is under investigation.
>>
>> This means that you requested a period that is smaller than the minimal
>> period the hardware can implement.
>>
>> What is the clk rate of the PWM clk (i.e. pc->clk_rate?). Looking at
>> arch/arm/boot/dts/tegra30-asus-transformer-common.dtsi I guess period is
>> 4000000. That in turn would mean that
>>
>> 	mul_u64_u64_div_u64(pc->clk_rate, period_ns, (u64)NSEC_PER_SEC << PWM_DUTY_WIDTH)
>>
>> returned 0 which (with the assumption period_ns = 4000000) would imply
>> the clk rate is less than 64000.
>>
>> I don't know the machine, but some more information would be good: What
>> is the actual clock rate? Can you please enable PWM_DEBUG (at compile
>> time) and tracing (at runtime) (i.e.
>>
>> 	echo 1 > /sys/kernel/debug/tracing/events/pwm/enable
>>
>> ), reproduce the problem and provide the trace (i.e.
>>
>> 	cat /sys/kernel/debug/tracing/trace
>>
>> )?
>>
>>> Should we revert this patch meantime or maybe you (Uwe/Thierry) have an
>>> idea about what actually has gone wrong here? Thanks in advance.
>>
>> I'd like to understand if the problem is indeed that the backlight
>> driver requests a too small period. In this case I'd prefer to adapt the
>> backlight device to use better pwm settings. If there is a problem in
>> my change, this needs to be fixed. If you provide the above data, I can
>> check the details.
> 
> I'd like to get this regression fixed and so a feedback from your side
> would be highly appreciated. Without further input I'm unable to debug
> this and a revert would be a sad outcome. Can you at least work out the
> clk rate, which might be possible by looking into
> /sys/kernel/debug/clk/clk_summary.

+ Thomas Graichen, who reported on irc that it also broke backlight on
Nyan Big Chromebook

-- 
Best regards,
Dmitry

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] pwm: tegra: Optimize period calculation
  2022-08-18 20:34       ` Dmitry Osipenko
@ 2022-09-21  8:17         ` Uwe Kleine-König
  2022-09-22 11:12           ` Jon Hunter
  0 siblings, 1 reply; 16+ messages in thread
From: Uwe Kleine-König @ 2022-09-21  8:17 UTC (permalink / raw)
  To: Dmitry Osipenko, Thomas Graichen
  Cc: Dmitry Osipenko, linux-pwm, Maxim Schwalm, Svyatoslav Ryhel,
	Jonathan Hunter, Thierry Reding, kernel, linux-tegra, Lee Jones

[-- Attachment #1: Type: text/plain, Size: 3307 bytes --]

Hello,

there was some off-list discussion about this problem. I summarize the
technical details and findings here.

On Thu, Aug 18, 2022 at 11:34:03PM +0300, Dmitry Osipenko wrote:
> On 8/18/22 10:54, Uwe Kleine-König wrote:
> > On Mon, Aug 15, 2022 at 09:09:35AM +0200, Uwe Kleine-König wrote:
> >> On Mon, Aug 15, 2022 at 03:28:25AM +0300, Dmitry Osipenko wrote:
> >>> This patch broke backlight on Asus Transformer tablets, they are now
> >>> getting this -EINVAL. The root of the problem is under investigation.
> >>
> >> This means that you requested a period that is smaller than the minimal
> >> period the hardware can implement.
> >>
> >> What is the clk rate of the PWM clk (i.e. pc->clk_rate?). Looking at
> >> arch/arm/boot/dts/tegra30-asus-transformer-common.dtsi I guess period is
> >> 4000000. That in turn would mean that

I got some traces, and while I don't understand the whole picture yet,
they suggest the period is 1000000 ns only.

> >>
> >> 	mul_u64_u64_div_u64(pc->clk_rate, period_ns, (u64)NSEC_PER_SEC << PWM_DUTY_WIDTH)
> >>
> >> returned 0 which (with the assumption period_ns = 4000000) would imply
> >> the clk rate is less than 64000.

As the clk-rate is only 32768 Hz we get (with period_ns = 1000000)

	32768 * 1000000 / (1000000000 << 8) = 0.128

which is indeed rounded down to 0 and then runs into the error path
returning -EINVAL. Before my change (that now broke the backlight
configuration) configuration continued and then ended with actually
configuring period = 7812500 ns which is off by nearly a factor of 8.

I didn't find a device tree for an Asus Transformer tablet bases on a
tegra124 in the kernel source, but the options are:

 - Revert commit 8c193f4714df ("pwm: tegra: Optimize period calculation").
   I don't like this. IMHO this commit is an improvement and the problem
   is that the consumer requests a too small period. For a backlight
   this might be ok to result in a much bigger period, for other
   usecases it isn't and so I like refusing period = 1000000.

 - We could just drop the "else / return -EINVAL".
   This is inconsistent as then (again) some periods are rounded up
   (with the given clk rate that would be 5334 <= period < 7812500)
   while others (period < 5334) yield -EINVAL.

 - Increase the period that the backlight is using to at least 7812500.
   This is done (I guess) by replacing 1000000 by 7812500 (or more) in
   the backlight's PWM phandle.

 - Make the PWM clk faster.
   Looking quickly through the tegra clk driver, the parent of the PWM
   clk could be changed from clk_32k to pll_p or pll_c. This should be
   doable in the dts using something like:

   	assigned-clocks = <&tegra_car TEGRA124_CLK_PWM>;
	assigned-clock-parents = <&tegra_car TEGRA124_CLK_PLL_P>;

   in the pwm node. (Note this includes much guesswork, I don't know the
   PPL's clk-rate, so this might break in other ways. Another option is
   using PLL_C.)

Probably the latter is the nicest option. Is it possible to find out the
setting when the machine is running the original vendor OS?

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] pwm: tegra: Optimize period calculation
       [not found]   ` <CAPVz0n1Xy=feSqw7_bvNw17=xVGnk2yhAMFmyfddU128dU+5qQ@mail.gmail.com>
@ 2022-09-21 13:32     ` Uwe Kleine-König
       [not found]       ` <CAPVz0n19V5Lx889GO7wRzuvPAdBeVE8vXsMzQ-6EGyp4DFGD5w@mail.gmail.com>
  0 siblings, 1 reply; 16+ messages in thread
From: Uwe Kleine-König @ 2022-09-21 13:32 UTC (permalink / raw)
  To: Svyatoslav Ryhel
  Cc: Dmitry Osipenko, linux-pwm, Maxim Schwalm, Jonathan Hunter,
	Thierry Reding, kernel, linux-tegra, Lee Jones

[-- Attachment #1: Type: text/plain, Size: 5341 bytes --]

Hello Svyatoslav,

On Wed, Sep 21, 2022 at 12:33:48PM +0300, Svyatoslav Ryhel wrote:
> Asus Transformers are based on T20 and T30 SoC.
> 
> Thanks for your patience, I am enclosing the pwm trace you requested. It is
> taken from Asus Transformer TF700T owned by Maxim Schwalm (he is included
> into this mailing list but was not able to send trace).
> 
> Best regards. Svyatoslav R.

Yeah, it seems several devices are affected. The one with a tegra124 was
a Nyan one.

> cat /sys/kernel/debug/tracing/trace
> # tracer: nop
> #
> # entries-in-buffer/entries-written: 26/26   #P:4
> #
> #                                _-----=&gt; irqs-off/BH-disabled
> #                               / _----=&gt; need-resched
> #                              | / _---=&gt; hardirq/softirq
> #                              || / _--=&gt; preempt-depth
> #                              ||| / _-=&gt; migrate-disable
> #                              |||| /     delay
> #           TASK-PID     CPU#  |||||  TIMESTAMP  FUNCTION
> #              | |         |   |||||     |         |
>  gsd-backlight-h-2626    [001] .n...   161.995352: pwm_apply: 94faa5b6: period=4000000 duty_cycle=439215 polarity=0 enabled=1
>  gsd-backlight-h-2631    [001] .....   162.148999: pwm_apply: 94faa5b6: period=4000000 duty_cycle=486274 polarity=0 enabled=1
>  gsd-backlight-h-2635    [001] .....   162.315167: pwm_apply: 94faa5b6: period=4000000 duty_cycle=674509 polarity=0 enabled=1
>  gsd-backlight-h-2639    [001] .....   162.472855: pwm_apply: 94faa5b6: period=4000000 duty_cycle=1113725 polarity=0 enabled=1
>  gsd-backlight-h-2643    [000] .....   162.672130: pwm_apply: 94faa5b6: period=4000000 duty_cycle=1584313 polarity=0 enabled=1
>  gsd-backlight-h-2647    [000] .....   162.897159: pwm_apply: 94faa5b6: period=4000000 duty_cycle=2101960 polarity=0 enabled=1
>  gsd-backlight-h-2653    [000] .....   163.094710: pwm_apply: 94faa5b6: period=4000000 duty_cycle=2462745 polarity=0 enabled=1
>  gsd-backlight-h-2657    [000] .....   163.290172: pwm_apply: 94faa5b6: period=4000000 duty_cycle=1505882 polarity=0 enabled=1
>  gsd-backlight-h-2661    [001] .....   163.491553: pwm_apply: 94faa5b6: period=4000000 duty_cycle=831372 polarity=0 enabled=1
>  gsd-backlight-h-2665    [001] .....   163.749862: pwm_apply: 94faa5b6: period=4000000 duty_cycle=439215 polarity=0 enabled=1
>  gsd-backlight-h-2669    [001] .....   163.922487: pwm_apply: 94faa5b6: period=4000000 duty_cycle=47058 polarity=0 enabled=1
>  gsd-backlight-h-2809    [002] .....   313.762043: pwm_apply: 94faa5b6: period=4000000 duty_cycle=94117 polarity=0 enabled=1
>  gsd-backlight-h-2814    [003] .....   313.901480: pwm_apply: 94faa5b6: period=4000000 duty_cycle=643137 polarity=0 enabled=1
>  gsd-backlight-h-2821    [003] .....   314.033686: pwm_apply: 94faa5b6: period=4000000 duty_cycle=1427450 polarity=0 enabled=1
>  gsd-backlight-h-2825    [003] .....   314.168334: pwm_apply: 94faa5b6: period=4000000 duty_cycle=2101960 polarity=0 enabled=1
>  gsd-backlight-h-2829    [002] .....   314.294098: pwm_apply: 94faa5b6: period=4000000 duty_cycle=2729411 polarity=0 enabled=1
>  gsd-backlight-h-2833    [002] .....   314.424415: pwm_apply: 94faa5b6: period=4000000 duty_cycle=3560784 polarity=0 enabled=1
>  gsd-backlight-h-2837    [002] .....   314.530186: pwm_apply: 94faa5b6: period=4000000 duty_cycle=3843137 polarity=0 enabled=1
>  gsd-backlight-h-2855    [003] .....   315.054516: pwm_apply: 94faa5b6: period=4000000 duty_cycle=4000000 polarity=0 enabled=1
>  gsd-backlight-h-2865    [003] .....   322.214140: pwm_apply: 94faa5b6: period=4000000 duty_cycle=3717647 polarity=0 enabled=1
>  gsd-backlight-h-2874    [002] .....   322.446465: pwm_apply: 94faa5b6: period=4000000 duty_cycle=1976470 polarity=0 enabled=1
>  gsd-backlight-h-2878    [002] .....   322.583918: pwm_apply: 94faa5b6: period=4000000 duty_cycle=721568 polarity=0 enabled=1
>  gsd-backlight-h-2882    [001] .....   322.682135: pwm_apply: 94faa5b6: period=4000000 duty_cycle=282352 polarity=0 enabled=1
>  gsd-backlight-h-2914    [001] .....   324.240133: pwm_apply: 94faa5b6: period=4000000 duty_cycle=407843 polarity=0 enabled=1
>  gsd-backlight-h-2918    [003] .....   324.376497: pwm_apply: 94faa5b6: period=4000000 duty_cycle=250980 polarity=0 enabled=1
>  gsd-backlight-h-2922    [001] .....   324.479360: pwm_apply: 94faa5b6: period=4000000 duty_cycle=47058 polarity=0 enabled=1

Assuming the parent clk rate of the PWM clock is running at 32768 Hz,
too, the problem is the same.

Maybe

diff --git a/arch/arm/boot/dts/tegra30-asus-transformer-common.dtsi b/arch/arm/boot/dts/tegra30-asus-transformer-common.dtsi
index c27e70d8bf2b..8e8cf0abd680 100644
--- a/arch/arm/boot/dts/tegra30-asus-transformer-common.dtsi
+++ b/arch/arm/boot/dts/tegra30-asus-transformer-common.dtsi
@@ -1115,6 +1115,10 @@ bluetooth {
 
 	pwm@7000a000 {
 		status = "okay";
+
+		/* To ensure a faster clock than TEGRA30_CLK_CLK_32K */
+		assigned-clocks = <&tegra_car TEGRA30_CLK_PWM>;
+		assigned-clock-parents = <&tegra_car TEGRA30_CLK_PLL_P>;
 	};
 
 	lcd_ddc: i2c@7000c000 {

helps?

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] pwm: tegra: Optimize period calculation
       [not found]       ` <CAPVz0n19V5Lx889GO7wRzuvPAdBeVE8vXsMzQ-6EGyp4DFGD5w@mail.gmail.com>
@ 2022-09-21 17:09         ` Uwe Kleine-König
  0 siblings, 0 replies; 16+ messages in thread
From: Uwe Kleine-König @ 2022-09-21 17:09 UTC (permalink / raw)
  To: Svyatoslav Ryhel
  Cc: linux-pwm, Maxim Schwalm, Jonathan Hunter, Thierry Reding,
	kernel, linux-tegra, Dmitry Osipenko, Lee Jones

[-- Attachment #1: Type: text/plain, Size: 858 bytes --]

On Wed, Sep 21, 2022 at 05:22:17PM +0300, Svyatoslav Ryhel wrote:
> According to clk tree Maxim sent me, pwm's parent is already pll_p, I have
> checked my older logs and it is true for me as well. I can take a fresh clk

> log not sooner than tomorrow. I enclose Maxim clk tree from same device
> trace is (TF700T).

From the below clk tree:
>           pwm                         1        1        0    48000000          0     0  50000         Y

That confirms that here the clkrate isn't too slow. The trace also
suggests that you don't suffer from the problem I debugged before.

But you also have the problem that the backlight doesn't work at all for
you?

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] pwm: tegra: Optimize period calculation
  2022-09-21  8:17         ` Uwe Kleine-König
@ 2022-09-22 11:12           ` Jon Hunter
  2022-09-23 12:10             ` Thierry Reding
  0 siblings, 1 reply; 16+ messages in thread
From: Jon Hunter @ 2022-09-22 11:12 UTC (permalink / raw)
  To: Uwe Kleine-König, Dmitry Osipenko, Thomas Graichen
  Cc: Dmitry Osipenko, linux-pwm, Maxim Schwalm, Svyatoslav Ryhel,
	Thierry Reding, kernel, linux-tegra, Lee Jones

Hi Uwe,

On 21/09/2022 09:17, Uwe Kleine-König wrote:

...

> As the clk-rate is only 32768 Hz we get (with period_ns = 1000000)
> 
> 	32768 * 1000000 / (1000000000 << 8) = 0.128
> 
> which is indeed rounded down to 0 and then runs into the error path
> returning -EINVAL. Before my change (that now broke the backlight
> configuration) configuration continued and then ended with actually
> configuring period = 7812500 ns which is off by nearly a factor of 8.

I am seeing the same issue on Tegra210 Jetson Nano (device-tree
tegra210-p3450-0000.dts). This also has a clock rate of 32768 Hz by
default which means the min period is 30517ns. However, in the probe
the min_period_ns comes from the pc->soc->max_frequency which is 48
MHz for Tegra210. The min_period_ns = 1/(48 MHz / (2^8)) which is
5334ns. Hence, the actual min period is less than what is actually
possible.

I wonder if we should be warning about this and fixing the min
period ...

diff --git a/drivers/pwm/pwm-tegra.c b/drivers/pwm/pwm-tegra.c
index 2f3dcb9e9278..f72928c05c81 100644
--- a/drivers/pwm/pwm-tegra.c
+++ b/drivers/pwm/pwm-tegra.c
@@ -310,9 +310,13 @@ static int tegra_pwm_probe(struct platform_device *pdev)
          */
         pc->clk_rate = clk_get_rate(pc->clk);
  
+       if (pc->clk_rate < pc->soc->max_frequency)
+               dev_warn(&pdev->dev, "Max frequency limited to %lu Hz!",
+                        pc->clk_rate);
+
         /* Set minimum limit of PWM period for the IP */
         pc->min_period_ns =
-           (NSEC_PER_SEC / (pc->soc->max_frequency >> PWM_DUTY_WIDTH)) + 1;
+           (NSEC_PER_SEC / (pc->clk_rate >> PWM_DUTY_WIDTH)) + 1;
  
         pc->rst = devm_reset_control_get_exclusive(&pdev->dev, "pwm");

The above does not fix this issue but ...
  
> I didn't find a device tree for an Asus Transformer tablet bases on a
> tegra124 in the kernel source, but the options are:
> 
>   - Revert commit 8c193f4714df ("pwm: tegra: Optimize period calculation").
>     I don't like this. IMHO this commit is an improvement and the problem
>     is that the consumer requests a too small period. For a backlight
>     this might be ok to result in a much bigger period, for other
>     usecases it isn't and so I like refusing period = 1000000.
> 
>   - We could just drop the "else / return -EINVAL".
>     This is inconsistent as then (again) some periods are rounded up
>     (with the given clk rate that would be 5334 <= period < 7812500)
>     while others (period < 5334) yield -EINVAL.
> 
>   - Increase the period that the backlight is using to at least 7812500.
>     This is done (I guess) by replacing 1000000 by 7812500 (or more) in
>     the backlight's PWM phandle.
> 
>   - Make the PWM clk faster.
>     Looking quickly through the tegra clk driver, the parent of the PWM
>     clk could be changed from clk_32k to pll_p or pll_c. This should be
>     doable in the dts using something like:
> 
>     	assigned-clocks = <&tegra_car TEGRA124_CLK_PWM>;
> 	assigned-clock-parents = <&tegra_car TEGRA124_CLK_PLL_P>;
> 
>     in the pwm node. (Note this includes much guesswork, I don't know the
>     PPL's clk-rate, so this might break in other ways. Another option is
>     using PLL_C.)
> 
> Probably the latter is the nicest option. Is it possible to find out the
> setting when the machine is running the original vendor OS?

The latter does seem correct to me. This fixes the issue for Tegra210 ...

diff --git a/arch/arm64/boot/dts/nvidia/tegra210.dtsi b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
index 4f0e51f1a343..842843e0a585 100644
--- a/arch/arm64/boot/dts/nvidia/tegra210.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
@@ -670,6 +670,10 @@
                 clock-names = "pwm";
                 resets = <&tegra_car 17>;
                 reset-names = "pwm";
+
+               assigned-clocks = <&tegra_car TEGRA210_CLK_PWM>;
+               assigned-clock-parents = <&tegra_car TEGRA210_CLK_PLL_P>;
+
                 status = "disabled";
         };

Cheers
Jon

-- 
nvpublic

^ permalink raw reply related	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] pwm: tegra: Optimize period calculation
  2022-09-22 11:12           ` Jon Hunter
@ 2022-09-23 12:10             ` Thierry Reding
  2022-10-04 10:28               ` Thomas Graichen
  0 siblings, 1 reply; 16+ messages in thread
From: Thierry Reding @ 2022-09-23 12:10 UTC (permalink / raw)
  To: Jon Hunter
  Cc: Uwe Kleine-König, Dmitry Osipenko, Thomas Graichen,
	Dmitry Osipenko, linux-pwm, Maxim Schwalm, Svyatoslav Ryhel,
	kernel, linux-tegra, Lee Jones

[-- Attachment #1: Type: text/plain, Size: 4977 bytes --]

On Thu, Sep 22, 2022 at 12:12:31PM +0100, Jon Hunter wrote:
> Hi Uwe,
> 
> On 21/09/2022 09:17, Uwe Kleine-König wrote:
> 
> ...
> 
> > As the clk-rate is only 32768 Hz we get (with period_ns = 1000000)
> > 
> > 	32768 * 1000000 / (1000000000 << 8) = 0.128
> > 
> > which is indeed rounded down to 0 and then runs into the error path
> > returning -EINVAL. Before my change (that now broke the backlight
> > configuration) configuration continued and then ended with actually
> > configuring period = 7812500 ns which is off by nearly a factor of 8.
> 
> I am seeing the same issue on Tegra210 Jetson Nano (device-tree
> tegra210-p3450-0000.dts). This also has a clock rate of 32768 Hz by
> default which means the min period is 30517ns. However, in the probe
> the min_period_ns comes from the pc->soc->max_frequency which is 48
> MHz for Tegra210. The min_period_ns = 1/(48 MHz / (2^8)) which is
> 5334ns. Hence, the actual min period is less than what is actually
> possible.
> 
> I wonder if we should be warning about this and fixing the min
> period ...
> 
> diff --git a/drivers/pwm/pwm-tegra.c b/drivers/pwm/pwm-tegra.c
> index 2f3dcb9e9278..f72928c05c81 100644
> --- a/drivers/pwm/pwm-tegra.c
> +++ b/drivers/pwm/pwm-tegra.c
> @@ -310,9 +310,13 @@ static int tegra_pwm_probe(struct platform_device *pdev)
>          */
>         pc->clk_rate = clk_get_rate(pc->clk);
> +       if (pc->clk_rate < pc->soc->max_frequency)
> +               dev_warn(&pdev->dev, "Max frequency limited to %lu Hz!",
> +                        pc->clk_rate);
> +
>         /* Set minimum limit of PWM period for the IP */
>         pc->min_period_ns =
> -           (NSEC_PER_SEC / (pc->soc->max_frequency >> PWM_DUTY_WIDTH)) + 1;
> +           (NSEC_PER_SEC / (pc->clk_rate >> PWM_DUTY_WIDTH)) + 1;
>         pc->rst = devm_reset_control_get_exclusive(&pdev->dev, "pwm");
> 
> The above does not fix this issue but ...
> > I didn't find a device tree for an Asus Transformer tablet bases on a
> > tegra124 in the kernel source, but the options are:
> > 
> >   - Revert commit 8c193f4714df ("pwm: tegra: Optimize period calculation").
> >     I don't like this. IMHO this commit is an improvement and the problem
> >     is that the consumer requests a too small period. For a backlight
> >     this might be ok to result in a much bigger period, for other
> >     usecases it isn't and so I like refusing period = 1000000.
> > 
> >   - We could just drop the "else / return -EINVAL".
> >     This is inconsistent as then (again) some periods are rounded up
> >     (with the given clk rate that would be 5334 <= period < 7812500)
> >     while others (period < 5334) yield -EINVAL.
> > 
> >   - Increase the period that the backlight is using to at least 7812500.
> >     This is done (I guess) by replacing 1000000 by 7812500 (or more) in
> >     the backlight's PWM phandle.
> > 
> >   - Make the PWM clk faster.
> >     Looking quickly through the tegra clk driver, the parent of the PWM
> >     clk could be changed from clk_32k to pll_p or pll_c. This should be
> >     doable in the dts using something like:
> > 
> >     	assigned-clocks = <&tegra_car TEGRA124_CLK_PWM>;
> > 	assigned-clock-parents = <&tegra_car TEGRA124_CLK_PLL_P>;
> > 
> >     in the pwm node. (Note this includes much guesswork, I don't know the
> >     PPL's clk-rate, so this might break in other ways. Another option is
> >     using PLL_C.)
> > 
> > Probably the latter is the nicest option. Is it possible to find out the
> > setting when the machine is running the original vendor OS?
> 
> The latter does seem correct to me. This fixes the issue for Tegra210 ...
> 
> diff --git a/arch/arm64/boot/dts/nvidia/tegra210.dtsi b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
> index 4f0e51f1a343..842843e0a585 100644
> --- a/arch/arm64/boot/dts/nvidia/tegra210.dtsi
> +++ b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
> @@ -670,6 +670,10 @@
>                 clock-names = "pwm";
>                 resets = <&tegra_car 17>;
>                 reset-names = "pwm";
> +
> +               assigned-clocks = <&tegra_car TEGRA210_CLK_PWM>;
> +               assigned-clock-parents = <&tegra_car TEGRA210_CLK_PLL_P>;
> +
>                 status = "disabled";
>         };

Traditionally we've always set the clock parent in the driver via the
init table (at least on chips prior to Tegra186). On the other hand we
have a few occurrences of assigned-clock-rates already for Tegra210. I
don't feel strongly either way. A minor advantage of having the fix in
the clock driver is that it fixes this automatically for older device
trees. Not that it really matters since people always update kernel and
DTB at the same time on Tegra devices.

It'd be great if we could get confirmation that changing the parent
clock fixes this on all other boards as well, then we can fix it at the
same time for all of them.

Thierry

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] pwm: tegra: Optimize period calculation
  2022-09-23 12:10             ` Thierry Reding
@ 2022-10-04 10:28               ` Thomas Graichen
  2022-10-04 10:41                 ` Uwe Kleine-König
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Graichen @ 2022-10-04 10:28 UTC (permalink / raw)
  To: Thierry Reding
  Cc: Jon Hunter, Uwe Kleine-König, Dmitry Osipenko,
	Dmitry Osipenko, linux-pwm, Maxim Schwalm, Svyatoslav Ryhel,
	kernel, linux-tegra, Lee Jones

On Fri, Sep 23, 2022 at 2:10 PM Thierry Reding <thierry.reding@gmail.com> wrote:
>
> On Thu, Sep 22, 2022 at 12:12:31PM +0100, Jon Hunter wrote:
> > Hi Uwe,
> >
> > On 21/09/2022 09:17, Uwe Kleine-König wrote:
> >
> > ...
> >
> > > As the clk-rate is only 32768 Hz we get (with period_ns = 1000000)
> > >
> > >     32768 * 1000000 / (1000000000 << 8) = 0.128
> > >
> > > which is indeed rounded down to 0 and then runs into the error path
> > > returning -EINVAL. Before my change (that now broke the backlight
> > > configuration) configuration continued and then ended with actually
> > > configuring period = 7812500 ns which is off by nearly a factor of 8.
> >
> > I am seeing the same issue on Tegra210 Jetson Nano (device-tree
> > tegra210-p3450-0000.dts). This also has a clock rate of 32768 Hz by
> > default which means the min period is 30517ns. However, in the probe
> > the min_period_ns comes from the pc->soc->max_frequency which is 48
> > MHz for Tegra210. The min_period_ns = 1/(48 MHz / (2^8)) which is
> > 5334ns. Hence, the actual min period is less than what is actually
> > possible.
> >
> > I wonder if we should be warning about this and fixing the min
> > period ...
> >
> > diff --git a/drivers/pwm/pwm-tegra.c b/drivers/pwm/pwm-tegra.c
> > index 2f3dcb9e9278..f72928c05c81 100644
> > --- a/drivers/pwm/pwm-tegra.c
> > +++ b/drivers/pwm/pwm-tegra.c
> > @@ -310,9 +310,13 @@ static int tegra_pwm_probe(struct platform_device *pdev)
> >          */
> >         pc->clk_rate = clk_get_rate(pc->clk);
> > +       if (pc->clk_rate < pc->soc->max_frequency)
> > +               dev_warn(&pdev->dev, "Max frequency limited to %lu Hz!",
> > +                        pc->clk_rate);
> > +
> >         /* Set minimum limit of PWM period for the IP */
> >         pc->min_period_ns =
> > -           (NSEC_PER_SEC / (pc->soc->max_frequency >> PWM_DUTY_WIDTH)) + 1;
> > +           (NSEC_PER_SEC / (pc->clk_rate >> PWM_DUTY_WIDTH)) + 1;
> >         pc->rst = devm_reset_control_get_exclusive(&pdev->dev, "pwm");
> >
> > The above does not fix this issue but ...
> > > I didn't find a device tree for an Asus Transformer tablet bases on a
> > > tegra124 in the kernel source, but the options are:
> > >
> > >   - Revert commit 8c193f4714df ("pwm: tegra: Optimize period calculation").
> > >     I don't like this. IMHO this commit is an improvement and the problem
> > >     is that the consumer requests a too small period. For a backlight
> > >     this might be ok to result in a much bigger period, for other
> > >     usecases it isn't and so I like refusing period = 1000000.
> > >
> > >   - We could just drop the "else / return -EINVAL".
> > >     This is inconsistent as then (again) some periods are rounded up
> > >     (with the given clk rate that would be 5334 <= period < 7812500)
> > >     while others (period < 5334) yield -EINVAL.
> > >
> > >   - Increase the period that the backlight is using to at least 7812500.
> > >     This is done (I guess) by replacing 1000000 by 7812500 (or more) in
> > >     the backlight's PWM phandle.
> > >
> > >   - Make the PWM clk faster.
> > >     Looking quickly through the tegra clk driver, the parent of the PWM
> > >     clk could be changed from clk_32k to pll_p or pll_c. This should be
> > >     doable in the dts using something like:
> > >
> > >             assigned-clocks = <&tegra_car TEGRA124_CLK_PWM>;
> > >     assigned-clock-parents = <&tegra_car TEGRA124_CLK_PLL_P>;
> > >
> > >     in the pwm node. (Note this includes much guesswork, I don't know the
> > >     PPL's clk-rate, so this might break in other ways. Another option is
> > >     using PLL_C.)
> > >
> > > Probably the latter is the nicest option. Is it possible to find out the
> > > setting when the machine is running the original vendor OS?
> >
> > The latter does seem correct to me. This fixes the issue for Tegra210 ...
> >
> > diff --git a/arch/arm64/boot/dts/nvidia/tegra210.dtsi b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
> > index 4f0e51f1a343..842843e0a585 100644
> > --- a/arch/arm64/boot/dts/nvidia/tegra210.dtsi
> > +++ b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
> > @@ -670,6 +670,10 @@
> >                 clock-names = "pwm";
> >                 resets = <&tegra_car 17>;
> >                 reset-names = "pwm";
> > +
> > +               assigned-clocks = <&tegra_car TEGRA210_CLK_PWM>;
> > +               assigned-clock-parents = <&tegra_car TEGRA210_CLK_PLL_P>;
> > +
> >                 status = "disabled";
> >         };
>
> Traditionally we've always set the clock parent in the driver via the
> init table (at least on chips prior to Tegra186). On the other hand we
> have a few occurrences of assigned-clock-rates already for Tegra210. I
> don't feel strongly either way. A minor advantage of having the fix in
> the clock driver is that it fixes this automatically for older device
> trees. Not that it really matters since people always update kernel and
> DTB at the same time on Tegra devices.
>
> It'd be great if we could get confirmation that changing the parent
> clock fixes this on all other boards as well, then we can fix it at the
> same time for all of them.
>
> Thierry

just a little update: i just compiled a v6.0 kernel for the nyan
chromebook (tegra124) and now the display remains completely black -
with v5.19 it was at least working with broken brightness control -
reverting the pwm optimization patch it works perfectly fine in both
cases ... please let me know if you have any patches i should test to
fix this

best wishes - thomas

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] pwm: tegra: Optimize period calculation
  2022-10-04 10:28               ` Thomas Graichen
@ 2022-10-04 10:41                 ` Uwe Kleine-König
  0 siblings, 0 replies; 16+ messages in thread
From: Uwe Kleine-König @ 2022-10-04 10:41 UTC (permalink / raw)
  To: Thomas Graichen
  Cc: Thierry Reding, linux-pwm, kernel, Maxim Schwalm,
	Svyatoslav Ryhel, Jon Hunter, Dmitry Osipenko, linux-tegra,
	Dmitry Osipenko, Lee Jones

[-- Attachment #1: Type: text/plain, Size: 6316 bytes --]

Hello Thomas,

On Tue, Oct 04, 2022 at 12:28:25PM +0200, Thomas Graichen wrote:
> On Fri, Sep 23, 2022 at 2:10 PM Thierry Reding <thierry.reding@gmail.com> wrote:
> >
> > On Thu, Sep 22, 2022 at 12:12:31PM +0100, Jon Hunter wrote:
> > > Hi Uwe,
> > >
> > > On 21/09/2022 09:17, Uwe Kleine-König wrote:
> > >
> > > ...
> > >
> > > > As the clk-rate is only 32768 Hz we get (with period_ns = 1000000)
> > > >
> > > >     32768 * 1000000 / (1000000000 << 8) = 0.128
> > > >
> > > > which is indeed rounded down to 0 and then runs into the error path
> > > > returning -EINVAL. Before my change (that now broke the backlight
> > > > configuration) configuration continued and then ended with actually
> > > > configuring period = 7812500 ns which is off by nearly a factor of 8.
> > >
> > > I am seeing the same issue on Tegra210 Jetson Nano (device-tree
> > > tegra210-p3450-0000.dts). This also has a clock rate of 32768 Hz by
> > > default which means the min period is 30517ns. However, in the probe
> > > the min_period_ns comes from the pc->soc->max_frequency which is 48
> > > MHz for Tegra210. The min_period_ns = 1/(48 MHz / (2^8)) which is
> > > 5334ns. Hence, the actual min period is less than what is actually
> > > possible.
> > >
> > > I wonder if we should be warning about this and fixing the min
> > > period ...
> > >
> > > diff --git a/drivers/pwm/pwm-tegra.c b/drivers/pwm/pwm-tegra.c
> > > index 2f3dcb9e9278..f72928c05c81 100644
> > > --- a/drivers/pwm/pwm-tegra.c
> > > +++ b/drivers/pwm/pwm-tegra.c
> > > @@ -310,9 +310,13 @@ static int tegra_pwm_probe(struct platform_device *pdev)
> > >          */
> > >         pc->clk_rate = clk_get_rate(pc->clk);
> > > +       if (pc->clk_rate < pc->soc->max_frequency)
> > > +               dev_warn(&pdev->dev, "Max frequency limited to %lu Hz!",
> > > +                        pc->clk_rate);
> > > +
> > >         /* Set minimum limit of PWM period for the IP */
> > >         pc->min_period_ns =
> > > -           (NSEC_PER_SEC / (pc->soc->max_frequency >> PWM_DUTY_WIDTH)) + 1;
> > > +           (NSEC_PER_SEC / (pc->clk_rate >> PWM_DUTY_WIDTH)) + 1;
> > >         pc->rst = devm_reset_control_get_exclusive(&pdev->dev, "pwm");
> > >
> > > The above does not fix this issue but ...
> > > > I didn't find a device tree for an Asus Transformer tablet bases on a
> > > > tegra124 in the kernel source, but the options are:
> > > >
> > > >   - Revert commit 8c193f4714df ("pwm: tegra: Optimize period calculation").
> > > >     I don't like this. IMHO this commit is an improvement and the problem
> > > >     is that the consumer requests a too small period. For a backlight
> > > >     this might be ok to result in a much bigger period, for other
> > > >     usecases it isn't and so I like refusing period = 1000000.
> > > >
> > > >   - We could just drop the "else / return -EINVAL".
> > > >     This is inconsistent as then (again) some periods are rounded up
> > > >     (with the given clk rate that would be 5334 <= period < 7812500)
> > > >     while others (period < 5334) yield -EINVAL.
> > > >
> > > >   - Increase the period that the backlight is using to at least 7812500.
> > > >     This is done (I guess) by replacing 1000000 by 7812500 (or more) in
> > > >     the backlight's PWM phandle.
> > > >
> > > >   - Make the PWM clk faster.
> > > >     Looking quickly through the tegra clk driver, the parent of the PWM
> > > >     clk could be changed from clk_32k to pll_p or pll_c. This should be
> > > >     doable in the dts using something like:
> > > >
> > > >             assigned-clocks = <&tegra_car TEGRA124_CLK_PWM>;
> > > >     assigned-clock-parents = <&tegra_car TEGRA124_CLK_PLL_P>;
> > > >
> > > >     in the pwm node. (Note this includes much guesswork, I don't know the
> > > >     PPL's clk-rate, so this might break in other ways. Another option is
> > > >     using PLL_C.)
> > > >
> > > > Probably the latter is the nicest option. Is it possible to find out the
> > > > setting when the machine is running the original vendor OS?
> > >
> > > The latter does seem correct to me. This fixes the issue for Tegra210 ...
> > >
> > > diff --git a/arch/arm64/boot/dts/nvidia/tegra210.dtsi b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
> > > index 4f0e51f1a343..842843e0a585 100644
> > > --- a/arch/arm64/boot/dts/nvidia/tegra210.dtsi
> > > +++ b/arch/arm64/boot/dts/nvidia/tegra210.dtsi
> > > @@ -670,6 +670,10 @@
> > >                 clock-names = "pwm";
> > >                 resets = <&tegra_car 17>;
> > >                 reset-names = "pwm";
> > > +
> > > +               assigned-clocks = <&tegra_car TEGRA210_CLK_PWM>;
> > > +               assigned-clock-parents = <&tegra_car TEGRA210_CLK_PLL_P>;
> > > +
> > >                 status = "disabled";
> > >         };
> >
> > Traditionally we've always set the clock parent in the driver via the
> > init table (at least on chips prior to Tegra186). On the other hand we
> > have a few occurrences of assigned-clock-rates already for Tegra210. I
> > don't feel strongly either way. A minor advantage of having the fix in
> > the clock driver is that it fixes this automatically for older device
> > trees. Not that it really matters since people always update kernel and
> > DTB at the same time on Tegra devices.
> >
> > It'd be great if we could get confirmation that changing the parent
> > clock fixes this on all other boards as well, then we can fix it at the
> > same time for all of them.
> >
> > Thierry
> 
> just a little update: i just compiled a v6.0 kernel for the nyan
> chromebook (tegra124) and now the display remains completely black -
> with v5.19 it was at least working with broken brightness control -
> reverting the pwm optimization patch it works perfectly fine in both
> cases ... please let me know if you have any patches i should test to
> fix this

You might want to test

https://lore.kernel.org/linux-clk/20221003101555.25458-1-jonathanh@nvidia.com

(Hmm, tegra124 != tegra210? Then something similar should be done for
tegra124?)

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] pwm: tegra: Optimize period calculation
  2022-04-25 13:22 [PATCH v2] pwm: tegra: Optimize period calculation Uwe Kleine-König
  2022-05-20 14:20 ` Thierry Reding
  2022-08-15  0:28 ` Dmitry Osipenko
@ 2022-10-25 14:22 ` Jon Hunter
  2022-10-26  0:10   ` Uwe Kleine-König
  2 siblings, 1 reply; 16+ messages in thread
From: Jon Hunter @ 2022-10-25 14:22 UTC (permalink / raw)
  To: Uwe Kleine-König, Thierry Reding, Lee Jones
  Cc: linux-tegra, linux-pwm, kernel

Uwe, Thierry,

On 25/04/2022 14:22, Uwe Kleine-König wrote:
> Dividing by the result of a division looses precision because the result is
> rounded twice. E.g. with clk_rate = 48000000 and period = 32760033 the
> following numbers result:
> 
> 	rate = pc->clk_rate >> PWM_DUTY_WIDTH = 187500
> 	hz = DIV_ROUND_CLOSEST_ULL(100ULL * NSEC_PER_SEC, period_ns) = 3052
> 	rate = DIV_ROUND_CLOSEST_ULL(100ULL * rate, hz) = 6144
> 
> The exact result would be 6142.5061875 and (apart from rounding) this is
> found by using a single division. As a side effect is also a tad
> cheaper to calculate.
> 
> Also using clk_rate >> PWM_DUTY_WIDTH looses precision. Consider for
> example clk_rate = 47999999 and period = 106667:
> 
> 	mul_u64_u64_div_u64(pc->clk_rate >> PWM_DUTY_WIDTH, period_ns,
> 			    NSEC_PER_SEC) = 19
> 
> 	mul_u64_u64_div_u64(pc->clk_rate, period_ns,
> 			    NSEC_PER_SEC << PWM_DUTY_WIDTH) = 20
> 
> (The exact result is 20.000062083332033.)
> 
> With this optimizations also switch from round-closest to round-down for
> the period calculation. Given that the calculations were non-optimal for
> quite some time now with variations in both directions which nobody
> reported as a problem, this is the opportunity to align the driver's
> behavior to the requirements of new drivers. This has several upsides:
> 
>   - Implementation is easier as there are no round-nearest variants of
>     mul_u64_u64_div_u64().
>   - Requests for too small periods are now consistently refused. This was
>     kind of arbitrary before, where period_ns < min_period_ns was
>     refused, but in some cases min_period_ns isn't actually implementable
>     and then values between min_period_ns and the actual minimum were
>     rounded up to the actual minimum.
> 
> Note that the duty_cycle calculation isn't using the usual round-down
> approach yet.
> 
> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> ---
> Hello,
> 
> changes since (implicit) v1: Updated changelog to explain why rate = 0
> is refused now.
> 
> Best regards
> Uwe
> 
>   drivers/pwm/pwm-tegra.c | 10 +++++-----
>   1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/pwm/pwm-tegra.c b/drivers/pwm/pwm-tegra.c
> index e5a9ffef4a71..7fc03a9ec154 100644
> --- a/drivers/pwm/pwm-tegra.c
> +++ b/drivers/pwm/pwm-tegra.c
> @@ -99,7 +99,7 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
>   			    int duty_ns, int period_ns)
>   {
>   	struct tegra_pwm_chip *pc = to_tegra_pwm_chip(chip);
> -	unsigned long long c = duty_ns, hz;
> +	unsigned long long c = duty_ns;
>   	unsigned long rate, required_clk_rate;
>   	u32 val = 0;
>   	int err;
> @@ -156,11 +156,9 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
>   		pc->clk_rate = clk_get_rate(pc->clk);
>   	}
>   
> -	rate = pc->clk_rate >> PWM_DUTY_WIDTH;
> -
>   	/* Consider precision in PWM_SCALE_WIDTH rate calculation */
> -	hz = DIV_ROUND_CLOSEST_ULL(100ULL * NSEC_PER_SEC, period_ns);
> -	rate = DIV_ROUND_CLOSEST_ULL(100ULL * rate, hz);
> +	rate = mul_u64_u64_div_u64(pc->clk_rate, period_ns,
> +				   (u64)NSEC_PER_SEC << PWM_DUTY_WIDTH);
>   
>   	/*
>   	 * Since the actual PWM divider is the register's frequency divider
> @@ -169,6 +167,8 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
>   	 */
>   	if (rate > 0)
>   		rate--;
> +	else
> +		return -EINVAL;


I am seeing more problems with this ...

1. In the case where we call dev_pm_opp_set_rate() to set the PWM clock
    rate, the requested rate is calculated as ...

   required_clk_rate = (NSEC_PER_SEC / period_ns) << PWM_DUTY_WIDTH;

    For Tegra234, I have a case where the period is 45334 and so the
    above yields a rate of 5646848Hz. In this case, mul_u64_u64_div_u64()
    returns 0 because ...

    (5646848 * 45334)/(NSEC_PER_SEC << PWM_DUTY_WIDTH) = 0.9999

    We can fix this by ...

   required_clk_rate = (NSEC_PER_SEC << PWM_DUTY_WIDTH) / period_ns;

    The above produces a rate of 5646975 vs 5646848Hz.

2. Even after fixing issue #1, above, I then ran into another issue
    where even if I request a clock rate of 5646975 I still get a lower
    clock. This also causes  mul_u64_u64_div_u64() to return 0 and then I
    see the -EINVAL returned. The simple solution here is to not return
    -EINVAL for 0. After all 0, could be valid if the rate if we are
    not dividing down the parent clock in the PWM. However, then there is
    always the case we are not operating at the expected rate. Seems like
    we should be rounding the result.

Thoughts?

Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] pwm: tegra: Optimize period calculation
  2022-10-25 14:22 ` Jon Hunter
@ 2022-10-26  0:10   ` Uwe Kleine-König
  2022-10-26 10:14     ` Jon Hunter
  0 siblings, 1 reply; 16+ messages in thread
From: Uwe Kleine-König @ 2022-10-26  0:10 UTC (permalink / raw)
  To: Jon Hunter; +Cc: Thierry Reding, Lee Jones, linux-tegra, linux-pwm, kernel

[-- Attachment #1: Type: text/plain, Size: 6526 bytes --]

Hello Jon,

On Tue, Oct 25, 2022 at 03:22:18PM +0100, Jon Hunter wrote:
> On 25/04/2022 14:22, Uwe Kleine-König wrote:
> > Dividing by the result of a division looses precision because the result is
> > rounded twice. E.g. with clk_rate = 48000000 and period = 32760033 the
> > following numbers result:
> > 
> > 	rate = pc->clk_rate >> PWM_DUTY_WIDTH = 187500
> > 	hz = DIV_ROUND_CLOSEST_ULL(100ULL * NSEC_PER_SEC, period_ns) = 3052
> > 	rate = DIV_ROUND_CLOSEST_ULL(100ULL * rate, hz) = 6144
> > 
> > The exact result would be 6142.5061875 and (apart from rounding) this is
> > found by using a single division. As a side effect is also a tad
> > cheaper to calculate.
> > 
> > Also using clk_rate >> PWM_DUTY_WIDTH looses precision. Consider for
> > example clk_rate = 47999999 and period = 106667:
> > 
> > 	mul_u64_u64_div_u64(pc->clk_rate >> PWM_DUTY_WIDTH, period_ns,
> > 			    NSEC_PER_SEC) = 19
> > 
> > 	mul_u64_u64_div_u64(pc->clk_rate, period_ns,
> > 			    NSEC_PER_SEC << PWM_DUTY_WIDTH) = 20
> > 
> > (The exact result is 20.000062083332033.)
> > 
> > With this optimizations also switch from round-closest to round-down for
> > the period calculation. Given that the calculations were non-optimal for
> > quite some time now with variations in both directions which nobody
> > reported as a problem, this is the opportunity to align the driver's
> > behavior to the requirements of new drivers. This has several upsides:
> > 
> >   - Implementation is easier as there are no round-nearest variants of
> >     mul_u64_u64_div_u64().
> >   - Requests for too small periods are now consistently refused. This was
> >     kind of arbitrary before, where period_ns < min_period_ns was
> >     refused, but in some cases min_period_ns isn't actually implementable
> >     and then values between min_period_ns and the actual minimum were
> >     rounded up to the actual minimum.
> > 
> > Note that the duty_cycle calculation isn't using the usual round-down
> > approach yet.
> > 
> > Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
> > ---
> > Hello,
> > 
> > changes since (implicit) v1: Updated changelog to explain why rate = 0
> > is refused now.
> > 
> > Best regards
> > Uwe
> > 
> >   drivers/pwm/pwm-tegra.c | 10 +++++-----
> >   1 file changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/pwm/pwm-tegra.c b/drivers/pwm/pwm-tegra.c
> > index e5a9ffef4a71..7fc03a9ec154 100644
> > --- a/drivers/pwm/pwm-tegra.c
> > +++ b/drivers/pwm/pwm-tegra.c
> > @@ -99,7 +99,7 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
> >   			    int duty_ns, int period_ns)
> >   {
> >   	struct tegra_pwm_chip *pc = to_tegra_pwm_chip(chip);
> > -	unsigned long long c = duty_ns, hz;
> > +	unsigned long long c = duty_ns;
> >   	unsigned long rate, required_clk_rate;
> >   	u32 val = 0;
> >   	int err;
> > @@ -156,11 +156,9 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
> >   		pc->clk_rate = clk_get_rate(pc->clk);
> >   	}
> > -	rate = pc->clk_rate >> PWM_DUTY_WIDTH;
> > -
> >   	/* Consider precision in PWM_SCALE_WIDTH rate calculation */
> > -	hz = DIV_ROUND_CLOSEST_ULL(100ULL * NSEC_PER_SEC, period_ns);
> > -	rate = DIV_ROUND_CLOSEST_ULL(100ULL * rate, hz);
> > +	rate = mul_u64_u64_div_u64(pc->clk_rate, period_ns,
> > +				   (u64)NSEC_PER_SEC << PWM_DUTY_WIDTH);
> >   	/*
> >   	 * Since the actual PWM divider is the register's frequency divider
> > @@ -169,6 +167,8 @@ static int tegra_pwm_config(struct pwm_chip *chip, struct pwm_device *pwm,
> >   	 */
> >   	if (rate > 0)
> >   		rate--;
> > +	else
> > +		return -EINVAL;
> 
> 
> I am seeing more problems with this ...
> 
> 1. In the case where we call dev_pm_opp_set_rate() to set the PWM clock
>    rate, the requested rate is calculated as ...
> 
>   required_clk_rate = (NSEC_PER_SEC / period_ns) << PWM_DUTY_WIDTH;
> 
>    For Tegra234, I have a case where the period is 45334 and so the
>    above yields a rate of 5646848Hz. In this case, mul_u64_u64_div_u64()
> 
>    returns 0 because ...
> 
>    (5646848 * 45334)/(NSEC_PER_SEC << PWM_DUTY_WIDTH) = 0.9999
> 
>    We can fix this by ...
> 
>   required_clk_rate = (NSEC_PER_SEC << PWM_DUTY_WIDTH) / period_ns;
> 
>    The above produces a rate of 5646975 vs 5646848Hz.

You probably also want to round up that division such that you work with
5646976.

> 2. Even after fixing issue #1, above, I then ran into another issue
>    where even if I request a clock rate of 5646975 I still get a lower
>    clock. This also causes  mul_u64_u64_div_u64() to return 0 and then I
>    see the -EINVAL returned. The simple solution here is to not return
>    -EINVAL for 0. After all 0, could be valid if the rate if we are
>    not dividing down the parent clock in the PWM. However, then there is

As you have to subtract 1 from the result of the division, you need to
write a -1 in the register which doesn't work. Writing a 0 results in a
bigger period than requested which is the thing I intended to fix with
the blamed commit.

If clk_rate was 5646976 (don't know if that matches as nicely as it
should? If dev_pm_opp_set_rate might set a lower rate you're out of luck
again) we get:

	rate = mul_u64_u64_div_u64(5646976, 45334, 1000000000 << 8)

which is 1 and everything is fine.

Note that the math before my commit already had that problem. Before the
driver was just more lax and didn't subtract 1 from rate and so
configured a bigger period than requested/expected

There are probably similar cases where the driver still configures a too
big period (i.e. when the effect you described yields a rate that is too
small but bigger than 0).

So the optimal way to fix this is to adapt the calculation of
required_clk_rate as you suggested + rounding up and to make sure that
dev_pm_opp_set_rate doesn't configure a rate lower than requested. The
latter might be complicated as the API (AFAIK) isn't specific about
rounding directions and there isn't a dev_pm_opp_round_rate function.

As a short term work-around dropping the -EINVAL is probably best, I'd
like to have it documented clearly however that continuing in that case
is wrong and yields unexpected settings.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH v2] pwm: tegra: Optimize period calculation
  2022-10-26  0:10   ` Uwe Kleine-König
@ 2022-10-26 10:14     ` Jon Hunter
  0 siblings, 0 replies; 16+ messages in thread
From: Jon Hunter @ 2022-10-26 10:14 UTC (permalink / raw)
  To: Uwe Kleine-König
  Cc: Thierry Reding, Lee Jones, linux-tegra, linux-pwm, kernel


On 26/10/2022 01:10, Uwe Kleine-König wrote:

...

>> 2. Even after fixing issue #1, above, I then ran into another issue
>>     where even if I request a clock rate of 5646975 I still get a lower
>>     clock. This also causes  mul_u64_u64_div_u64() to return 0 and then I
>>     see the -EINVAL returned. The simple solution here is to not return
>>     -EINVAL for 0. After all 0, could be valid if the rate if we are
>>     not dividing down the parent clock in the PWM. However, then there is
> 
> As you have to subtract 1 from the result of the division, you need to
> write a -1 in the register which doesn't work. Writing a 0 results in a
> bigger period than requested which is the thing I intended to fix with
> the blamed commit.
> 
> If clk_rate was 5646976 (don't know if that matches as nicely as it
> should? If dev_pm_opp_set_rate might set a lower rate you're out of luck
> again) we get:
> 
> 	rate = mul_u64_u64_div_u64(5646976, 45334, 1000000000 << 8)
> 
> which is 1 and everything is fine.
> 
> Note that the math before my commit already had that problem. Before the
> driver was just more lax and didn't subtract 1 from rate and so
> configured a bigger period than requested/expected
> 
> There are probably similar cases where the driver still configures a too
> big period (i.e. when the effect you described yields a rate that is too
> small but bigger than 0).
> 
> So the optimal way to fix this is to adapt the calculation of
> required_clk_rate as you suggested + rounding up and to make sure that
> dev_pm_opp_set_rate doesn't configure a rate lower than requested. The
> latter might be complicated as the API (AFAIK) isn't specific about
> rounding directions and there isn't a dev_pm_opp_round_rate function.
> 
> As a short term work-around dropping the -EINVAL is probably best, I'd
> like to have it documented clearly however that continuing in that case
> is wrong and yields unexpected settings.

I have done a bit more testing today and sent out a couple fixes. I have 
added you to the review. Let me know what you think.

Jon

-- 
nvpublic

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2022-10-26 10:14 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-25 13:22 [PATCH v2] pwm: tegra: Optimize period calculation Uwe Kleine-König
2022-05-20 14:20 ` Thierry Reding
2022-08-15  0:28 ` Dmitry Osipenko
2022-08-15  7:09   ` Uwe Kleine-König
2022-08-18  7:54     ` Uwe Kleine-König
2022-08-18 20:34       ` Dmitry Osipenko
2022-09-21  8:17         ` Uwe Kleine-König
2022-09-22 11:12           ` Jon Hunter
2022-09-23 12:10             ` Thierry Reding
2022-10-04 10:28               ` Thomas Graichen
2022-10-04 10:41                 ` Uwe Kleine-König
     [not found]   ` <CAPVz0n1Xy=feSqw7_bvNw17=xVGnk2yhAMFmyfddU128dU+5qQ@mail.gmail.com>
2022-09-21 13:32     ` Uwe Kleine-König
     [not found]       ` <CAPVz0n19V5Lx889GO7wRzuvPAdBeVE8vXsMzQ-6EGyp4DFGD5w@mail.gmail.com>
2022-09-21 17:09         ` Uwe Kleine-König
2022-10-25 14:22 ` Jon Hunter
2022-10-26  0:10   ` Uwe Kleine-König
2022-10-26 10:14     ` Jon Hunter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.