Hello Baruch,

On Fri, Jul 16, 2021 at 08:51:20AM +0300, Baruch Siach wrote:
> On Wed, Jul 14 2021, Uwe Kleine-König wrote:
> > On Tue, Jul 13, 2021 at 02:35:43PM +0300, Baruch Siach wrote:
> >> +/* The frequency range supported is 1Hz to 100MHz */
> >
> > A space between number and unit is usual and makes this better readable.
> 
> Quick 'git grep' indicates that '[[:digit:]]\+MHz' is a little more
> popular than '[[:digit:]]\+ MHz' in kernel code. But OK, not a big deal.

"usual" was not meant in the sense of "how it is used in the kernel" but
in the sense of what the typesetting rules say. (Not 100% sure about
English, but in German you're supposed to add a space.)

> >> +#define IPQ_PWM_CLK_SRC_FREQ	(100*1000*1000)
> >> +#define IPQ_PWM_MIN_PERIOD_NS	(NSEC_PER_SEC / IPQ_PWM_CLK_SRC_FREQ)
> >
> > You're assuming here that the parent clock runs at exactly the set rate.
> > Is this a sensible assumption? If this division didn't have an integer
> > result there would be rounding issues.
> 
> The code only uses this for period validity check. It saves us some code
> for run-time division.

This check is only completely right if the clock really runs at 100 MHz,
and I'd prefer correct over saving a division. (If you know for sure that
the clock will run at 100 MHz, you'd better hard-code it everywhere,
giving the compiler the opportunity to optimize.) So the TL;DR here is:
use one or the other and use that one consistently.
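
To illustrate the "use clk_get_rate() consistently" variant, here is a
rough sketch in plain userspace C. The helper name min_period_ns() is
made up for this mail and not taken from your patch; in the driver the
rate would come from clk_get_rate() and must be nonzero.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: length of one clock tick in ns for a clock
 * running at `rate` Hz, rounded up so that a period passing the check
 * is never shorter than what the hardware can actually produce.
 * Assumes rate != 0.
 */
static uint64_t min_period_ns(uint64_t rate)
{
	return (NSEC_PER_SEC + rate - 1) / rate;
}
```

With rate = 100 MHz this gives 10 ns, the same value as your constant,
but it stays correct if the clock ends up at a different rate.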

> >> +	unsigned int val;
> >> +
> >> +	regmap_read(ipq_chip->regmap, off, &val);
> >> +
> >> +	return val;
> >> +}
> >> +
> >> +static void ipq_pwm_reg_write(struct pwm_device *pwm, unsigned reg,
> >> +		unsigned val)
> >> +{
> >> +	struct ipq_pwm_chip *ipq_chip = to_ipq_pwm_chip(pwm->chip);
> >> +	unsigned int off = ipq_chip->regmap_off + ipq_pwm_reg_offset(pwm, reg);
> >> +
> >> +	regmap_write(ipq_chip->regmap, off, val);
> >> +}
> >> +
> >> +static void config_div_and_duty(struct pwm_device *pwm, unsigned int pre_div,
> >> +			unsigned int pwm_div, u64 period_ns, u64 duty_ns,
> >> +			bool enable)
> >> +{
> >> +	unsigned long hi_dur;
> >> +	unsigned long long quotient;
> >> +	unsigned long val = 0;
> >> +
> >> +	/*
> >> +	 * high duration = pwm duty * (pwm div + 1)
> >> +	 * pwm duty = duty_ns / period_ns
> >> +	 */
> >> +	quotient = (pwm_div + 1) * duty_ns;
> >> +	hi_dur = div64_u64(quotient, period_ns);
> >
> > this division should use the actual period, not the target period.
> > Otherwise the result might be to small.

I just noticed: Using the period here is also bad for precision, as the
actual period is itself the result of a division, so its rounding error
carries over into hi_dur.
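
One way around that, as an untested sketch in plain userspace C: with
actual period = (pre_div + 1) * (pwm_div + 1) * NSEC_PER_SEC / rate, the
factor (pwm_div + 1) cancels and only a single division remains. The
function name is made up, and I assume duty_ns * rate fits into 64 bits;
that would need an argument (or a check) in the driver.

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: high duration in units of the pre-divided clock.
 *
 *   hi_dur = duty_ns / actual_period * (pwm_div + 1)
 *          = duty_ns * rate / (NSEC_PER_SEC * (pre_div + 1))
 *
 * Assumes duty_ns * rate does not overflow a u64.
 */
static uint64_t hi_duration(uint64_t duty_ns, uint64_t rate,
			    unsigned int pre_div)
{
	return (duty_ns * rate) / (NSEC_PER_SEC * ((uint64_t)pre_div + 1));
}
```

For example, duty_ns = 5000 at rate = 100 MHz and pre_div = 0 yields
hi_dur = 500.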

> >> +	val = FIELD_PREP(IPQ_PWM_REG0_HI_DURATION, hi_dur) |
> >> +		FIELD_PREP(IPQ_PWM_REG0_PWM_DIV, pwm_div);
> >> +	ipq_pwm_reg_write(pwm, IPQ_PWM_CFG_REG0, val);
> >> +
> >> +	val = FIELD_PREP(IPQ_PWM_REG1_PRE_DIV, pre_div);
> >> +	ipq_pwm_reg_write(pwm, IPQ_PWM_CFG_REG1, val);
> >> +
> >> +	/* Enable needs a separate write to REG1 */
> >> +	val |= IPQ_PWM_REG1_UPDATE;
> >
> > Setting this bit results in the two writes above being configured
> > atomically so that no mixed settings happen to the output, right?
> 
> I guess so. I have no access to hardware documentation, mind you. I
> first tried to do only one write to REG1, but it had no effect. The
> existence of the UPDATE bit also indicates that hardware works as you
> suggest.

I wouldn't trust HW documentation here. If you have some means to
inspect the waveform, this is easy to test. Depending on how long you can
make the periods, an LED is enough. If you start with a slow parent
clk, a big pre_div and hi_dur = 0, the LED is supposed to be off. Then
set hi_dur = pwm_div/2, which either makes the LED blink slowly or keeps
it off. Then setting pre_div = 2 either increases the blink frequency or
it doesn't. ...

> > Does the hardware complete the currently running cycle on
> > reconfiguration?
> 
> No idea.

This is easy to test, too: set a big period and duty_cycle and
immediately after that set a small period and duty_cycle, then watch
whether the long cycle is completed before the new setting takes effect.

> >> +	if (enable)
> >> +		val |= IPQ_PWM_REG1_ENABLE;
> >> +	else
> >> +		val &= ~IPQ_PWM_REG1_ENABLE;
> >
> > The else branch has no effect as val is initialized as zero above, so
> > please drop it.
> >
> >> +	ipq_pwm_reg_write(pwm, IPQ_PWM_CFG_REG1, val);
> >
> > How does the hardware behave with the ENABLE bit unset? Does it drive
> > the pin to zero?
> 
> Yes. That's what experimentation here shows. The pin is pulled up, but
> the PWM keeps it low.

And with polarity set to inverted the PWM pulls the line up? As
different hardware behaves differently and some consumers have
expectations here, having this documented would be great.

> >> +static int ipq_pwm_apply(struct pwm_chip *chip, struct pwm_device *pwm,
> >> +			 const struct pwm_state *state)
> >> +{
> >> +	struct ipq_pwm_chip *ipq_chip = to_ipq_pwm_chip(chip);
> >> +	unsigned long freq;
> >> +	unsigned int pre_div, pwm_div, close_pre_div, close_pwm_div;
> >> +	long long diff;
> >> +	unsigned long rate = clk_get_rate(ipq_chip->clk);
> >> +	unsigned long min_diff = rate;
> >> +	uint64_t fin_ps;
> >> +	u64 period_ns, duty_ns;
> >
> > You have to refuse the request if state->polarity !=
> > PWM_POLARITY_NORMAL.
> >
> >> +
> >> +	if (state->period < IPQ_PWM_MIN_PERIOD_NS)
> >
> > It's strange that you assume here the hardcoded 100 MHz but below you
> > use clk_get_rate(ipq_chip->clk).
> 
> As I said above, this is meant to save code for the less critical
> case. Should I use clk_get_rate() here as well? If we go with
> assigned-clock-rates, as you suggest below, we'll have to do that
> anyway.

Sounds right. (That is: use assigned-clock-rates + use clk_get_rate
consistently)
 
> >> +		return -ERANGE;
> >> +
> >> +	period_ns = min(state->period, IPQ_PWM_MAX_PERIOD_NS);
> >> +	duty_ns = min(state->duty_cycle, period_ns);
> >> +
> >> +	/* freq in Hz for period in nano second */
> >> +	freq = div64_u64(NSEC_PER_SEC, period_ns);
> >> +	fin_ps = div64_u64(NSEC_PER_SEC * 1000ULL, rate);
> >
> > I don't understand that factor 1000. This just cancels with the 1000 in
> > the calculation of pwm_div below?! Maybe this is to soften the precision
> > loss?
> 
> That is my understanding of the code intent.
> 
> >> +	close_pre_div = IPQ_PWM_MAX_DIV;
> >> +	close_pwm_div = IPQ_PWM_MAX_DIV;
> >> +
> >> +	for (pre_div = 0; pre_div <= IPQ_PWM_MAX_DIV; pre_div++) {
> >> +		pwm_div = DIV64_U64_ROUND_CLOSEST(period_ns * 1000,
> >> +						  fin_ps * (pre_div + 1));
> >
> > Having fin_ps in the divisor results in loss of precision. When ever the
> > closest rounding division rounds down diff becomes negative below. So
> > you should round up here.
> >
> > Also if you do:
> >
> > 	pwm_div = round_up((period_ns * rate) / (NSEC_PER_SEC * (pre_div + 1)))
> >
> > there is no relevant loss of precision. (You might have to care for
> > period_ns * rate overflowing though or argue why it doesn't overflow.)
> 
> Looks better.

And doesn't need the factor 1000 to improve precision \o/
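
Spelled out as a userspace C sketch (function name invented for this
mail; I assume period_ns * rate does not overflow 64 bits, which you
still have to argue for or guard against in the driver):

```c
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/*
 * Hypothetical helper: register value for pwm_div, i.e. the number of
 * pre-divided clock ticks per period (rounded up) minus one.
 * Assumes period_ns * rate does not overflow a u64.
 */
static uint64_t pwm_div_reg(uint64_t period_ns, uint64_t rate,
			    unsigned int pre_div)
{
	uint64_t divisor = NSEC_PER_SEC * ((uint64_t)pre_div + 1);
	/* Rounding-up division without going through picoseconds */
	uint64_t ticks = (period_ns * rate + divisor - 1) / divisor;

	/* The register holds ticks - 1; guard against period_ns == 0 */
	return ticks ? ticks - 1 : 0;
}
```

With period_ns = 1000, rate = 100 MHz and pre_div = 0 this yields 99.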

> >> +		pwm_div--;
> >> +		if (pwm_div > IPQ_PWM_MAX_DIV)
> >> +			continue;
> >
> > This check can be dropped if the loop (depending on the other parameters)
> > does not start with pre_div = 0 but some bigger number.
> 
> That is, calculate the minimum pre_div value for which the division
> above always produces pwm_div in range, right?

Yes, that was my idea. I didn't do the math but expect this not to be so
difficult.
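
A back-of-the-envelope attempt, untested and only as a starting point:
assuming pwm_div is computed with a rounding-up division as suggested
above and that both dividers are limited to 0xffff, the tick count
ceil(period_ns * rate / (NSEC_PER_SEC * (pre_div + 1))) must not exceed
0x10000. Solving for pre_div gives the sketch below (plain userspace C,
names made up, period_ns * rate assumed not to overflow).

```c
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ULL
#define MAX_DIV		0xffffULL	/* assumed limit for pre_div and pwm_div */

/*
 * Hypothetical helper: smallest pre_div for which
 *   ceil(period_ns * rate / (NSEC_PER_SEC * (pre_div + 1))) <= MAX_DIV + 1
 * holds, so that pwm_div = ticks - 1 fits into its register field.
 */
static uint64_t min_pre_div(uint64_t period_ns, uint64_t rate)
{
	uint64_t divisor = NSEC_PER_SEC * (MAX_DIV + 1);
	/* pre_div + 1 >= period_ns * rate / divisor, rounded up */
	uint64_t pre_div_plus_one = (period_ns * rate + divisor - 1) / divisor;

	return pre_div_plus_one ? pre_div_plus_one - 1 : 0;
}
```

For period_ns = 1 s at 100 MHz this gives pre_div = 1525; the result can
still exceed 0xffff for very long periods, so that case needs handling
separately.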

> [...]
> >> +static void ipq_pwm_get_state(struct pwm_chip *chip, struct pwm_device *pwm,
> >> +			      struct pwm_state *state)
> >> +{
> >> +	struct ipq_pwm_chip *ipq_chip = to_ipq_pwm_chip(chip);
> >> +	unsigned long rate = clk_get_rate(ipq_chip->clk);
> >> +	unsigned int pre_div, pwm_div, hi_dur;
> >> +	u64 effective_div, hi_div;
> >> +	u32 reg0, reg1;
> >> +
> >> +	reg0 = ipq_pwm_reg_read(pwm, IPQ_PWM_CFG_REG0);
> >> +	reg1 = ipq_pwm_reg_read(pwm, IPQ_PWM_CFG_REG1);
> >> +
> >> +	state->polarity = PWM_POLARITY_NORMAL;
> >> +	state->enabled = reg1 & IPQ_PWM_REG1_ENABLE;
> >> +
> >> +	pwm_div = FIELD_GET(IPQ_PWM_REG0_PWM_DIV, reg0);
> >> +	hi_dur = FIELD_GET(IPQ_PWM_REG0_HI_DURATION, reg0);
> >> +	pre_div = FIELD_GET(IPQ_PWM_REG1_PRE_DIV, reg1);
> >> +	effective_div = (pre_div + 1) * (pwm_div + 1);
> >
> > Please add a comment here that with pre_div and pwm_div <= 0xffff the
> > multiplication below doesn't overflow
> >
> >> +	state->period = div64_u64(effective_div * NSEC_PER_SEC, rate);
> >> +
> >> +	hi_div = hi_dur * (pre_div + 1);
> >
> > This suggests that the hardware cannot do 100% relative duty cycle if
> > pwm_div == 0xffff? I suggest to clamp pwm_div to 0xfffe then.
> 
> What is "100% relative duty"? How does pwm_div clamping helps?

relative duty = duty_cycle / period. So 100% relative duty means period ==
duty_cycle. With pwm_div == 0xffff period is
0x10000 * (pre_div + 1) / rate but duty_cycle cannot achieve that as the
maximum is 0xffff * (pre_div + 1) / rate.
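
To put numbers on it, a small userspace C sketch using the formulas
above (helper names invented for this mail; 0xffff as the maximum for
hi_dur is my reading of the register layout):

```c
#include <stdint.h>

#define NSEC_PER_SEC	1000000000ULL
#define MAX_HI_DUR	0xffffULL	/* assumed maximum of the hi_dur field */

/* period = (pre_div + 1) * (pwm_div + 1) / rate, in ns */
static uint64_t period_ns(unsigned int pre_div, unsigned int pwm_div,
			  uint64_t rate)
{
	uint64_t div = ((uint64_t)pre_div + 1) * ((uint64_t)pwm_div + 1);

	return div * NSEC_PER_SEC / rate;
}

/* longest achievable duty_cycle = 0xffff * (pre_div + 1) / rate, in ns */
static uint64_t max_duty_ns(unsigned int pre_div, uint64_t rate)
{
	return MAX_HI_DUR * ((uint64_t)pre_div + 1) * NSEC_PER_SEC / rate;
}
```

At 100 MHz with pre_div = 0, pwm_div = 0xffff gives a period of
655360 ns while the duty_cycle tops out at 655350 ns. With pwm_div
clamped to 0xfffe the period is 655350 ns, too, so 100% becomes
reachable.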

> >> +	pwm->clk = devm_clk_get(dev, "core");
> >> +	if (IS_ERR(pwm->clk))
> >> +		return dev_err_probe(dev, PTR_ERR(pwm->clk),
> >> +				"failed to get core clock");
> >> +
> >> +	ret = clk_set_rate(pwm->clk, IPQ_PWM_CLK_SRC_FREQ);
> >> +	if (ret)
> >> +		return dev_err_probe(dev, ret, "clock rate set failed");
> >
> > Would it make more sense to set this in the device tree using
> > assigned-clock-rate?
> 
> That's 'assigned-clock-rates' I believe. I'll try that.

Ah right, I missed the s.

Best regards
Uwe

-- 
Pengutronix e.K.                           | Uwe Kleine-König            |
Industrial Linux Solutions                 | https://www.pengutronix.de/ |