On Thu, Mar 23, 2023 at 10:14:09AM +0100, Uwe Kleine-König wrote:
> On Wed, Mar 22, 2023 at 10:49:40PM +0000, Conor Dooley wrote:
> > On Wed, Mar 22, 2023 at 11:55:36AM +0100, Uwe Kleine-König wrote:
> > > On Mon, Mar 06, 2023 at 09:48:58AM +0000, Conor Dooley wrote:
> > > > Add a driver that supports the Microchip FPGA "soft" PWM IP core.

> > > > +static void mchp_core_pwm_calc_period(const struct pwm_state *state, unsigned long clk_rate,
> > > > +				      u16 *prescale, u16 *period_steps)
> > > > +{
> > > > +	u64 tmp;
> > > > +
> > > > +	/*
> > > > +	 * Calculate the period cycles and prescale values.
> > > > +	 * The registers are each 8 bits wide & multiplied to compute the period
> > > > +	 * using the formula:
> > > > +	 *           (prescale + 1) * (period_steps + 1)
> > > > +	 * period = -------------------------------------
> > > > +	 *                      clk_rate
> > > > +	 * so the maximum period that can be generated is 0x10000 times the
> > > > +	 * period of the input clock.
> > > > +	 * However, due to the design of the "hardware", it is not possible to
> > > > +	 * attain a 100% duty cycle if the full range of period_steps is used.
> > > > +	 * Therefore period_steps is restricted to 0xFE and the maximum multiple
> > > > +	 * of the clock period attainable is 0xFF00.
> > > > +	 */
> > > > +	tmp = mul_u64_u64_div_u64(state->period, clk_rate, NSEC_PER_SEC);
> > > > +
> > > > +	/*
> > > > +	 * The hardware adds one to the register value, so decrement by one to
> > > > +	 * account for the offset
> > > > +	 */
> > > > +	if (tmp >= MCHPCOREPWM_PERIOD_MAX) {
> > > > +		*prescale = MCHPCOREPWM_PRESCALE_MAX - 1;
> > > > +		*period_steps = MCHPCOREPWM_PERIOD_STEPS_MAX - 1;
> > > > +
> > > > +		return;
> > > > +	}
> > > > +
> > > > +	/*
> > > > +	 * The optimal value for prescale can be calculated using the maximum
> > > > +	 * permitted value of period_steps, 0xff.
> > > 
> > > I had to think about that one for a while. The maximal value for
> > > (period_steps + 1) is 0xff with the reasoning above?! That's also what
> > > the code uses.
> > 
> > I've become really disenfranchised with these register/variable names.
> > I feel like just changing them to disconnect the variables used for
> > calculation from the register names a little, so that the "is there a +1
> > needed here or not" stuff becomes a little clearer.
> 
> Full ack, I considered asking for that, but after some time I was in the
> "I reviewed the patch today"-mode (which is quite similar to the mode
> you described :-) and forgot. (Even in that mode the PREG_TO_VAL macro
> annoyed me a bit.)
> 
> > It always makes sense to be when I am in an "I respun the patch today"
> > mode, but by the time we're in the review stage I get muddled.
> > God forbid I have to look at this in 10 years time.
> > 
> > That said, there is a bit of a mistake here. The comment two above says
> > "Therefore period_steps is restricted to 0xFE" when I'm capping things
> > off. Some inaccuracies have probably snuck in during the various
> > respins, and I think the comment above is "correct" but misleading, as
> > it muddies the waters about variable versus register names.
> 
> I think it's sensible to only talk about either the register values or
> the factors. I tend to think that talking about the register values is
> easier at the end and recommend not to hide the +1 (or -1) in a macro.

Yeah, I think the macro had value about 14 versions ago, but the number
of users has dropped over the revisions.
I think what I am going to to do for the next version is drop that
macro, and only ever hold the registers values in variables.
That had the advantage of making the maths in get_state() better match
the comments that are now quite prevalent in the driver, as the +1s done
there are more obvious.
I'm loathe to link a git tree but, without changes to the period
calculation logic, this is what it looks like w/ a consistent approach
to what period_steps and prescale mean:
https://git.kernel.org/pub/scm/linux/kernel/git/conor/linux.git/tree/drivers/pwm/pwm-microchip-core.c?h=pwm-dev-v15
[I blindly pushed that before leaving work & without even building it, so
there's probably some silly mistake in it, but that's besides the point
of displaying variable/comment changes]

From the chopped out bits of the previous email:
> Consider
> 
> 	clk_rate = 1000000
> 	period = 64009000
> 
> then your code gives:
> 
>               period * clk_rate
> 	tmp = ----------------- = 64009
>                 NSEC_PER_SEC
> 
> and so *prescale = 251 and *period_steps = 253. 
> 
> Wasn't the intention to pick *prescale = 250 and then
> *period_steps = 255?
> 
> Depending on your semantics of "optimal", either (252, 252) or (250,
> 255) is better than (251, 253). I think that means you shouldn't ignore
> the -1?

So, putting this one aside because it merits more thinking about.
I went through and checked some arbitrary values of tmp, rather than
dealing with "real" ones computed from frequencies I know are easily
available for me to use in the FPGA bitstream I use to test this stuff.

I think my "integer maths" truncation approach is not actually valid.
Consider tmp = 255... *prescale gets computed as 255/(254 + 1) = 1, per my
algorithm. Then, *period_steps is 255/(1 + 1) - 1 = 127.
The period is (1 + 1)(127 + 1), which is not 255.

Or take tmp = 510. prescale is 510/(254 + 1) = 2. period_steps is then
510/(2 + 1) - 1 = 169. period is (2 + 1)(169 + 1), which is 510. The
calculated period is right, but that is not the "optimal" value!

I think you're right in that I do actually need to consider the -1, and
do a ceiling operation, when calculating prescale, IOW:
	*prescale = ceil(div_u64(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX + 1)) - 1;
	*period_steps = div_u64(tmp, *prescale + 1) - 1;
	[I know I can't actually call ceil()]

That'd do the same thing as the truncation where
div_u64(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX + 1) is not a round number,
but it improves the round number case, eg tmp = 510:
prescale = 510/(254 + 1) - 1 = 1, period_steps = 510/(1 + 1) - 1 = 254
period = (1 + 1)(254 + 1) = 510

It does mean a zero period would need to be special cased, but I don't
mind that.

> One thing I think is strange is that with clk_rate = 1000001 and your
> algorithm we get:
> 
> requested period = 1274998 ns -> real period = 1269998.73000127  (prescale = 4, period_steps = 253)
> requested period = 1274999 ns -> real period = 1271998.728001272 (prescale = 5, period_steps = 211)

This second one here, where tmp = 1275, is a good example of the above.
With the ceil() change, this would be prescale = 4, period_steps = 254
which I think makes more sense.
> 
> while 1271998.728001272 would be a better match for a request with
> period = 1274998 than 1269998.73000127.
> 
> I spend too much time to think about that now. I'm unsure if this is
> because the -1 is missing, or if there is a bug in the idea to pick a
> small prescale to allow a big period_steps value (in combination with
> the request to pick the biggest possible period).

With the inconsistency fixed, I think getting the slightly less accurate
period is a byproduct of prioritising the finegrainedness of the duty
cycle.

> Hmm, maybe you understand that better than me? I'll have to think about
> it.

[end of snip from the last mail]

> 
> Having said that here are my results of thinking a bit about how to
> choose register values:
> 
> Let r(p) denote the period that is actually configured if p is
> requested.
> 
> The nice properties we want (i.e. those a consumer might expect?) are:
> 
>  a) ∀p: r(p) ≤ p
>     i.e. never configure a bigger period than requested
>     (This is a bit arbitrary, but nice to get a consistent behaviour for
>     all drivers and easier to handle than requesting the nearest
>     possible period.)
> 
>  b) ∀p, q: p ≤ q ⟹ r(p) ≤ r(q)
>     i.e. monotonicity
> 
>  c) ∀p: r(roundup(r(p))) = r(p)
>     i.e. idempotency
> 
>  d) ∀p, q: r(p) ≤ q ⟹ r(p) ≤ r(q)
>     i.e. pick the biggest possible period
> 
> (Sidenote: d) and a) imply b) and c))
> 
> If you always set period_steps to 0xfe
> (in Python syntax:
> 
> 	tmp = p * clkrate // NSPS
> 	period_steps = 0xfe
> 	prescale = tmp // (period_steps + 1) - 1
> 
> ) you get this consistent behaviour.
> 
> This has the upside of being easy to implement and cheap to run.
> Downside is that it's coarse and fails to implement periods that would
> require e.g prescale = 0 and period_steps < 0xfe. As period_steps is
> big, the domain to chose the duty cycle from is good.

I want to maintain support for prescale = 0, so I'm not really
interested in a computation that forsakes that.

> If you pick period_steps and prescale such that
> 	(period_steps + 1) * (prescale + 1)
> is the maximal value that makes r(p) ≤ p you have an increased runtime
> because you have to test multiple values for period_steps. I don't think
> there is an algorithm without a loop to determine an optimal pair?!
> Usually this gives a better match that the easy algorithm above for the
> period, but the domain for duty_cycle is (usually) smaller.
> I think we have a) and d) here, so that's good.
> 
> I think for most consumers a big domain for duty_cycle is more important
> that a good match for the requested period. So I tend to recommend the
> easy algorithm, but I'm not religious about that and open for other
> opinion and reasoning.

I'll be honest and say that I am getting a bit fatigued with the way
that issues w/ the calculations keep cropping up. I'll put a bit of time
into trying to figure out how to fix the tmp = 6400900 case that you
mentioned above, but if it comes out in the wash that that is a facet of
the way this stuff is computed, then I might be inclined to forge ahead
without resolving it... I'd certainly want to explain in a comment
somewhere (limitations section?) the point at which it starts getting
less accurate though. I'll write a script to iterate through the values
of tmp & see if the reason is obvious.

I would like to keep (something resembling) the current simple
implementation of the period calculation, as simplicity is a significant
advantage! I do also like the strategy of trying to maximise the number
of available duty cycle steps, I think it makes perfect sense to make
that the goal.

Thanks again for spending time on this Uwe,
Conor.