Hey Uwe, On Tue, Apr 11, 2023 at 06:25:54PM +0200, Uwe Kleine-König wrote: > On Tue, Apr 11, 2023 at 02:56:15PM +0100, Conor Dooley wrote: > > On Tue, Apr 11, 2023 at 12:55:47PM +0200, Uwe Kleine-König wrote: > > > On Tue, Apr 11, 2023 at 09:56:34AM +0100, Conor Dooley wrote: > > > > Add a driver that supports the Microchip FPGA "soft" PWM IP core. > > > > +static int mchp_core_pwm_calc_period(const struct pwm_state *state, unsigned long clk_rate, > > > > + u16 *prescale, u16 *period_steps) > > > > +{ > > > > + u64 tmp; > > > > + u32 remainder; > > > > + > > > > + /* > > > > + * Calculate the period cycles and prescale values. > > > > + * The registers are each 8 bits wide & multiplied to compute the period > > > > + * using the formula: > > > > + * (prescale + 1) * (period_steps + 1) > > > > + * period = ------------------------------------- > > > > + * clk_rate > > > > + * so the maximum period that can be generated is 0x10000 times the > > > > + * period of the input clock. > > > > + * However, due to the design of the "hardware", it is not possible to > > > > + * attain a 100% duty cycle if the full range of period_steps is used. > > > > + * Therefore period_steps is restricted to 0xfe and the maximum multiple > > > > + * of the clock period attainable is (0xff + 1) * (0xfe + 1) = 0xff00 > > > > + * > > > > + * The prescale and period_steps registers operate similarly to > > > > + * CLK_DIVIDER_ONE_BASED, where the value used by the hardware is that > > > > + * in the register plus one. > > > > + * It's therefore not possible to set a period lower than 1/clk_rate, so > > > > + * if tmp is 0, abort. Without aborting, we will set a period that is > > > > + * greater than that requested and, more importantly, will trigger the > > > > + * neg-/pos-edge issue described in the limitations. > > > > + */ > > > > + tmp = mul_u64_u64_div_u64(state->period, clk_rate, NSEC_PER_SEC); > > > > + if (!tmp) > > > > + return -EINVAL; > > > > + > > > > + if (tmp >= MCHPCOREPWM_PERIOD_MAX) { > > > > + *prescale = MCHPCOREPWM_PRESCALE_MAX; > > > > + *period_steps = MCHPCOREPWM_PERIOD_STEPS_MAX; > > > > + > > > > + return 0; > > > > + } > > > > + > > > > + /* > > > > + * There are multiple strategies that could be used to choose the > > > > + * prescale & period_steps values. > > > > + * Here the idea is to pick values so that the selection of duty cycles > > > > + * is as finegrain as possible. > > > > + * This "optimal" value for prescale can be calculated using the maximum > > > > + * permitted value of period_steps, 0xfe. > > > > + * > > > > + * period * clk_rate > > > > + * prescale = ------------------------- - 1 > > > > + * NSEC_PER_SEC * (0xfe + 1) > > > > + * > > > > + * However, we are purely interested in the integer upper bound of this > > > > + * calculation, so this division should be rounded up before subtracting > > > > + * 1 > > > > + * > > > > + * period * clk_rate > > > > + * ------------------- was precomputed as `tmp` > > > > + * NSEC_PER_SEC > > > > + */ > > > > + *prescale = DIV64_U64_ROUND_UP(tmp, MCHPCOREPWM_PERIOD_STEPS_MAX + 1) - 1; > > > > > > If state->period * clk_rate is 765000000001 you get tmp = 765 and then > > > *prescale = 2. However roundup(765000000001 / (1000000000 * 255)) - 1 is > > > 3. The problem here is that you're rounding down in the calculation of > > > tmp. Of course this is constructed because 765000000001 is prime, but > > > I'm sure you get the point :-) > > > > Hold that thought for a moment.. > > OK, so the correction below is to make up for the wrong rounding here. > I'd like to have that in a comment. Otherwise it wasn't clear to me. Sure, no problem. > > > Also we know that tmp is < 0xff00, so we don't need a 64 bit division > > > here. > > > > Neither here nor below, true. > > > > > > + /* > > > > + * Because 0xff is not a permitted value some error will seep into the > > > > + * calculation of prescale as prescale grows. Specifically, this error > > > > + * occurs where the remainder of the prescale calculation is less than > > > > + * prescale. > > > > + * For small values of prescale, only a handful of values will need > > > > + * correction, but overall this applies to almost half of the valid > > > > + * values for tmp. > > > > + * > > > > + * To keep the algorithm's decision making consistent, this case is > > > > + * checked for and the simple solution is to, in these cases, > > > > + * decrement prescale and check that the resulting value of period_steps > > > > + * is valid. > > > > + * > > > > + * period_steps can be computed from prescale: > > > > + * period * clk_rate > > > > + * period_steps = ----------------------------- - 1 > > > > + * NSEC_PER_SEC * (prescale + 1) > > > > + * > > > > + */ > > > > + div_u64_rem(tmp, (MCHPCOREPWM_PERIOD_STEPS_MAX + 1), &remainder); > > > > + if (remainder < *prescale) { > > > > + u16 smaller_prescale = *prescale - 1; > > > > + > > > > + *period_steps = div_u64(tmp, smaller_prescale + 1) - 1; > > > > + if (*period_steps < 255) { > > > > + *prescale = smaller_prescale; > > > > + > > > > + return 0; > > > > + } > > > > + } > > > > ...so in your prime case above, we would initially compute a prescale > > value that is too large, and then wind up hitting the test of the > > remainder here, thereby realising that the smaller prescale value is a > > better fit? > > Perhaps that's not an acceptable way to handle the issue though. > > IMHO it is, but the comment explaining needs some improvement. It should > also make clear why rounding cannot lead to prescale - 2 being a > good/better candidate. (I haven't thought it through, maybe needs some > improvement?) I wonder if the computation can find all improvements > given that it only used tmp, but not ->period and clk_rate?! For tmp > 0xff00 it may be off by more than 255, but that isn't possible at this stage as we've already eliminated the possibility above. > > > I don't understand that part. It triggers for tmp = 511. So you prefer > > > > > > prescale = 1 > > > period_steps = 254 > > > > > > yielding period = 510 / clkrate over > > > > > > prescale = 2 > > > period_steps = 170 > > > > > > yielding 513 / clkrate. I wonder why. > > I missed the -= 1 in my example. So it's: > > It triggers for tmp = 511. So you prefer > > prescale = 1 > period_steps = 254 > > yielding period = 510 / clkrate over > > prescale = 2 > period_steps = 169 > > yielding 510 / clkrate. > > Here it's obvious that the former is the better one. But I wonder why > the former isn't found instantly. Wouldn't > > *prescale = tmp / (MCHPCOREPWM_PERIOD_STEPS_MAX + 1) - 1 > > give a better approximation in general? (Of course with an additional > check that *prescale >= 0 then.) > > ... thinking a bit ... yes, I think that's true: > > If you pick *prescale = tmp / (MCHPCOREPWM_PERIOD_STEPS_MAX + 1) - 1, > then for each period_steps value ≤ 254 we have: > > (*prescale + 1) * (period_steps + 1) > ≤ (*prescale + 1) * 255 > ≤ (tmp // (MCHPCOREPWM_PERIOD_STEPS_MAX + 1)) * 255 > ≤ (tmp // 255) * 255 > ≤ (tmp / 255) * 255 > = tmp tmp = 256 *prescale = 256 // (254 + 1) - 1 ≈ 0 *prescale = 256 // (prescale + 1) - 1 = 256 / (0 + 1) - 1 = 255 That's then gonna give us one of the broken configurations from the limitations. tmp = 257 *prescale = 257 // (254 + 1) - 1 ≈ 0 *prescale = 257 // (prescale + 1) - 1 = 257 / (0 + 1) - 1 = 256 = 0 (registers are 8-bit) And so on... I'm quite obviously missing something that you may think is obvious here, but is not immediately clear to me. > and you can use the maximal period_steps = 0xfe. > (Only for tmp < 255 that isn't possible, up to you if you refuse these > or pick a smaller value for period_steps.) > > > Because 513 > 511 & 254 > 170! > > Is the aim not to produce a period that is less than or equal to that > > requested? The aim of this driver is to pick a prescale/period_steps > > combo that satisfies that constraint, while also trying to maximise the > > "finegrainness" of the duty cycle. > > The latter should be stated in a comment above. > > ack. I also wonder if such a change breaks the other assumptions a > consumer might have. I'll spend some more cycles about that in the next > round :-) > > > > Also tmp = 511 is the only value > > > where this triggers. There is a mistake somewhere (maybe on my side). > > > > It should trigger for any value 255 * n < x < 256 * n, no? > > Say for tmp of 767: > > *prescale = DIV64_U64_ROUND_UP(767, 254 + 1) - 1 = DIV64_U64_ROUND_UP(3.00784...) - 1 = 3 > > remainder = 0.00784.. * (254 + 1) = 2 > > > > Am I going nuts? Wouldn't be the first time that I've made a hames of > > things here, there are 16 versions for a reason after all. > > Ah, I had a bogus break statement in my python test code. So it's me > who got it wrong. No worries :) Pending an explanation of your calculation above, I've gone and done the rest of these things. I shan't resend until -rc1 or I get an explanation of your calculation - whichever happens first! Cheers, Conor.