On Tue, Jun 06, 2023 at 10:40:34PM +0200, Frank Oltmanns wrote: > Hi Maxime, > > On 2023-06-06 at 16:02:58 +0200, Maxime Ripard wrote: > > [[PGP Signed Part:Undecided]] > > On Mon, Jun 05, 2023 at 12:31:41PM +0200, Frank Oltmanns wrote: > >> Hi Maxime, > >> > >> On 2023-06-02 at 09:31:59 +0200, Maxime Ripard wrote: > >> > [[PGP Signed Part:Undecided]] > >> > Hi, > >> > > >> > On Thu, Jun 01, 2023 at 07:16:45AM +0200, Frank Oltmanns wrote: > >> >> On 2023-05-31 at 15:48:43 +0200, Maxime Ripard wrote: > >> >> > [[PGP Signed Part:Undecided]] > >> >> > Hi Frank, > >> >> > > >> >> > On Sat, May 27, 2023 at 03:27:44PM +0200, Frank Oltmanns wrote: > >> >> >> I would like to bring your attention to the current process of setting > >> >> >> the rate of an NKM clock. As it stands, when setting the rate of an > >> >> >> NKM clock, the rate nearest but less than or equal to the requested > >> >> >> rate is found, instead of the nearest rate. > >> >> > > >> >> > Yeah, it's actually pretty common, see clk_mux_determine_rate_flags() > >> >> > for example. Some devices require that we don't overshoot, while some > >> >> > prefer to have the closest rate. > >> >> > > >> >> > Both are fine, and it's a bit context specific which one we should > >> >> > favour. If we were to do anything, it would be to support both and let > >> >> > the clock driver select which behaviour it wants. > >> >> > > >> >> > >> >> Ok, understood. Thank you for the explanation! Now, I'm wondering if > >> >> anyone would be using such a flag, if I added it. > >> > > >> > I guess that's another thing :) If no-one is going to use it, why should > >> > we do it in the first place? > >> > > >> > But most likely the display and audio clocks are usually fairly ok with > >> > overshooting a bit, and a closest rate is usually better. > >> > >> Ok, I dived a bit deeper into this, but, as far as I can tell, the > >> closest rate is not used anywhere in the sunxi-ng ccu driver. So, when > >> extending, e.g., the NM or NKM clock to support, one must also extend at > >> least the mux clocks, because they expect rates less than the requested > >> rate. That seems to be quite the undertaking for only a small gain in > >> precision. > > > > mux clocks are using __clk_mux_determine_rate which should have the > > behaviour you want when CLK_MUX_ROUND_CLOSEST is set. > > https://elixir.bootlin.com/linux/v6.3.6/source/drivers/clk/sunxi-ng/ccu-sun50i-a64.c#L539 > is one example of a mux clock combined with a divider that is a bit more > complex. I didn't look too deeply, but it seemed to me, that it would > require two separate flags: One for the mux component and one for the > div component. Maybe I'm mistaken, but it seems to me that the concept > of having selected rates always be equal to or less than requested > rates, seems to be deeply ingrained in the sunxi-ng driver. I'm afraid > that I might miss some parts, therefore I abandoned that idea for now > (especially since I have only one board for testing). All the implementations (afaik) of the "composite" clocks we have that combine a mux and dividers eventually use ccu_mux_helper_determine_rate and divider_round_rate_parent. The last one can use CLK_DIVIDER_ROUND_CLOSEST, and you covered the first one in your patch. Which one did you leave out? > >> >> >> Moreover, ccu_nkm_find_best() is called multiple times (footnote [1]) > >> >> >> when setting a rate, each time iterating over all combinations of n, > >> >> >> k, and m. > >> >> > > >> >> > Yeah, that's expected as well. > >> >> > >> >> I'm wondering though, if iterating over all combinations is set in > >> >> stone, or if some kind of optimization would be in order. > >> > > >> > The thing with optimization is that you need to optimize for something. > >> > So you need to show that this code is suboptimal (by whatever metric you > >> > want to optimize for), and that your code is more optimal that it used > >> > to be. > >> > > >> > It means identifying a problem, writing benchmarks, and showing that the > >> > new code performs better there. > >> > > >> > For example, your code might very well be faster, but it will increase > >> > the kernel image (and thus the RAM usage). One is not more optimal than > >> > the other in absolute, they both are, using a different metric. > >> > >> Sure, I get that. I'll submit a patchset that adds the functionality to > >> NKM clocks to set the rate of their parents. > >> > >> With the new patchset, the time for, e.g. setting DCLK increases from > >> ~0.5 ms to a whopping 30 - 37 ms. Those times were taken > >> unscientifically on my pinephone, i.e. kernel logging and a couple of > >> re-boots. But I think that still might give an idea of why I thought > >> about the need to increase performance. > >> > >> The reason for this massive increase is, that the patch iterates over > >> all combinations of NKM for pll-mipi, and for each combination it > >> iterates over all combinations of NM for pll-video0. > >> > >> Nevertheless, following your and Jernej's advice, I'll submit the > >> patchset first and then we can discuss if speed optimizations are needed > >> and what cost is acceptable. > > > > Honestly, for 40ms, it will be a hard sell :) > > I'm not sure what part you think is the "hard-sell": > a. the patch itself because 30 to 40 ms is way too much > b. the optimization, because 30 to 40 ms isn't all that much. > I honestly don't know. > > BTW, this is the patchset in case you missed it: > https://lore.kernel.org/lkml/20230605190745.366882-1-frank@oltmanns.dev/ > > Note, that I have a patch in the works, which is similar to the one in > this thread, but for ccu_nm. Doing a binary search for finding the > parent rate of pll-mipi, i.e., pll-video0, reduces the time from ~30 ms > to less than 2 ms. If combined with only iterating through meaningful > nkm combinations for pll-mipi, this should bring the time under 1 ms > again. My point is that: * it's regression prone, so quite risky * but if the endgoal is to gain 40ms once, at boot, without any use case to back that up, there's basically no reward. a high risk, low reward patch is what's hard to sell. Maxime