linux-sunxi.lists.linux.dev archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks
@ 2023-05-27 13:27 Frank Oltmanns
  2023-05-27 13:27 ` [RFC PATCH 1/3] clk: sunxi-ng: nkm: Minimize difference when finding rate Frank Oltmanns
                   ` (3 more replies)
  0 siblings, 4 replies; 19+ messages in thread
From: Frank Oltmanns @ 2023-05-27 13:27 UTC (permalink / raw)
  To: linux-arm-kernel, linux-clk, linux-kernel, linux-sunxi
  Cc: Frank Oltmanns, Andre Przywara, Chen-Yu Tsai, Icenowy Zheng,
	Jernej Skrabec, Maxime Ripard, Michael Turquette, Rob Herring,
	Samuel Holland, Stephen Boyd

I would like to bring your attention to the current process of setting the rate
of an NKM clock. As it stands, when setting the rate of an NKM clock, the rate
nearest but less than or equal to the requested rate is found, instead of the
nearest rate. Moreover, ccu_nkm_find_best() is called multiple times (footnote
[1]) when setting a rate, each time iterating over all combinations of n, k, and
m.

In response to this, I propose the following refinements to optimize the NKM
clock setting:
 a. when finding the best rate use the nearest rate, even if it is greater than
    the requested rate (PATCH 1)
 b. utilize binary search to find the best rate by going through a
    precalculated, ordered list of all meaningful combinations of n, k, and m
    (PATCH 2)

To illustrate, consider an NKM clock with min_n = 1, max_n = 16, min_k = 2,
max_k = 4, min_m = 1, and max_m = 16. With the current approach, we have to go
through 1024 combinations, of which only 275 are actually meaningful (the
remaining 749 are combinations that result in the same frequency as other
combinations). So, when selecting from these sorted 275 combinations we find the
closest rate after 9 instead of 1024 steps.

As an example, I calculated the table off-line for the pll-mipi clock of
sun50i-a64 (PATCH 3). However, I have identified two other potential strategies:
 2. calculate before first use and persist for subsequent uses (i.e. never free,
    see footnote [2])
 3. calculate before first use and free after setting the rate.

Each approach carries its own merits. The one I implemented is the most
efficient in terms of computation time but consumes the most memory. The second
saves compute time after the initial use, while the third minimizes memory usage
at the cost of additional computation.

The motivation for these proposed changes lies in the current behavior of rate
selection for NKM clocks, which doesn't observe the CLK_SET_RATE_PARENT flag.
I.e. it does not select a different rate for the parent clock to find the
optimal rate. I believe this is likely due to the fact that selecting a new rate
is quite compute intense, as it would involve iterating or calculating rates for
the parent clock and then testing each rate with different n, k, and m
combinations.

As an example, if the parent is an NM clock, we would have to work through the
combinations of the parent's factors (the parent's n) and divisor (the parent's
m). This results in five nested loops to evaluate all possible rates, an effort
that escalates if the parent could also influence the grandparent's rate. In my
example case (sun50i-a64) the pll-mipi's parent (pll-video0) has 2048
combinations of n and m, of which 1266 are meaningful because the others result
in the same frequency for pll-video0.

If we can come up with a way to iterate over the possible rates of a parent,
this would eventually allow us to make NKM clocks obey the CLK_SET_RATE_PARENT
flag. Because it would only require 11,349 (9 * 1,266) steps instead of
2,097,152 (1,024 * 2,048).

Things I considered but don't have a clear answer to:
 - Is there a specific reason, why currently only clock rates less than the
   requested rate are considered when setting a new rate?
 - Do you think it is worth the memory and increased complexity to be able to
   change the parent's clock rate?

I look forward to hearing your thoughts on these proposed changes. Thank you for
your time and consideration.

Footnotes:
[1] Multiple times because ccu_nkm_find_best is (indirectly) called from clk.c
in
 - clk_core_req_round_rate_nolock()
 - clk_calc_new_rates() (which in turn is called three times for reasons that
   currently elude me)
 - clk_change_rate

[2] Actually, we could free the memory in a new ccu_nkm_terminate() function,
which could become part of ccu_nkm_ops. But if my code searching skills don't
betray me, there is currently no other clock that implements the terminate
function.

Frank Oltmanns (3):
  clk: sunxi-ng: nkm: Minimize difference when finding rate
  clk: sunxi-ng: Implement precalculated NKM rate selection
  clk: sunxi-ng: sun50i-a64: Precalculate NKM combinations for pll-mipi

 drivers/clk/sunxi-ng/ccu-sun50i-a64.c | 281 ++++++++++++++++++++++++++
 drivers/clk/sunxi-ng/ccu_mux.c        |   2 +-
 drivers/clk/sunxi-ng/ccu_nkm.c        |  97 +++++++--
 drivers/clk/sunxi-ng/ccu_nkm.h        |  26 +++
 4 files changed, 385 insertions(+), 21 deletions(-)

-- 
2.40.1


^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC PATCH 1/3] clk: sunxi-ng: nkm: Minimize difference when finding rate
  2023-05-27 13:27 [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks Frank Oltmanns
@ 2023-05-27 13:27 ` Frank Oltmanns
  2023-05-27 13:27 ` [RFC PATCH 2/3] clk: sunxi-ng: Implement precalculated NKM rate selection Frank Oltmanns
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 19+ messages in thread
From: Frank Oltmanns @ 2023-05-27 13:27 UTC (permalink / raw)
  To: linux-arm-kernel, linux-clk, linux-kernel, linux-sunxi
  Cc: Frank Oltmanns, Andre Przywara, Chen-Yu Tsai, Icenowy Zheng,
	Jernej Skrabec, Maxime Ripard, Michael Turquette, Rob Herring,
	Samuel Holland, Stephen Boyd

Instead of selecting the first rate that is less than the requested
rate, select the rate that minimizes the absolute difference with the
requested rate. This ensures a more accurate rate selection, when the
closest available rate is higher than the requested rate.

Also stop searching once an exact match is found.

Adjust mux to this change in rate selection.

Signed-off-by: Frank Oltmanns <frank@oltmanns.dev>
---
 drivers/clk/sunxi-ng/ccu_mux.c |  2 +-
 drivers/clk/sunxi-ng/ccu_nkm.c | 13 +++++++++----
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/drivers/clk/sunxi-ng/ccu_mux.c b/drivers/clk/sunxi-ng/ccu_mux.c
index 9303e1f02ffd..6fe0a8d098f1 100644
--- a/drivers/clk/sunxi-ng/ccu_mux.c
+++ b/drivers/clk/sunxi-ng/ccu_mux.c
@@ -145,7 +145,7 @@ int ccu_mux_helper_determine_rate(struct ccu_common *common,
 			goto out;
 		}
 
-		if ((req->rate - tmp_rate) < (req->rate - best_rate)) {
+		if (abs(req->rate - tmp_rate) < abs(req->rate - best_rate)) {
 			best_rate = tmp_rate;
 			best_parent_rate = parent_rate;
 			best_parent = parent;
diff --git a/drivers/clk/sunxi-ng/ccu_nkm.c b/drivers/clk/sunxi-ng/ccu_nkm.c
index a0978a50edae..94d2a83992b2 100644
--- a/drivers/clk/sunxi-ng/ccu_nkm.c
+++ b/drivers/clk/sunxi-ng/ccu_nkm.c
@@ -19,7 +19,7 @@ struct _ccu_nkm {
 static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
 				       struct _ccu_nkm *nkm)
 {
-	unsigned long best_rate = 0;
+	unsigned long best_rate = 0, best_diff = ULONG_MAX;
 	unsigned long best_n = 0, best_k = 0, best_m = 0;
 	unsigned long _n, _k, _m;
 
@@ -27,21 +27,26 @@ static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
 		for (_n = nkm->min_n; _n <= nkm->max_n; _n++) {
 			for (_m = nkm->min_m; _m <= nkm->max_m; _m++) {
 				unsigned long tmp_rate;
+				unsigned long tmp_diff;
 
 				tmp_rate = parent * _n * _k / _m;
 
-				if (tmp_rate > rate)
-					continue;
-				if ((rate - tmp_rate) < (rate - best_rate)) {
+				tmp_diff = abs(rate - tmp_rate);
+
+				if (tmp_diff < best_diff) {
 					best_rate = tmp_rate;
+					best_diff = tmp_diff;
 					best_n = _n;
 					best_k = _k;
 					best_m = _m;
+					if (best_diff == 0)
+						goto out;
 				}
 			}
 		}
 	}
 
+out:
 	nkm->n = best_n;
 	nkm->k = best_k;
 	nkm->m = best_m;
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH 2/3] clk: sunxi-ng: Implement precalculated NKM rate selection
  2023-05-27 13:27 [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks Frank Oltmanns
  2023-05-27 13:27 ` [RFC PATCH 1/3] clk: sunxi-ng: nkm: Minimize difference when finding rate Frank Oltmanns
@ 2023-05-27 13:27 ` Frank Oltmanns
  2023-05-27 23:19   ` Julian Calaby
  2023-05-28 14:11   ` Frank Oltmanns
  2023-05-27 13:27 ` [RFC PATCH 3/3] clk: sunxi-ng: sun50i-a64: Precalculate NKM combinations for pll-mipi Frank Oltmanns
  2023-05-31 13:48 ` [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks Maxime Ripard
  3 siblings, 2 replies; 19+ messages in thread
From: Frank Oltmanns @ 2023-05-27 13:27 UTC (permalink / raw)
  To: linux-arm-kernel, linux-clk, linux-kernel, linux-sunxi
  Cc: Frank Oltmanns, Andre Przywara, Chen-Yu Tsai, Icenowy Zheng,
	Jernej Skrabec, Maxime Ripard, Michael Turquette, Rob Herring,
	Samuel Holland, Stephen Boyd

Add a new precalculation method for NKM clock rate selection in the
sunxi-ng clock driver. Introduce ccu_nkm_find_best_precalc which uses a
precalculated table of valid NKM combinations (struct clk_nkm_table and
struct clk_nkm_combo) to find the best rate. This approach provides
faster rate selection by searching a table of valid combinations rather
than calculating for all possible combinations.

The table of NKM combinations needs to be initialized with meaningful
combinations only, i.e. removing redundant combinations that result in
the same rate.

Keep the existing ccu_nkm_find_best function in place and use it as a
fallback if no precalculated table is provided.

Signed-off-by: Frank Oltmanns <frank@oltmanns.dev>
---
 drivers/clk/sunxi-ng/ccu_nkm.c | 84 +++++++++++++++++++++++++++-------
 drivers/clk/sunxi-ng/ccu_nkm.h | 26 +++++++++++
 2 files changed, 94 insertions(+), 16 deletions(-)

diff --git a/drivers/clk/sunxi-ng/ccu_nkm.c b/drivers/clk/sunxi-ng/ccu_nkm.c
index 94d2a83992b2..9652f6df17bd 100644
--- a/drivers/clk/sunxi-ng/ccu_nkm.c
+++ b/drivers/clk/sunxi-ng/ccu_nkm.c
@@ -54,6 +54,49 @@ static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
 	return best_rate;
 }
 
+static unsigned long ccu_nkm_find_best_precalc(unsigned long parent,
+					       unsigned long rate,
+					       struct _ccu_nkm *nkm,
+					       struct clk_nkm_table *table)
+{
+	unsigned long best_rate = 0, best_diff = ULONG_MAX;
+	unsigned long best_n = 0, best_k = 0, best_m = 0;
+	int start = 0, end = table->num - 1, mid;
+
+	while (start <= end) {
+		unsigned long tmp_rate;
+		unsigned long tmp_diff;
+
+		mid = (start + end) / 2;
+
+		tmp_rate = parent * table->combos[mid].n * table->combos[mid].k /
+			   table->combos[mid].m;
+
+		tmp_diff = abs(rate - tmp_rate);
+
+		if (tmp_diff < best_diff) {
+			best_rate = tmp_rate;
+			best_diff = tmp_diff;
+			best_n = table->combos[mid].n;
+			best_k = table->combos[mid].k;
+			best_m = table->combos[mid].m;
+			if (best_diff == 0)
+				goto out;
+		}
+		if (rate < tmp_rate)
+			end = mid - 1;
+		else
+			start = mid + 1;
+	}
+
+out:
+	nkm->n = best_n;
+	nkm->k = best_k;
+	nkm->m = best_m;
+
+	return best_rate;
+}
+
 static void ccu_nkm_disable(struct clk_hw *hw)
 {
 	struct ccu_nkm *nkm = hw_to_ccu_nkm(hw);
@@ -119,17 +162,22 @@ static unsigned long ccu_nkm_round_rate(struct ccu_mux_internal *mux,
 	struct ccu_nkm *nkm = data;
 	struct _ccu_nkm _nkm;
 
-	_nkm.min_n = nkm->n.min ?: 1;
-	_nkm.max_n = nkm->n.max ?: 1 << nkm->n.width;
-	_nkm.min_k = nkm->k.min ?: 1;
-	_nkm.max_k = nkm->k.max ?: 1 << nkm->k.width;
-	_nkm.min_m = 1;
-	_nkm.max_m = nkm->m.max ?: 1 << nkm->m.width;
-
 	if (nkm->common.features & CCU_FEATURE_FIXED_POSTDIV)
 		rate *= nkm->fixed_post_div;
 
-	rate = ccu_nkm_find_best(*parent_rate, rate, &_nkm);
+	if (nkm->table.num)
+		rate = ccu_nkm_find_best_precalc(*parent_rate, rate, &_nkm,
+						 &nkm->table);
+	else {
+		_nkm.min_n = nkm->n.min ?: 1;
+		_nkm.max_n = nkm->n.max ?: 1 << nkm->n.width;
+		_nkm.min_k = nkm->k.min ?: 1;
+		_nkm.max_k = nkm->k.max ?: 1 << nkm->k.width;
+		_nkm.min_m = 1;
+		_nkm.max_m = nkm->m.max ?: 1 << nkm->m.width;
+
+		rate = ccu_nkm_find_best(*parent_rate, rate, &_nkm);
+	}
 
 	if (nkm->common.features & CCU_FEATURE_FIXED_POSTDIV)
 		rate /= nkm->fixed_post_div;
@@ -157,14 +205,18 @@ static int ccu_nkm_set_rate(struct clk_hw *hw, unsigned long rate,
 	if (nkm->common.features & CCU_FEATURE_FIXED_POSTDIV)
 		rate *= nkm->fixed_post_div;
 
-	_nkm.min_n = nkm->n.min ?: 1;
-	_nkm.max_n = nkm->n.max ?: 1 << nkm->n.width;
-	_nkm.min_k = nkm->k.min ?: 1;
-	_nkm.max_k = nkm->k.max ?: 1 << nkm->k.width;
-	_nkm.min_m = 1;
-	_nkm.max_m = nkm->m.max ?: 1 << nkm->m.width;
-
-	ccu_nkm_find_best(parent_rate, rate, &_nkm);
+	if (nkm->table.num)
+		rate = ccu_nkm_find_best_precalc(*parent_rate, rate, &_nkm,
+						 &nkm->table);
+	else {
+		_nkm.min_n = nkm->n.min ?: 1;
+		_nkm.max_n = nkm->n.max ?: 1 << nkm->n.width;
+		_nkm.min_k = nkm->k.min ?: 1;
+		_nkm.max_k = nkm->k.max ?: 1 << nkm->k.width;
+		_nkm.min_m = 1;
+		_nkm.max_m = nkm->m.max ?: 1 << nkm->m.width;
+		ccu_nkm_find_best(parent_rate, rate, &_nkm);
+	}
 
 	spin_lock_irqsave(nkm->common.lock, flags);
 
diff --git a/drivers/clk/sunxi-ng/ccu_nkm.h b/drivers/clk/sunxi-ng/ccu_nkm.h
index 6601defb3f38..fa5551724921 100644
--- a/drivers/clk/sunxi-ng/ccu_nkm.h
+++ b/drivers/clk/sunxi-ng/ccu_nkm.h
@@ -12,6 +12,30 @@
 #include "ccu_div.h"
 #include "ccu_mult.h"
 
+struct clk_nkm_combo {
+	u8	n;
+	u8	k;
+	u8	m;
+};
+
+/**
+ * struct clk_nkm_table - Table of all meaningful combinations for n, k, and m
+ *
+ * @num: Number of entries in the table
+ * @combos: Array of combos (of size num) that are supported by this clock.
+ *
+ * This table shall contain all meaningful combinations of n, k, and m. That
+ * means that combinations that result in the same clock rate shall only be
+ * listed once. For example, if both
+ * { .n = 1, .k = 2, .m = 2} and  { .n = 2, .k = 2, .m = 4}
+ * are valid values for n, k, and m, only one of them would be allowed because
+ * both result in a factor of 1.0.
+ */
+struct clk_nkm_table {
+	size_t			num;
+	struct clk_nkm_combo	*combos;
+};
+
 /*
  * struct ccu_nkm - Definition of an N-K-M clock
  *
@@ -29,6 +53,8 @@ struct ccu_nkm {
 	unsigned int		fixed_post_div;
 
 	struct ccu_common	common;
+
+	struct clk_nkm_table	table;
 };
 
 #define SUNXI_CCU_NKM_WITH_MUX_GATE_LOCK(_struct, _name, _parents, _reg, \
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [RFC PATCH 3/3] clk: sunxi-ng: sun50i-a64: Precalculate NKM combinations for pll-mipi
  2023-05-27 13:27 [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks Frank Oltmanns
  2023-05-27 13:27 ` [RFC PATCH 1/3] clk: sunxi-ng: nkm: Minimize difference when finding rate Frank Oltmanns
  2023-05-27 13:27 ` [RFC PATCH 2/3] clk: sunxi-ng: Implement precalculated NKM rate selection Frank Oltmanns
@ 2023-05-27 13:27 ` Frank Oltmanns
  2023-05-31 13:48 ` [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks Maxime Ripard
  3 siblings, 0 replies; 19+ messages in thread
From: Frank Oltmanns @ 2023-05-27 13:27 UTC (permalink / raw)
  To: linux-arm-kernel, linux-clk, linux-kernel, linux-sunxi
  Cc: Frank Oltmanns, Andre Przywara, Chen-Yu Tsai, Icenowy Zheng,
	Jernej Skrabec, Maxime Ripard, Michael Turquette, Rob Herring,
	Samuel Holland, Stephen Boyd

The NKM driver now support using a table of precalculated NKM
combinations. Use this new method for sun50i-a64 for a faster selection
of clock rates.

Signed-off-by: Frank Oltmanns <frank@oltmanns.dev>
---
 drivers/clk/sunxi-ng/ccu-sun50i-a64.c | 281 ++++++++++++++++++++++++++
 1 file changed, 281 insertions(+)

diff --git a/drivers/clk/sunxi-ng/ccu-sun50i-a64.c b/drivers/clk/sunxi-ng/ccu-sun50i-a64.c
index ff242bccc827..d201f9ec5378 100644
--- a/drivers/clk/sunxi-ng/ccu-sun50i-a64.c
+++ b/drivers/clk/sunxi-ng/ccu-sun50i-a64.c
@@ -165,6 +165,283 @@ static SUNXI_CCU_NM_WITH_FRAC_GATE_LOCK(pll_gpu_clk, "pll-gpu",
  */
 #define SUN50I_A64_PLL_MIPI_REG		0x040
 
+static struct clk_nkm_combo pll_mipi_nkm_combos[] = {
+	{ .n =  1, .k =  2,  .m = 16 /* 0.1250000*/ },
+	{ .n =  1, .k =  2,  .m = 15 /* 0.1333333*/ },
+	{ .n =  1, .k =  2,  .m = 14 /* 0.1428571*/ },
+	{ .n =  1, .k =  2,  .m = 13 /* 0.1538462*/ },
+	{ .n =  1, .k =  2,  .m = 12 /* 0.1666667*/ },
+	{ .n =  1, .k =  2,  .m = 11 /* 0.1818182*/ },
+	{ .n =  1, .k =  3,  .m = 16 /* 0.1875000*/ },
+	{ .n =  1, .k =  2,  .m = 10 /* 0.2000000*/ },
+	{ .n =  1, .k =  3,  .m = 14 /* 0.2142857*/ },
+	{ .n =  1, .k =  2,  .m =  9 /* 0.2222222*/ },
+	{ .n =  1, .k =  3,  .m = 13 /* 0.2307692*/ },
+	{ .n =  1, .k =  2,  .m =  8 /* 0.2500000*/ },
+	{ .n =  2, .k =  2,  .m = 15 /* 0.2666667*/ },
+	{ .n =  1, .k =  3,  .m = 11 /* 0.2727273*/ },
+	{ .n =  1, .k =  2,  .m =  7 /* 0.2857143*/ },
+	{ .n =  1, .k =  3,  .m = 10 /* 0.3000000*/ },
+	{ .n =  2, .k =  2,  .m = 13 /* 0.3076923*/ },
+	{ .n =  1, .k =  2,  .m =  6 /* 0.3333333*/ },
+	{ .n =  2, .k =  2,  .m = 11 /* 0.3636364*/ },
+	{ .n =  3, .k =  2,  .m = 16 /* 0.3750000*/ },
+	{ .n =  1, .k =  2,  .m =  5 /* 0.4000000*/ },
+	{ .n =  3, .k =  2,  .m = 14 /* 0.4285714*/ },
+	{ .n =  2, .k =  2,  .m =  9 /* 0.4444444*/ },
+	{ .n =  3, .k =  2,  .m = 13 /* 0.4615385*/ },
+	{ .n =  1, .k =  2,  .m =  4 /* 0.5000000*/ },
+	{ .n =  4, .k =  2,  .m = 15 /* 0.5333333*/ },
+	{ .n =  3, .k =  2,  .m = 11 /* 0.5454545*/ },
+	{ .n =  3, .k =  3,  .m = 16 /* 0.5625000*/ },
+	{ .n =  2, .k =  2,  .m =  7 /* 0.5714286*/ },
+	{ .n =  3, .k =  2,  .m = 10 /* 0.6000000*/ },
+	{ .n =  4, .k =  2,  .m = 13 /* 0.6153846*/ },
+	{ .n =  5, .k =  2,  .m = 16 /* 0.6250000*/ },
+	{ .n =  3, .k =  3,  .m = 14 /* 0.6428571*/ },
+	{ .n =  1, .k =  2,  .m =  3 /* 0.6666667*/ },
+	{ .n =  3, .k =  3,  .m = 13 /* 0.6923077*/ },
+	{ .n =  5, .k =  2,  .m = 14 /* 0.7142857*/ },
+	{ .n =  4, .k =  2,  .m = 11 /* 0.7272727*/ },
+	{ .n =  3, .k =  2,  .m =  8 /* 0.7500000*/ },
+	{ .n =  5, .k =  2,  .m = 13 /* 0.7692308*/ },
+	{ .n =  2, .k =  2,  .m =  5 /* 0.8000000*/ },
+	{ .n =  3, .k =  3,  .m = 11 /* 0.8181818*/ },
+	{ .n =  5, .k =  2,  .m = 12 /* 0.8333333*/ },
+	{ .n =  3, .k =  2,  .m =  7 /* 0.8571429*/ },
+	{ .n =  7, .k =  2,  .m = 16 /* 0.8750000*/ },
+	{ .n =  4, .k =  2,  .m =  9 /* 0.8888889*/ },
+	{ .n =  3, .k =  3,  .m = 10 /* 0.9000000*/ },
+	{ .n =  5, .k =  2,  .m = 11 /* 0.9090909*/ },
+	{ .n =  6, .k =  2,  .m = 13 /* 0.9230769*/ },
+	{ .n =  7, .k =  2,  .m = 15 /* 0.9333333*/ },
+	{ .n =  5, .k =  3,  .m = 16 /* 0.9375000*/ },
+	{ .n =  1, .k =  2,  .m =  2 /* 1.0000000*/ },
+	{ .n =  8, .k =  2,  .m = 15 /* 1.0666667*/ },
+	{ .n =  5, .k =  3,  .m = 14 /* 1.0714286*/ },
+	{ .n =  7, .k =  2,  .m = 13 /* 1.0769231*/ },
+	{ .n =  6, .k =  2,  .m = 11 /* 1.0909091*/ },
+	{ .n =  5, .k =  2,  .m =  9 /* 1.1111111*/ },
+	{ .n =  9, .k =  2,  .m = 16 /* 1.1250000*/ },
+	{ .n =  4, .k =  2,  .m =  7 /* 1.1428571*/ },
+	{ .n =  5, .k =  3,  .m = 13 /* 1.1538462*/ },
+	{ .n =  7, .k =  2,  .m = 12 /* 1.1666667*/ },
+	{ .n =  3, .k =  2,  .m =  5 /* 1.2000000*/ },
+	{ .n =  8, .k =  2,  .m = 13 /* 1.2307692*/ },
+	{ .n =  5, .k =  2,  .m =  8 /* 1.2500000*/ },
+	{ .n =  7, .k =  2,  .m = 11 /* 1.2727273*/ },
+	{ .n =  9, .k =  2,  .m = 14 /* 1.2857143*/ },
+	{ .n =  7, .k =  3,  .m = 16 /* 1.3125000*/ },
+	{ .n =  2, .k =  2,  .m =  3 /* 1.3333333*/ },
+	{ .n =  5, .k =  3,  .m = 11 /* 1.3636364*/ },
+	{ .n = 11, .k =  2,  .m = 16 /* 1.3750000*/ },
+	{ .n =  9, .k =  2,  .m = 13 /* 1.3846154*/ },
+	{ .n =  7, .k =  2,  .m = 10 /* 1.4000000*/ },
+	{ .n =  5, .k =  2,  .m =  7 /* 1.4285714*/ },
+	{ .n =  8, .k =  2,  .m = 11 /* 1.4545455*/ },
+	{ .n = 11, .k =  2,  .m = 15 /* 1.4666667*/ },
+	{ .n =  3, .k =  2,  .m =  4 /* 1.5000000*/ },
+	{ .n = 10, .k =  2,  .m = 13 /* 1.5384615*/ },
+	{ .n =  7, .k =  2,  .m =  9 /* 1.5555556*/ },
+	{ .n = 11, .k =  2,  .m = 14 /* 1.5714286*/ },
+	{ .n =  4, .k =  2,  .m =  5 /* 1.6000000*/ },
+	{ .n =  7, .k =  3,  .m = 13 /* 1.6153846*/ },
+	{ .n = 13, .k =  2,  .m = 16 /* 1.6250000*/ },
+	{ .n =  9, .k =  2,  .m = 11 /* 1.6363636*/ },
+	{ .n =  5, .k =  2,  .m =  6 /* 1.6666667*/ },
+	{ .n =  9, .k =  3,  .m = 16 /* 1.6875000*/ },
+	{ .n = 11, .k =  2,  .m = 13 /* 1.6923077*/ },
+	{ .n =  6, .k =  2,  .m =  7 /* 1.7142857*/ },
+	{ .n = 13, .k =  2,  .m = 15 /* 1.7333333*/ },
+	{ .n =  7, .k =  2,  .m =  8 /* 1.7500000*/ },
+	{ .n =  8, .k =  2,  .m =  9 /* 1.7777778*/ },
+	{ .n =  9, .k =  2,  .m = 10 /* 1.8000000*/ },
+	{ .n = 10, .k =  2,  .m = 11 /* 1.8181818*/ },
+	{ .n = 11, .k =  2,  .m = 12 /* 1.8333333*/ },
+	{ .n = 12, .k =  2,  .m = 13 /* 1.8461538*/ },
+	{ .n = 13, .k =  2,  .m = 14 /* 1.8571429*/ },
+	{ .n = 14, .k =  2,  .m = 15 /* 1.8666667*/ },
+	{ .n = 15, .k =  2,  .m = 16 /* 1.8750000*/ },
+	{ .n =  7, .k =  3,  .m = 11 /* 1.9090909*/ },
+	{ .n =  9, .k =  3,  .m = 14 /* 1.9285714*/ },
+	{ .n =  1, .k =  2,  .m =  1 /* 2.0000000*/ },
+	{ .n = 11, .k =  3,  .m = 16 /* 2.0625000*/ },
+	{ .n =  9, .k =  3,  .m = 13 /* 2.0769231*/ },
+	{ .n =  7, .k =  3,  .m = 10 /* 2.1000000*/ },
+	{ .n = 16, .k =  2,  .m = 15 /* 2.1333333*/ },
+	{ .n = 15, .k =  2,  .m = 14 /* 2.1428571*/ },
+	{ .n = 14, .k =  2,  .m = 13 /* 2.1538462*/ },
+	{ .n = 13, .k =  2,  .m = 12 /* 2.1666667*/ },
+	{ .n = 12, .k =  2,  .m = 11 /* 2.1818182*/ },
+	{ .n = 11, .k =  2,  .m = 10 /* 2.2000000*/ },
+	{ .n = 10, .k =  2,  .m =  9 /* 2.2222222*/ },
+	{ .n =  9, .k =  2,  .m =  8 /* 2.2500000*/ },
+	{ .n =  8, .k =  2,  .m =  7 /* 2.2857143*/ },
+	{ .n = 15, .k =  2,  .m = 13 /* 2.3076923*/ },
+	{ .n =  7, .k =  2,  .m =  6 /* 2.3333333*/ },
+	{ .n = 11, .k =  3,  .m = 14 /* 2.3571429*/ },
+	{ .n = 13, .k =  2,  .m = 11 /* 2.3636364*/ },
+	{ .n =  6, .k =  2,  .m =  5 /* 2.4000000*/ },
+	{ .n = 13, .k =  3,  .m = 16 /* 2.4375000*/ },
+	{ .n = 11, .k =  2,  .m =  9 /* 2.4444444*/ },
+	{ .n =  9, .k =  3,  .m = 11 /* 2.4545455*/ },
+	{ .n = 16, .k =  2,  .m = 13 /* 2.4615385*/ },
+	{ .n =  5, .k =  2,  .m =  4 /* 2.5000000*/ },
+	{ .n = 11, .k =  3,  .m = 13 /* 2.5384615*/ },
+	{ .n = 14, .k =  2,  .m = 11 /* 2.5454545*/ },
+	{ .n =  9, .k =  2,  .m =  7 /* 2.5714286*/ },
+	{ .n = 13, .k =  2,  .m = 10 /* 2.6000000*/ },
+	{ .n =  7, .k =  3,  .m =  8 /* 2.6250000*/ },
+	{ .n =  4, .k =  2,  .m =  3 /* 2.6666667*/ },
+	{ .n =  9, .k =  3,  .m = 10 /* 2.7000000*/ },
+	{ .n = 15, .k =  2,  .m = 11 /* 2.7272727*/ },
+	{ .n = 11, .k =  2,  .m =  8 /* 2.7500000*/ },
+	{ .n = 12, .k =  3,  .m = 13 /* 2.7692308*/ },
+	{ .n = 13, .k =  3,  .m = 14 /* 2.7857143*/ },
+	{ .n =  7, .k =  2,  .m =  5 /* 2.8000000*/ },
+	{ .n = 15, .k =  3,  .m = 16 /* 2.8125000*/ },
+	{ .n = 10, .k =  2,  .m =  7 /* 2.8571429*/ },
+	{ .n = 13, .k =  2,  .m =  9 /* 2.8888889*/ },
+	{ .n = 16, .k =  2,  .m = 11 /* 2.9090909*/ },
+	{ .n = 11, .k =  4,  .m = 15 /* 2.9333333*/ },
+	{ .n =  3, .k =  2,  .m =  2 /* 3.0000000*/ },
+	{ .n = 10, .k =  4,  .m = 13 /* 3.0769231*/ },
+	{ .n = 14, .k =  2,  .m =  9 /* 3.1111111*/ },
+	{ .n = 11, .k =  2,  .m =  7 /* 3.1428571*/ },
+	{ .n =  8, .k =  2,  .m =  5 /* 3.2000000*/ },
+	{ .n = 15, .k =  3,  .m = 14 /* 3.2142857*/ },
+	{ .n = 14, .k =  3,  .m = 13 /* 3.2307692*/ },
+	{ .n = 13, .k =  2,  .m =  8 /* 3.2500000*/ },
+	{ .n = 12, .k =  3,  .m = 11 /* 3.2727273*/ },
+	{ .n = 11, .k =  3,  .m = 10 /* 3.3000000*/ },
+	{ .n =  5, .k =  2,  .m =  3 /* 3.3333333*/ },
+	{ .n =  9, .k =  3,  .m =  8 /* 3.3750000*/ },
+	{ .n = 11, .k =  4,  .m = 13 /* 3.3846154*/ },
+	{ .n = 12, .k =  2,  .m =  7 /* 3.4285714*/ },
+	{ .n = 15, .k =  3,  .m = 13 /* 3.4615385*/ },
+	{ .n = 13, .k =  4,  .m = 15 /* 3.4666667*/ },
+	{ .n =  7, .k =  2,  .m =  4 /* 3.5000000*/ },
+	{ .n = 13, .k =  3,  .m = 11 /* 3.5454545*/ },
+	{ .n = 16, .k =  2,  .m =  9 /* 3.5555556*/ },
+	{ .n =  9, .k =  2,  .m =  5 /* 3.6000000*/ },
+	{ .n = 10, .k =  4,  .m = 11 /* 3.6363636*/ },
+	{ .n = 11, .k =  2,  .m =  6 /* 3.6666667*/ },
+	{ .n = 16, .k =  3,  .m = 13 /* 3.6923077*/ },
+	{ .n = 13, .k =  2,  .m =  7 /* 3.7142857*/ },
+	{ .n = 14, .k =  4,  .m = 15 /* 3.7333333*/ },
+	{ .n = 15, .k =  2,  .m =  8 /* 3.7500000*/ },
+	{ .n = 14, .k =  3,  .m = 11 /* 3.8181818*/ },
+	{ .n =  9, .k =  3,  .m =  7 /* 3.8571429*/ },
+	{ .n = 13, .k =  3,  .m = 10 /* 3.9000000*/ },
+	{ .n =  2, .k =  2,  .m =  1 /* 4.0000000*/ },
+	{ .n = 15, .k =  3,  .m = 11 /* 4.0909091*/ },
+	{ .n = 11, .k =  3,  .m =  8 /* 4.1250000*/ },
+	{ .n =  7, .k =  3,  .m =  5 /* 4.2000000*/ },
+	{ .n = 16, .k =  4,  .m = 15 /* 4.2666667*/ },
+	{ .n = 15, .k =  2,  .m =  7 /* 4.2857143*/ },
+	{ .n = 14, .k =  4,  .m = 13 /* 4.3076923*/ },
+	{ .n = 13, .k =  2,  .m =  6 /* 4.3333333*/ },
+	{ .n = 16, .k =  3,  .m = 11 /* 4.3636364*/ },
+	{ .n = 11, .k =  2,  .m =  5 /* 4.4000000*/ },
+	{ .n = 10, .k =  4,  .m =  9 /* 4.4444444*/ },
+	{ .n =  9, .k =  2,  .m =  4 /* 4.5000000*/ },
+	{ .n = 16, .k =  2,  .m =  7 /* 4.5714286*/ },
+	{ .n = 15, .k =  4,  .m = 13 /* 4.6153846*/ },
+	{ .n =  7, .k =  2,  .m =  3 /* 4.6666667*/ },
+	{ .n = 11, .k =  3,  .m =  7 /* 4.7142857*/ },
+	{ .n = 13, .k =  4,  .m = 11 /* 4.7272727*/ },
+	{ .n = 12, .k =  2,  .m =  5 /* 4.8000000*/ },
+	{ .n = 13, .k =  3,  .m =  8 /* 4.8750000*/ },
+	{ .n = 11, .k =  4,  .m =  9 /* 4.8888889*/ },
+	{ .n = 16, .k =  4,  .m = 13 /* 4.9230769*/ },
+	{ .n =  5, .k =  2,  .m =  2 /* 5.0000000*/ },
+	{ .n = 14, .k =  4,  .m = 11 /* 5.0909091*/ },
+	{ .n = 12, .k =  3,  .m =  7 /* 5.1428571*/ },
+	{ .n = 13, .k =  2,  .m =  5 /* 5.2000000*/ },
+	{ .n =  7, .k =  3,  .m =  4 /* 5.2500000*/ },
+	{ .n =  8, .k =  2,  .m =  3 /* 5.3333333*/ },
+	{ .n =  9, .k =  3,  .m =  5 /* 5.4000000*/ },
+	{ .n = 15, .k =  4,  .m = 11 /* 5.4545455*/ },
+	{ .n = 11, .k =  2,  .m =  4 /* 5.5000000*/ },
+	{ .n = 13, .k =  3,  .m =  7 /* 5.5714286*/ },
+	{ .n = 14, .k =  2,  .m =  5 /* 5.6000000*/ },
+	{ .n = 15, .k =  3,  .m =  8 /* 5.6250000*/ },
+	{ .n = 10, .k =  4,  .m =  7 /* 5.7142857*/ },
+	{ .n = 13, .k =  4,  .m =  9 /* 5.7777778*/ },
+	{ .n = 16, .k =  4,  .m = 11 /* 5.8181818*/ },
+	{ .n =  3, .k =  2,  .m =  1 /* 6.0000000*/ },
+	{ .n = 14, .k =  4,  .m =  9 /* 6.2222222*/ },
+	{ .n = 11, .k =  4,  .m =  7 /* 6.2857143*/ },
+	{ .n = 16, .k =  2,  .m =  5 /* 6.4000000*/ },
+	{ .n = 15, .k =  3,  .m =  7 /* 6.4285714*/ },
+	{ .n = 13, .k =  2,  .m =  4 /* 6.5000000*/ },
+	{ .n = 11, .k =  3,  .m =  5 /* 6.6000000*/ },
+	{ .n = 10, .k =  2,  .m =  3 /* 6.6666667*/ },
+	{ .n =  9, .k =  3,  .m =  4 /* 6.7500000*/ },
+	{ .n = 16, .k =  3,  .m =  7 /* 6.8571429*/ },
+	{ .n =  7, .k =  2,  .m =  2 /* 7.0000000*/ },
+	{ .n = 16, .k =  4,  .m =  9 /* 7.1111111*/ },
+	{ .n = 12, .k =  3,  .m =  5 /* 7.2000000*/ },
+	{ .n = 11, .k =  2,  .m =  3 /* 7.3333333*/ },
+	{ .n = 13, .k =  4,  .m =  7 /* 7.4285714*/ },
+	{ .n = 15, .k =  2,  .m =  4 /* 7.5000000*/ },
+	{ .n = 13, .k =  3,  .m =  5 /* 7.8000000*/ },
+	{ .n =  4, .k =  2,  .m =  1 /* 8.0000000*/ },
+	{ .n = 11, .k =  3,  .m =  4 /* 8.2500000*/ },
+	{ .n = 14, .k =  3,  .m =  5 /* 8.4000000*/ },
+	{ .n = 15, .k =  4,  .m =  7 /* 8.5714286*/ },
+	{ .n = 13, .k =  2,  .m =  3 /* 8.6666667*/ },
+	{ .n = 11, .k =  4,  .m =  5 /* 8.8000000*/ },
+	{ .n =  9, .k =  2,  .m =  2 /* 9.0000000*/ },
+	{ .n = 16, .k =  4,  .m =  7 /* 9.1428571*/ },
+	{ .n = 14, .k =  2,  .m =  3 /* 9.3333333*/ },
+	{ .n = 16, .k =  3,  .m =  5 /* 9.6000000*/ },
+	{ .n = 13, .k =  3,  .m =  4 /* 9.7500000*/ },
+	{ .n =  5, .k =  2,  .m =  1 /*10.0000000*/ },
+	{ .n = 13, .k =  4,  .m =  5 /*10.4000000*/ },
+	{ .n =  7, .k =  3,  .m =  2 /*10.5000000*/ },
+	{ .n = 16, .k =  2,  .m =  3 /*10.6666667*/ },
+	{ .n = 11, .k =  2,  .m =  2 /*11.0000000*/ },
+	{ .n = 14, .k =  4,  .m =  5 /*11.2000000*/ },
+	{ .n = 15, .k =  3,  .m =  4 /*11.2500000*/ },
+	{ .n =  6, .k =  2,  .m =  1 /*12.0000000*/ },
+	{ .n = 16, .k =  4,  .m =  5 /*12.8000000*/ },
+	{ .n = 13, .k =  2,  .m =  2 /*13.0000000*/ },
+	{ .n = 10, .k =  4,  .m =  3 /*13.3333333*/ },
+	{ .n =  9, .k =  3,  .m =  2 /*13.5000000*/ },
+	{ .n =  7, .k =  2,  .m =  1 /*14.0000000*/ },
+	{ .n = 11, .k =  4,  .m =  3 /*14.6666667*/ },
+	{ .n = 15, .k =  2,  .m =  2 /*15.0000000*/ },
+	{ .n =  8, .k =  2,  .m =  1 /*16.0000000*/ },
+	{ .n = 11, .k =  3,  .m =  2 /*16.5000000*/ },
+	{ .n = 13, .k =  4,  .m =  3 /*17.3333333*/ },
+	{ .n =  9, .k =  2,  .m =  1 /*18.0000000*/ },
+	{ .n = 14, .k =  4,  .m =  3 /*18.6666667*/ },
+	{ .n = 13, .k =  3,  .m =  2 /*19.5000000*/ },
+	{ .n = 10, .k =  2,  .m =  1 /*20.0000000*/ },
+	{ .n =  7, .k =  3,  .m =  1 /*21.0000000*/ },
+	{ .n = 16, .k =  4,  .m =  3 /*21.3333333*/ },
+	{ .n = 11, .k =  2,  .m =  1 /*22.0000000*/ },
+	{ .n = 15, .k =  3,  .m =  2 /*22.5000000*/ },
+	{ .n = 12, .k =  2,  .m =  1 /*24.0000000*/ },
+	{ .n = 13, .k =  2,  .m =  1 /*26.0000000*/ },
+	{ .n =  9, .k =  3,  .m =  1 /*27.0000000*/ },
+	{ .n = 14, .k =  2,  .m =  1 /*28.0000000*/ },
+	{ .n = 15, .k =  2,  .m =  1 /*30.0000000*/ },
+	{ .n = 16, .k =  2,  .m =  1 /*32.0000000*/ },
+	{ .n = 11, .k =  3,  .m =  1 /*33.0000000*/ },
+	{ .n = 12, .k =  3,  .m =  1 /*36.0000000*/ },
+	{ .n = 13, .k =  3,  .m =  1 /*39.0000000*/ },
+	{ .n = 10, .k =  4,  .m =  1 /*40.0000000*/ },
+	{ .n = 14, .k =  3,  .m =  1 /*42.0000000*/ },
+	{ .n = 11, .k =  4,  .m =  1 /*44.0000000*/ },
+	{ .n = 15, .k =  3,  .m =  1 /*45.0000000*/ },
+	{ .n = 16, .k =  3,  .m =  1 /*48.0000000*/ },
+	{ .n = 13, .k =  4,  .m =  1 /*52.0000000*/ },
+	{ .n = 14, .k =  4,  .m =  1 /*56.0000000*/ },
+	{ .n = 15, .k =  4,  .m =  1 /*60.0000000*/ },
+	{ .n = 16, .k =  4,  .m =  1 /*64.0000000*/ },
+};
 static struct ccu_nkm pll_mipi_clk = {
 	/*
 	 * The bit 23 and 22 are called "LDO{1,2}_EN" on the SoC's
@@ -181,6 +458,10 @@ static struct ccu_nkm pll_mipi_clk = {
 		.hw.init	= CLK_HW_INIT("pll-mipi", "pll-video0",
 					      &ccu_nkm_ops, CLK_SET_RATE_UNGATE),
 	},
+	.table = {
+		.num	= 275,
+		.combos	= pll_mipi_nkm_combos,
+	},
 };
 
 static SUNXI_CCU_NM_WITH_FRAC_GATE_LOCK(pll_hsic_clk, "pll-hsic",
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/3] clk: sunxi-ng: Implement precalculated NKM rate selection
  2023-05-27 13:27 ` [RFC PATCH 2/3] clk: sunxi-ng: Implement precalculated NKM rate selection Frank Oltmanns
@ 2023-05-27 23:19   ` Julian Calaby
  2023-05-28  9:12     ` Frank Oltmanns
  2023-05-28 14:11   ` Frank Oltmanns
  1 sibling, 1 reply; 19+ messages in thread
From: Julian Calaby @ 2023-05-27 23:19 UTC (permalink / raw)
  To: Frank Oltmanns
  Cc: linux-arm-kernel, linux-clk, linux-kernel, linux-sunxi,
	Andre Przywara, Chen-Yu Tsai, Icenowy Zheng, Jernej Skrabec,
	Maxime Ripard, Michael Turquette, Rob Herring, Samuel Holland,
	Stephen Boyd

Hi Frank,

On Sat, May 27, 2023 at 11:37 PM Frank Oltmanns <frank@oltmanns.dev> wrote:
>
> Add a new precalculation method for NKM clock rate selection in the
> sunxi-ng clock driver. Introduce ccu_nkm_find_best_precalc which uses a
> precalculated table of valid NKM combinations (struct clk_nkm_table and
> struct clk_nkm_combo) to find the best rate. This approach provides
> faster rate selection by searching a table of valid combinations rather
> than calculating for all possible combinations.
>
> The table of NKM combinations needs to be initialized with meaningful
> combinations only, i.e. removing redundant combinations that result in
> the same rate.
>
> Keep the existing ccu_nkm_find_best function in place and use it as a
> fallback if no precalculated table is provided.
>
> Signed-off-by: Frank Oltmanns <frank@oltmanns.dev>
> ---
>  drivers/clk/sunxi-ng/ccu_nkm.c | 84 +++++++++++++++++++++++++++-------
>  drivers/clk/sunxi-ng/ccu_nkm.h | 26 +++++++++++
>  2 files changed, 94 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/clk/sunxi-ng/ccu_nkm.c b/drivers/clk/sunxi-ng/ccu_nkm.c
> index 94d2a83992b2..9652f6df17bd 100644
> --- a/drivers/clk/sunxi-ng/ccu_nkm.c
> +++ b/drivers/clk/sunxi-ng/ccu_nkm.c
> @@ -54,6 +54,49 @@ static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
>         return best_rate;
>  }
>
> +static unsigned long ccu_nkm_find_best_precalc(unsigned long parent,
> +                                              unsigned long rate,
> +                                              struct _ccu_nkm *nkm,
> +                                              struct clk_nkm_table *table)
> +{
> +       unsigned long best_rate = 0, best_diff = ULONG_MAX;
> +       unsigned long best_n = 0, best_k = 0, best_m = 0;
> +       int start = 0, end = table->num - 1, mid;
> +
> +       while (start <= end) {
> +               unsigned long tmp_rate;
> +               unsigned long tmp_diff;
> +
> +               mid = (start + end) / 2;
> +
> +               tmp_rate = parent * table->combos[mid].n * table->combos[mid].k /
> +                          table->combos[mid].m;
> +
> +               tmp_diff = abs(rate - tmp_rate);
> +
> +               if (tmp_diff < best_diff) {
> +                       best_rate = tmp_rate;
> +                       best_diff = tmp_diff;
> +                       best_n = table->combos[mid].n;
> +                       best_k = table->combos[mid].k;
> +                       best_m = table->combos[mid].m;
> +                       if (best_diff == 0)
> +                               goto out;
> +               }

If the table was sorted by n * k / m, this could just be a process of
searching through until we either:
- find that the first rate in the table is too high
- find an exact rate
- go above the requested rate, then there's only two to compare: our
current rate and the previous one

This should massively simplify this function and would still work with
a binary search.

> +               if (rate < tmp_rate)
> +                       end = mid - 1;
> +               else
> +                       start = mid + 1;
> +       }
> +
> +out:
> +       nkm->n = best_n;
> +       nkm->k = best_k;
> +       nkm->m = best_m;
> +
> +       return best_rate;
> +}
> +
>  static void ccu_nkm_disable(struct clk_hw *hw)
>  {
>         struct ccu_nkm *nkm = hw_to_ccu_nkm(hw);
> diff --git a/drivers/clk/sunxi-ng/ccu_nkm.h b/drivers/clk/sunxi-ng/ccu_nkm.h
> index 6601defb3f38..fa5551724921 100644
> --- a/drivers/clk/sunxi-ng/ccu_nkm.h
> +++ b/drivers/clk/sunxi-ng/ccu_nkm.h
> @@ -12,6 +12,30 @@
>  #include "ccu_div.h"
>  #include "ccu_mult.h"
>
> +struct clk_nkm_combo {
> +       u8      n;
> +       u8      k;
> +       u8      m;
> +};
> +
> +/**
> + * struct clk_nkm_table - Table of all meaningful combinations for n, k, and m
> + *
> + * @num: Number of entries in the table
> + * @combos: Array of combos (of size num) that are supported by this clock.
> + *
> + * This table shall contain all meaningful combinations of n, k, and m. That
> + * means that combinations that result in the same clock rate shall only be
> + * listed once. For example, if both
> + * { .n = 1, .k = 2, .m = 2} and  { .n = 2, .k = 2, .m = 4}
> + * are valid values for n, k, and m, only one of them would be allowed because
> + * both result in a factor of 1.0.
> + */
> +struct clk_nkm_table {
> +       size_t                  num;
> +       struct clk_nkm_combo    *combos;

Should this be a "flex" array, i.e.

struct clk_nkm_combo combos[]

> +};
> +
>  /*
>   * struct ccu_nkm - Definition of an N-K-M clock
>   *

Thanks,

-- 
Julian Calaby

Email: julian.calaby@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/3] clk: sunxi-ng: Implement precalculated NKM rate selection
  2023-05-27 23:19   ` Julian Calaby
@ 2023-05-28  9:12     ` Frank Oltmanns
  2023-05-28 15:32       ` Julian Calaby
  0 siblings, 1 reply; 19+ messages in thread
From: Frank Oltmanns @ 2023-05-28  9:12 UTC (permalink / raw)
  To: Julian Calaby
  Cc: linux-arm-kernel, linux-clk, linux-kernel, linux-sunxi,
	Andre Przywara, Chen-Yu Tsai, Icenowy Zheng, Jernej Skrabec,
	Maxime Ripard, Michael Turquette, Rob Herring, Samuel Holland,
	Stephen Boyd

Hi Julian,

On 2023-05-28 at 09:19:36 +1000, Julian Calaby <julian.calaby@gmail.com> wrote:
> Hi Frank,
>
> On Sat, May 27, 2023 at 11:37 PM Frank Oltmanns <frank@oltmanns.dev> wrote:
>>
>> Add a new precalculation method for NKM clock rate selection in the
>> sunxi-ng clock driver. Introduce ccu_nkm_find_best_precalc which uses a
>> precalculated table of valid NKM combinations (struct clk_nkm_table and
>> struct clk_nkm_combo) to find the best rate. This approach provides
>> faster rate selection by searching a table of valid combinations rather
>> than calculating for all possible combinations.
>>
>> The table of NKM combinations needs to be initialized with meaningful
>> combinations only, i.e. removing redundant combinations that result in
>> the same rate.
>>
>> Keep the existing ccu_nkm_find_best function in place and use it as a
>> fallback if no precalculated table is provided.
>>
>> Signed-off-by: Frank Oltmanns <frank@oltmanns.dev>
>> ---
>>  drivers/clk/sunxi-ng/ccu_nkm.c | 84 +++++++++++++++++++++++++++-------
>>  drivers/clk/sunxi-ng/ccu_nkm.h | 26 +++++++++++
>>  2 files changed, 94 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/clk/sunxi-ng/ccu_nkm.c b/drivers/clk/sunxi-ng/ccu_nkm.c
>> index 94d2a83992b2..9652f6df17bd 100644
>> --- a/drivers/clk/sunxi-ng/ccu_nkm.c
>> +++ b/drivers/clk/sunxi-ng/ccu_nkm.c
>> @@ -54,6 +54,49 @@ static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
>>         return best_rate;
>>  }
>>
>> +static unsigned long ccu_nkm_find_best_precalc(unsigned long parent,
>> +                                              unsigned long rate,
>> +                                              struct _ccu_nkm *nkm,
>> +                                              struct clk_nkm_table *table)
>> +{
>> +       unsigned long best_rate = 0, best_diff = ULONG_MAX;
>> +       unsigned long best_n = 0, best_k = 0, best_m = 0;
>> +       int start = 0, end = table->num - 1, mid;
>> +
>> +       while (start <= end) {
>> +               unsigned long tmp_rate;
>> +               unsigned long tmp_diff;
>> +
>> +               mid = (start + end) / 2;
>> +
>> +               tmp_rate = parent * table->combos[mid].n * table->combos[mid].k /
>> +                          table->combos[mid].m;
>> +
>> +               tmp_diff = abs(rate - tmp_rate);
>> +
>> +               if (tmp_diff < best_diff) {
>> +                       best_rate = tmp_rate;
>> +                       best_diff = tmp_diff;
>> +                       best_n = table->combos[mid].n;
>> +                       best_k = table->combos[mid].k;
>> +                       best_m = table->combos[mid].m;
>> +                       if (best_diff == 0)
>> +                               goto out;
>> +               }
>

Thank you for your feedback!

In my proposal, the code performs a binary search by
 1. taking the element in the middle (mid)
 2. calculating the rate of the element (tmp_rate)
 3. calculating the difference to the requested rate (tmp_diff)
 4. if the diff is better than the best_diff making it the new best
    n-k-m-combo (the if block)

> If the table was sorted by n * k / m, this could just be a process of

Please note, the table already has to be sorted for the function to
work, as is the nature of a binary search. I should definitely add
comments. I'm sorry, the code was intended more as a basis to discuss
the general idea that I described in the cover letter. I should have
made that clearer.

> searching through until we either:
> - find that the first rate in the table is too high

I could see that I could add two steps in the beginning, before the loop:
 - Take the first element and see if its rate is greater than the
   requested rate, if so immediatly return it
 - Take the last element and see if its rate is less than the requested
   rate, if so immediatly return it

Is that what you mean? I'd have to run some simulations to see, if this
is a real improvement, because we would need two additional rate
calculations. Worst case would therefore be 2+log(n) calculations
instead of log(n) and the code would be slightly more complicated in my
opinion. But if we run this function with all possible parents rate (as
suggested in the end of my cover letter) these two special cases could
very well be often applicable. Thanks!

> - find an exact rate

What do you mean by "exact rate"? Do you mean a rate that matches the
requested rate exactly. This is what the code is already trying to do.
But, as this is not always possible, in cases where it does not find an
exact match, it takes the closest match instead.

> - go above the requested rate, then there's only two to compare: our
> current rate and the previous one

Sorry, you've lost me here. How would I go above the requested rate? You
would have to do the binary search to find that rate, but then why not
search the closest rate directly (as the code does) instead of searching
the closest rate above the requested (as you proposed). I feel like
either one of us is missing something. :)

> This should massively simplify this function and would still work with
> a binary search.

Sidenote, we could store best_index instead of best_n, best_k, best_m,
but for the first iteration I tried to keep it as close as possible to
the original ccu_nkm_find_best() function.

>
>> +               if (rate < tmp_rate)
>> +                       end = mid - 1;
>> +               else
>> +                       start = mid + 1;
>> +       }
>> +
>> +out:
>> +       nkm->n = best_n;
>> +       nkm->k = best_k;
>> +       nkm->m = best_m;
>> +
>> +       return best_rate;
>> +}
>> +
>>  static void ccu_nkm_disable(struct clk_hw *hw)
>>  {
>>         struct ccu_nkm *nkm = hw_to_ccu_nkm(hw);
>> diff --git a/drivers/clk/sunxi-ng/ccu_nkm.h b/drivers/clk/sunxi-ng/ccu_nkm.h
>> index 6601defb3f38..fa5551724921 100644
>> --- a/drivers/clk/sunxi-ng/ccu_nkm.h
>> +++ b/drivers/clk/sunxi-ng/ccu_nkm.h
>> @@ -12,6 +12,30 @@
>>  #include "ccu_div.h"
>>  #include "ccu_mult.h"
>>
>> +struct clk_nkm_combo {
>> +       u8      n;
>> +       u8      k;
>> +       u8      m;
>> +};
>> +
>> +/**
>> + * struct clk_nkm_table - Table of all meaningful combinations for n, k, and m
>> + *
>> + * @num: Number of entries in the table
>> + * @combos: Array of combos (of size num) that are supported by this clock.
>> + *
>> + * This table shall contain all meaningful combinations of n, k, and m. That
>> + * means that combinations that result in the same clock rate shall only be
>> + * listed once. For example, if both
>> + * { .n = 1, .k = 2, .m = 2} and  { .n = 2, .k = 2, .m = 4}
>> + * are valid values for n, k, and m, only one of them would be allowed because
>> + * both result in a factor of 1.0.
>> + */
>> +struct clk_nkm_table {
>> +       size_t                  num;
>> +       struct clk_nkm_combo    *combos;
>
> Should this be a "flex" array, i.e.
>
> struct clk_nkm_combo combos[]

Thanks, noted. I'll look into that once we have an agreement on the
general concept. I think it depends on the fact if we want to use values
that have been calculated off-line (i.e. prior to compilation) or if we
want to create the table at run-time (i.e. when needed) using kmalloc.
See my alternate proposals in the cover letter for details. I'll need to
check if the run-time approach works with "flex" arrays.

Thanks,
  Frank

>
>> +};
>> +
>>  /*
>>   * struct ccu_nkm - Definition of an N-K-M clock
>>   *
>
> Thanks,

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/3] clk: sunxi-ng: Implement precalculated NKM rate selection
  2023-05-27 13:27 ` [RFC PATCH 2/3] clk: sunxi-ng: Implement precalculated NKM rate selection Frank Oltmanns
  2023-05-27 23:19   ` Julian Calaby
@ 2023-05-28 14:11   ` Frank Oltmanns
  1 sibling, 0 replies; 19+ messages in thread
From: Frank Oltmanns @ 2023-05-28 14:11 UTC (permalink / raw)
  To: Frank Oltmanns
  Cc: linux-arm-kernel, linux-clk, linux-kernel, linux-sunxi,
	Andre Przywara, Chen-Yu Tsai, Icenowy Zheng, Jernej Skrabec,
	Maxime Ripard, Michael Turquette, Rob Herring, Samuel Holland,
	Stephen Boyd


On 2023-05-27 at 15:27:46 +0200, Frank Oltmanns <frank@oltmanns.dev> wrote:
[...]
> diff --git a/drivers/clk/sunxi-ng/ccu_nkm.c b/drivers/clk/sunxi-ng/ccu_nkm.c
> index 94d2a83992b2..9652f6df17bd 100644
> --- a/drivers/clk/sunxi-ng/ccu_nkm.c
> +++ b/drivers/clk/sunxi-ng/ccu_nkm.c
[...]
> @@ -157,14 +205,18 @@ static int ccu_nkm_set_rate(struct clk_hw *hw, unsigned long rate,
>  	if (nkm->common.features & CCU_FEATURE_FIXED_POSTDIV)
>  		rate *= nkm->fixed_post_div;
>
> -	_nkm.min_n = nkm->n.min ?: 1;
> -	_nkm.max_n = nkm->n.max ?: 1 << nkm->n.width;
> -	_nkm.min_k = nkm->k.min ?: 1;
> -	_nkm.max_k = nkm->k.max ?: 1 << nkm->k.width;
> -	_nkm.min_m = 1;
> -	_nkm.max_m = nkm->m.max ?: 1 << nkm->m.width;
> -
> -	ccu_nkm_find_best(parent_rate, rate, &_nkm);
> +	if (nkm->table.num)
> +		rate = ccu_nkm_find_best_precalc(*parent_rate, rate, &_nkm,

Ugh! s/*parent_rate/parent_rate/
Sorry! Thanks to intel kernel test robot for pointing it out:
https://lore.kernel.org/oe-kbuild-all/202305280405.bUAMrEtn-lkp@intel.com/

Cheers,
  Frank

> +						 &nkm->table);
> +	else {
> +		_nkm.min_n = nkm->n.min ?: 1;
> +		_nkm.max_n = nkm->n.max ?: 1 << nkm->n.width;
> +		_nkm.min_k = nkm->k.min ?: 1;
> +		_nkm.max_k = nkm->k.max ?: 1 << nkm->k.width;
> +		_nkm.min_m = 1;
> +		_nkm.max_m = nkm->m.max ?: 1 << nkm->m.width;
> +		ccu_nkm_find_best(parent_rate, rate, &_nkm);
> +	}
>
>  	spin_lock_irqsave(nkm->common.lock, flags);
>
> diff --git a/drivers/clk/sunxi-ng/ccu_nkm.h b/drivers/clk/sunxi-ng/ccu_nkm.h
> index 6601defb3f38..fa5551724921 100644
> --- a/drivers/clk/sunxi-ng/ccu_nkm.h
> +++ b/drivers/clk/sunxi-ng/ccu_nkm.h
> @@ -12,6 +12,30 @@
>  #include "ccu_div.h"
>  #include "ccu_mult.h"
>
> +struct clk_nkm_combo {
> +	u8	n;
> +	u8	k;
> +	u8	m;
> +};
> +
> +/**
> + * struct clk_nkm_table - Table of all meaningful combinations for n, k, and m
> + *
> + * @num: Number of entries in the table
> + * @combos: Array of combos (of size num) that are supported by this clock.
> + *
> + * This table shall contain all meaningful combinations of n, k, and m. That
> + * means that combinations that result in the same clock rate shall only be
> + * listed once. For example, if both
> + * { .n = 1, .k = 2, .m = 2} and  { .n = 2, .k = 2, .m = 4}
> + * are valid values for n, k, and m, only one of them would be allowed because
> + * both result in a factor of 1.0.
> + */
> +struct clk_nkm_table {
> +	size_t			num;
> +	struct clk_nkm_combo	*combos;
> +};
> +
>  /*
>   * struct ccu_nkm - Definition of an N-K-M clock
>   *
> @@ -29,6 +53,8 @@ struct ccu_nkm {
>  	unsigned int		fixed_post_div;
>
>  	struct ccu_common	common;
> +
> +	struct clk_nkm_table	table;
>  };
>
>  #define SUNXI_CCU_NKM_WITH_MUX_GATE_LOCK(_struct, _name, _parents, _reg, \

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/3] clk: sunxi-ng: Implement precalculated NKM rate selection
  2023-05-28  9:12     ` Frank Oltmanns
@ 2023-05-28 15:32       ` Julian Calaby
  2023-05-28 17:12         ` Frank Oltmanns
  0 siblings, 1 reply; 19+ messages in thread
From: Julian Calaby @ 2023-05-28 15:32 UTC (permalink / raw)
  To: Frank Oltmanns
  Cc: linux-arm-kernel, linux-clk, linux-kernel, linux-sunxi,
	Andre Przywara, Chen-Yu Tsai, Icenowy Zheng, Jernej Skrabec,
	Maxime Ripard, Michael Turquette, Rob Herring, Samuel Holland,
	Stephen Boyd

Hi Frank,

On Sun, May 28, 2023 at 8:10 PM Frank Oltmanns <frank@oltmanns.dev> wrote:
>
> Hi Julian,
>
> On 2023-05-28 at 09:19:36 +1000, Julian Calaby <julian.calaby@gmail.com> wrote:
> > Hi Frank,
> >
> > On Sat, May 27, 2023 at 11:37 PM Frank Oltmanns <frank@oltmanns.dev> wrote:
> >>
> >> Add a new precalculation method for NKM clock rate selection in the
> >> sunxi-ng clock driver. Introduce ccu_nkm_find_best_precalc which uses a
> >> precalculated table of valid NKM combinations (struct clk_nkm_table and
> >> struct clk_nkm_combo) to find the best rate. This approach provides
> >> faster rate selection by searching a table of valid combinations rather
> >> than calculating for all possible combinations.
> >>
> >> The table of NKM combinations needs to be initialized with meaningful
> >> combinations only, i.e. removing redundant combinations that result in
> >> the same rate.
> >>
> >> Keep the existing ccu_nkm_find_best function in place and use it as a
> >> fallback if no precalculated table is provided.
> >>
> >> Signed-off-by: Frank Oltmanns <frank@oltmanns.dev>
> >> ---
> >>  drivers/clk/sunxi-ng/ccu_nkm.c | 84 +++++++++++++++++++++++++++-------
> >>  drivers/clk/sunxi-ng/ccu_nkm.h | 26 +++++++++++
> >>  2 files changed, 94 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/drivers/clk/sunxi-ng/ccu_nkm.c b/drivers/clk/sunxi-ng/ccu_nkm.c
> >> index 94d2a83992b2..9652f6df17bd 100644
> >> --- a/drivers/clk/sunxi-ng/ccu_nkm.c
> >> +++ b/drivers/clk/sunxi-ng/ccu_nkm.c
> >> @@ -54,6 +54,49 @@ static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
> >>         return best_rate;
> >>  }
> >>
> >> +static unsigned long ccu_nkm_find_best_precalc(unsigned long parent,
> >> +                                              unsigned long rate,
> >> +                                              struct _ccu_nkm *nkm,
> >> +                                              struct clk_nkm_table *table)
> >> +{
> >> +       unsigned long best_rate = 0, best_diff = ULONG_MAX;
> >> +       unsigned long best_n = 0, best_k = 0, best_m = 0;
> >> +       int start = 0, end = table->num - 1, mid;
> >> +
> >> +       while (start <= end) {
> >> +               unsigned long tmp_rate;
> >> +               unsigned long tmp_diff;
> >> +
> >> +               mid = (start + end) / 2;
> >> +
> >> +               tmp_rate = parent * table->combos[mid].n * table->combos[mid].k /
> >> +                          table->combos[mid].m;
> >> +
> >> +               tmp_diff = abs(rate - tmp_rate);
> >> +
> >> +               if (tmp_diff < best_diff) {
> >> +                       best_rate = tmp_rate;
> >> +                       best_diff = tmp_diff;
> >> +                       best_n = table->combos[mid].n;
> >> +                       best_k = table->combos[mid].k;
> >> +                       best_m = table->combos[mid].m;
> >> +                       if (best_diff == 0)
> >> +                               goto out;
> >> +               }
> >
>
> Thank you for your feedback!
>
> In my proposal, the code performs a binary search by
>  1. taking the element in the middle (mid)
>  2. calculating the rate of the element (tmp_rate)
>  3. calculating the difference to the requested rate (tmp_diff)
>  4. if the diff is better than the best_diff making it the new best
>     n-k-m-combo (the if block)

I'm so sorry, I thought that this was still doing a linear search as
it's so close to the original code.

>
> > If the table was sorted by n * k / m, this could just be a process of
>
> Please note, the table already has to be sorted for the function to
> work, as is the nature of a binary search. I should definitely add
> comments. I'm sorry, the code was intended more as a basis to discuss
> the general idea that I described in the cover letter. I should have
> made that clearer.
>
> > searching through until we either:
> > - find that the first rate in the table is too high
>
> I could see that I could add two steps in the beginning, before the loop:
>  - Take the first element and see if its rate is greater than the
>    requested rate, if so immediatly return it
>  - Take the last element and see if its rate is less than the requested
>    rate, if so immediatly return it
>
> Is that what you mean? I'd have to run some simulations to see, if this
> is a real improvement, because we would need two additional rate
> calculations. Worst case would therefore be 2+log(n) calculations
> instead of log(n) and the code would be slightly more complicated in my
> opinion. But if we run this function with all possible parents rate (as
> suggested in the end of my cover letter) these two special cases could
> very well be often applicable. Thanks!
>
> > - find an exact rate
>
> What do you mean by "exact rate"? Do you mean a rate that matches the
> requested rate exactly. This is what the code is already trying to do.
> But, as this is not always possible, in cases where it does not find an
> exact match, it takes the closest match instead.
>
> > - go above the requested rate, then there's only two to compare: our
> > current rate and the previous one
>
> Sorry, you've lost me here. How would I go above the requested rate? You
> would have to do the binary search to find that rate, but then why not
> search the closest rate directly (as the code does) instead of searching
> the closest rate above the requested (as you proposed). I feel like
> either one of us is missing something. :)

What we're missing is that I'm not explaining this well.

Let's take a very simple table: (value = parent * n * k / m)

0. 100
1. 200
2. 300
3. 400

If we search for 50, our closest is the first rate, so index 0: this
is the "find that the first rate in the table is too high" case.

If we search for 300, we'll converge on index 2: this is the "exact
rate" situation.

If we search for 275, then we'll converge on either 200 or 300: this
is the "two to compare" situation: if we converge until we get to the
lowest rate above our target, we only need to check the rate
immediately before it in the table and the one we converged on to find
the closest.

So in pseudo-code, we'd end up with something like this:

--------

start = 0;

cur_rate = parent * table[start].n * table[start].k / table[start].m;

if (cur_rate >= target)
    return table[start];

while (start <= end) {
    mid = (start + end) / 2;

    cur_rate = parent * table[mid].n * table[mid].k / table[mid].m;

    if (cur_rate == target)
        return table[mid];

   if (target < cur_rate)
       end = mid - 1;
   else
       start = mid + 1;
}

prev_rate = parent * table[mid - 1].n * table[mid - 1].k / table[mid - 1].m;

if (abs(target - prev_rate) < abs(target - cur_rate))
    return table[mid - 1];

return table[mid];

--------

Which seems simpler to my eye and moves all the difference
calculations out of the loop so they only have to be done once,
effectively trading a difference calculation on each checked rate for
a rate calculation, and dropping some variables in the process.

Thanks,

-- 
Julian Calaby

Email: julian.calaby@gmail.com
Profile: http://www.google.com/profiles/julian.calaby/

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 2/3] clk: sunxi-ng: Implement precalculated NKM rate selection
  2023-05-28 15:32       ` Julian Calaby
@ 2023-05-28 17:12         ` Frank Oltmanns
  0 siblings, 0 replies; 19+ messages in thread
From: Frank Oltmanns @ 2023-05-28 17:12 UTC (permalink / raw)
  To: Julian Calaby
  Cc: linux-arm-kernel, linux-clk, linux-kernel, linux-sunxi,
	Andre Przywara, Chen-Yu Tsai, Icenowy Zheng, Jernej Skrabec,
	Maxime Ripard, Michael Turquette, Rob Herring, Samuel Holland,
	Stephen Boyd

Hi Julian,

On 2023-05-29 at 01:32:02 +1000, Julian Calaby <julian.calaby@gmail.com> wrote:
> Hi Frank,
>
> On Sun, May 28, 2023 at 8:10 PM Frank Oltmanns <frank@oltmanns.dev> wrote:
>>
>> Hi Julian,
>>
>> On 2023-05-28 at 09:19:36 +1000, Julian Calaby <julian.calaby@gmail.com> wrote:
>> > Hi Frank,
>> >
>> > On Sat, May 27, 2023 at 11:37 PM Frank Oltmanns <frank@oltmanns.dev> wrote:
>> >>
>> >> Add a new precalculation method for NKM clock rate selection in the
>> >> sunxi-ng clock driver. Introduce ccu_nkm_find_best_precalc which uses a
>> >> precalculated table of valid NKM combinations (struct clk_nkm_table and
>> >> struct clk_nkm_combo) to find the best rate. This approach provides
>> >> faster rate selection by searching a table of valid combinations rather
>> >> than calculating for all possible combinations.
>> >>
>> >> The table of NKM combinations needs to be initialized with meaningful
>> >> combinations only, i.e. removing redundant combinations that result in
>> >> the same rate.
>> >>
>> >> Keep the existing ccu_nkm_find_best function in place and use it as a
>> >> fallback if no precalculated table is provided.
>> >>
>> >> Signed-off-by: Frank Oltmanns <frank@oltmanns.dev>
>> >> ---
>> >>  drivers/clk/sunxi-ng/ccu_nkm.c | 84 +++++++++++++++++++++++++++-------
>> >>  drivers/clk/sunxi-ng/ccu_nkm.h | 26 +++++++++++
>> >>  2 files changed, 94 insertions(+), 16 deletions(-)
>> >>
>> >> diff --git a/drivers/clk/sunxi-ng/ccu_nkm.c b/drivers/clk/sunxi-ng/ccu_nkm.c
>> >> index 94d2a83992b2..9652f6df17bd 100644
>> >> --- a/drivers/clk/sunxi-ng/ccu_nkm.c
>> >> +++ b/drivers/clk/sunxi-ng/ccu_nkm.c
>> >> @@ -54,6 +54,49 @@ static unsigned long ccu_nkm_find_best(unsigned long parent, unsigned long rate,
>> >>         return best_rate;
>> >>  }
>> >>
>> >> +static unsigned long ccu_nkm_find_best_precalc(unsigned long parent,
>> >> +                                              unsigned long rate,
>> >> +                                              struct _ccu_nkm *nkm,
>> >> +                                              struct clk_nkm_table *table)
>> >> +{
>> >> +       unsigned long best_rate = 0, best_diff = ULONG_MAX;
>> >> +       unsigned long best_n = 0, best_k = 0, best_m = 0;
>> >> +       int start = 0, end = table->num - 1, mid;
>> >> +
>> >> +       while (start <= end) {
>> >> +               unsigned long tmp_rate;
>> >> +               unsigned long tmp_diff;
>> >> +
>> >> +               mid = (start + end) / 2;
>> >> +
>> >> +               tmp_rate = parent * table->combos[mid].n * table->combos[mid].k /
>> >> +                          table->combos[mid].m;
>> >> +
>> >> +               tmp_diff = abs(rate - tmp_rate);
>> >> +
>> >> +               if (tmp_diff < best_diff) {
>> >> +                       best_rate = tmp_rate;
>> >> +                       best_diff = tmp_diff;
>> >> +                       best_n = table->combos[mid].n;
>> >> +                       best_k = table->combos[mid].k;
>> >> +                       best_m = table->combos[mid].m;
>> >> +                       if (best_diff == 0)
>> >> +                               goto out;
>> >> +               }
>> >
>>
>> Thank you for your feedback!
>>
>> In my proposal, the code performs a binary search by
>>  1. taking the element in the middle (mid)
>>  2. calculating the rate of the element (tmp_rate)
>>  3. calculating the difference to the requested rate (tmp_diff)
>>  4. if the diff is better than the best_diff making it the new best
>>     n-k-m-combo (the if block)
>
> I'm so sorry, I thought that this was still doing a linear search as
> it's so close to the original code.
>
>>
>> > If the table was sorted by n * k / m, this could just be a process of
>>
>> Please note, the table already has to be sorted for the function to
>> work, as is the nature of a binary search. I should definitely add
>> comments. I'm sorry, the code was intended more as a basis to discuss
>> the general idea that I described in the cover letter. I should have
>> made that clearer.
>>
>> > searching through until we either:
>> > - find that the first rate in the table is too high
>>
>> I could see that I could add two steps in the beginning, before the loop:
>>  - Take the first element and see if its rate is greater than the
>>    requested rate, if so immediatly return it
>>  - Take the last element and see if its rate is less than the requested
>>    rate, if so immediatly return it
>>
>> Is that what you mean? I'd have to run some simulations to see, if this
>> is a real improvement, because we would need two additional rate
>> calculations. Worst case would therefore be 2+log(n) calculations
>> instead of log(n) and the code would be slightly more complicated in my
>> opinion. But if we run this function with all possible parents rate (as
>> suggested in the end of my cover letter) these two special cases could
>> very well be often applicable. Thanks!
>>
>> > - find an exact rate
>>
>> What do you mean by "exact rate"? Do you mean a rate that matches the
>> requested rate exactly. This is what the code is already trying to do.
>> But, as this is not always possible, in cases where it does not find an
>> exact match, it takes the closest match instead.
>>
>> > - go above the requested rate, then there's only two to compare: our
>> > current rate and the previous one
>>
>> Sorry, you've lost me here. How would I go above the requested rate? You
>> would have to do the binary search to find that rate, but then why not
>> search the closest rate directly (as the code does) instead of searching
>> the closest rate above the requested (as you proposed). I feel like
>> either one of us is missing something. :)
>
> What we're missing is that I'm not explaining this well.
>
> Let's take a very simple table: (value = parent * n * k / m)
>
> 0. 100
> 1. 200
> 2. 300
> 3. 400
>
> If we search for 50, our closest is the first rate, so index 0: this
> is the "find that the first rate in the table is too high" case.
>
> If we search for 300, we'll converge on index 2: this is the "exact
> rate" situation.
>
> If we search for 275, then we'll converge on either 200 or 300: this
> is the "two to compare" situation: if we converge until we get to the
> lowest rate above our target, we only need to check the rate
> immediately before it in the table and the one we converged on to find
> the closest.
>
> So in pseudo-code, we'd end up with something like this:
>
> --------
>
> start = 0;
>
> cur_rate = parent * table[start].n * table[start].k / table[start].m;
>
> if (cur_rate >= target)
>     return table[start];
>
> while (start <= end) {
>     mid = (start + end) / 2;

Thanks for the thorough explanation!

This needs to be (start + end + 1) / 2

Otherwise, if we extend your hypothetical list above with another item,
let's say 500 and look for 199, this would result in the loop finishing
with mid = 0, if I'm not mistaken, and hence an access to table[-1] when
calculating prev_rate below. Not good.

But I *think*, with (start + end + 1) / 2 it works in all cases.

>
>     cur_rate = parent * table[mid].n * table[mid].k / table[mid].m;
>
>     if (cur_rate == target)
>         return table[mid];
>
>    if (target < cur_rate)
>        end = mid - 1;
>    else
>        start = mid + 1;
> }
>
> prev_rate = parent * table[mid - 1].n * table[mid - 1].k / table[mid - 1].m;
>
> if (abs(target - prev_rate) < abs(target - cur_rate))
>     return table[mid - 1];
>
> return table[mid];
>
> --------
>
> Which seems simpler to my eye and moves all the difference
> calculations out of the loop so they only have to be done once,
> effectively trading a difference calculation on each checked rate for
> a rate calculation, and dropping some variables in the process.

At least it's shorter. I'm not sure it's simpler (after all it contained
a mistake, I think ;-)). Still, it looks neat, so I might still use your
(revised) algorithm.

Thanks,
  Frank
>
> Thanks,

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks
  2023-05-27 13:27 [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks Frank Oltmanns
                   ` (2 preceding siblings ...)
  2023-05-27 13:27 ` [RFC PATCH 3/3] clk: sunxi-ng: sun50i-a64: Precalculate NKM combinations for pll-mipi Frank Oltmanns
@ 2023-05-31 13:48 ` Maxime Ripard
  2023-06-01  5:16   ` Frank Oltmanns
  3 siblings, 1 reply; 19+ messages in thread
From: Maxime Ripard @ 2023-05-31 13:48 UTC (permalink / raw)
  To: Frank Oltmanns
  Cc: linux-arm-kernel, linux-clk, linux-kernel, linux-sunxi,
	Andre Przywara, Chen-Yu Tsai, Icenowy Zheng, Jernej Skrabec,
	Michael Turquette, Rob Herring, Samuel Holland, Stephen Boyd

[-- Attachment #1: Type: text/plain, Size: 1596 bytes --]

Hi Frank,

On Sat, May 27, 2023 at 03:27:44PM +0200, Frank Oltmanns wrote:
> I would like to bring your attention to the current process of setting
> the rate of an NKM clock. As it stands, when setting the rate of an
> NKM clock, the rate nearest but less than or equal to the requested
> rate is found, instead of the nearest rate.

Yeah, it's actually pretty common, see clk_mux_determine_rate_flags()
for example. Some devices require that we don't overshoot, while some
prefer to have the closest rate.

Both are fine, and it's a bit context specific which one we should
favour. If we were to do anything, it would be to support both and let
the clock driver select which behaviour it wants.

> Moreover, ccu_nkm_find_best() is called multiple times (footnote [1])
> when setting a rate, each time iterating over all combinations of n,
> k, and m.

Yeah, that's expected as well.

> In response to this, I propose the following refinements to optimize the NKM
> clock setting:
>  a. when finding the best rate use the nearest rate, even if it is greater than
>     the requested rate (PATCH 1)
>  b. utilize binary search to find the best rate by going through a
>     precalculated, ordered list of all meaningful combinations of n, k, and m
>     (PATCH 2)

One thing you haven't really addressed is why we would be doing this? Is
there some clocks that require a more precise clock and don't? Is the
factor calculation a bottleneck for some workloads?

Clocks in general are very regression-prone, so I'd rather be a bit
conservative there, and "if it ain't broke, don't fix it".

Maxime

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks
  2023-05-31 13:48 ` [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks Maxime Ripard
@ 2023-06-01  5:16   ` Frank Oltmanns
  2023-06-01 19:41     ` Jernej Škrabec
  2023-06-02  7:31     ` Maxime Ripard
  0 siblings, 2 replies; 19+ messages in thread
From: Frank Oltmanns @ 2023-06-01  5:16 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: linux-arm-kernel, linux-clk, linux-kernel, linux-sunxi,
	Andre Przywara, Chen-Yu Tsai, Icenowy Zheng, Jernej Skrabec,
	Michael Turquette, Rob Herring, Samuel Holland, Stephen Boyd

Hi Maxime,

On 2023-05-31 at 15:48:43 +0200, Maxime Ripard <mripard@kernel.org> wrote:
> [[PGP Signed Part:Undecided]]
> Hi Frank,
>
> On Sat, May 27, 2023 at 03:27:44PM +0200, Frank Oltmanns wrote:
>> I would like to bring your attention to the current process of setting
>> the rate of an NKM clock. As it stands, when setting the rate of an
>> NKM clock, the rate nearest but less than or equal to the requested
>> rate is found, instead of the nearest rate.
>
> Yeah, it's actually pretty common, see clk_mux_determine_rate_flags()
> for example. Some devices require that we don't overshoot, while some
> prefer to have the closest rate.
>
> Both are fine, and it's a bit context specific which one we should
> favour. If we were to do anything, it would be to support both and let
> the clock driver select which behaviour it wants.
>

Ok, understood. Thank you for the explanation! Now, I'm wondering if
anyone would be using such a flag, if I added it.

>
>> Moreover, ccu_nkm_find_best() is called multiple times (footnote [1])
>> when setting a rate, each time iterating over all combinations of n,
>> k, and m.
>
> Yeah, that's expected as well.

I'm wondering though, if iterating over all combinations is set in
stone, or if some kind of optimization would be in order.

>
>> In response to this, I propose the following refinements to optimize the NKM
>> clock setting:
>>  a. when finding the best rate use the nearest rate, even if it is greater than
>>     the requested rate (PATCH 1)
>>  b. utilize binary search to find the best rate by going through a
>>     precalculated, ordered list of all meaningful combinations of n, k, and m
>>     (PATCH 2)
>
> One thing you haven't really addressed is why we would be doing this? Is
> there some clocks that require a more precise clock and don't? Is the
> factor calculation a bottleneck for some workloads?

Background
==========
I'm a pinephone user (ccu-sun50i-a64). I'm using U-Boot which sets the
pll-video0 to 294 MHz on boot. The phone's panel requires DCLK to run at
108 MHz to get a nice 60 Hz vertical refresh rate. The clock structure
is this:

    clock                       clock type
    --------------------------------------
    pll-video0                  ccu_nm
       pll-mipi                 ccu_nkm
          tcon0                 ccu_mux
             tcon-data-clock    sun4i_dclk

The divider between tcon0 and tcon-data-clock is fixed at 4. So, I need
pll-mipi to run at 432 MHz to get the desired vertical refresh rate.
When pll-vdeo0 is at 294 MHz this is that rate cannot be matched exactly
with any combination. The best we can get is 431.2 MHz (n=11, k=2,
m=15).

The pinephone has some "vendor" patches (megi kernel) that
 a. add HDMI
 b. allow re-setting pll-mipi's rate when pll-video0 changes

Re: Who needs a more precise clock?
===================================
When plugging in HDMI, pll-video's rate is set to 297 MHz, which - in
the vendor kernel, not mainline - triggers recalculation of pll-mipi
(trying to set it to 431.2 MHz). It ends up with a rate of 424.285714
MHz, because this is the nearest, but less than 431.2 MHz (n=5, k=2,
m=7). The nearest rate would be 432 MHz.

So, while analyzing the whole situation that I described above, I found
out that the NKM clocks are not set to the closest rate and wondered why
that is. Hence my request for comments.

Now, one could argue that pll-video0 should be set to 297MHz at boot or
that pll-mipi should try to set the *requested* rate instead of the
previous rate when the pll-video0 changes. And I think that both are
valid or even better approaches than my proposal in this RFC to address
this specific problem and I'll probably sent patches to discuss this as
well.


Re: Why speed up factor calculation?
====================================
I'm not aware that the current implementation of calculating n, k, and m
poses a bottleneck in any situation. Again, while going through the
code, I wondered why not save a few CPU cycles by precalculating the
meaningful combinations. In my opinion, it does not have any side
effects, so we might as well do it. (There is of course the side effect
of using a higher rate, but this is unrelated to precalculation as I
could as well employ a rate comparison that only allows lower rates, or
only optionally higher rates.)

> Clocks in general are very regression-prone, so I'd rather be a bit
> conservative there, and "if it ain't broke, don't fix it".

Sure, I get that.

As I stated in my cover letter:
"The motivation for these proposed changes lies in the current behavior
of rate selection for NKM clocks, which doesn't observe the
CLK_SET_RATE_PARENT flag. I.e. it does not select a different rate for
the parent clock to find the optimal rate."

I thought that this required this optimization to be implemented, but by
now, I'm no longer sure. I'll probably continue investigating different
paths for CLK_SET_RATE_PARENT for NKM clocks and follow up with new
findings.

Thanks,
  Frank

>
> Maxime
>
> [[End of PGP Signed Part]]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks
  2023-06-01  5:16   ` Frank Oltmanns
@ 2023-06-01 19:41     ` Jernej Škrabec
  2023-06-02  7:34       ` Maxime Ripard
  2023-06-02  7:31     ` Maxime Ripard
  1 sibling, 1 reply; 19+ messages in thread
From: Jernej Škrabec @ 2023-06-01 19:41 UTC (permalink / raw)
  To: Maxime Ripard, Frank Oltmanns
  Cc: linux-arm-kernel, linux-clk, linux-kernel, linux-sunxi,
	Andre Przywara, Chen-Yu Tsai, Icenowy Zheng, Michael Turquette,
	Rob Herring, Samuel Holland, Stephen Boyd

Dne četrtek, 01. junij 2023 ob 07:16:45 CEST je Frank Oltmanns napisal(a):
> Hi Maxime,
> 
> On 2023-05-31 at 15:48:43 +0200, Maxime Ripard <mripard@kernel.org> wrote:
> > [[PGP Signed Part:Undecided]]
> > Hi Frank,
> >
> > On Sat, May 27, 2023 at 03:27:44PM +0200, Frank Oltmanns wrote:
> >> I would like to bring your attention to the current process of setting
> >> the rate of an NKM clock. As it stands, when setting the rate of an
> >> NKM clock, the rate nearest but less than or equal to the requested
> >> rate is found, instead of the nearest rate.
> >
> > Yeah, it's actually pretty common, see clk_mux_determine_rate_flags()
> > for example. Some devices require that we don't overshoot, while some
> > prefer to have the closest rate.
> >
> > Both are fine, and it's a bit context specific which one we should
> > favour. If we were to do anything, it would be to support both and let
> > the clock driver select which behaviour it wants.
> >
> 
> Ok, understood. Thank you for the explanation! Now, I'm wondering if
> anyone would be using such a flag, if I added it.
> 
> >
> >> Moreover, ccu_nkm_find_best() is called multiple times (footnote [1])
> >> when setting a rate, each time iterating over all combinations of n,
> >> k, and m.
> >
> > Yeah, that's expected as well.
> 
> I'm wondering though, if iterating over all combinations is set in
> stone, or if some kind of optimization would be in order.
> 
> >
> >> In response to this, I propose the following refinements to optimize the NKM
> >> clock setting:
> >>  a. when finding the best rate use the nearest rate, even if it is greater than
> >>     the requested rate (PATCH 1)
> >>  b. utilize binary search to find the best rate by going through a
> >>     precalculated, ordered list of all meaningful combinations of n, k, and m
> >>     (PATCH 2)
> >
> > One thing you haven't really addressed is why we would be doing this? Is
> > there some clocks that require a more precise clock and don't? Is the
> > factor calculation a bottleneck for some workloads?
> 
> Background
> ==========
> I'm a pinephone user (ccu-sun50i-a64). I'm using U-Boot which sets the
> pll-video0 to 294 MHz on boot. The phone's panel requires DCLK to run at
> 108 MHz to get a nice 60 Hz vertical refresh rate. The clock structure
> is this:
> 
>     clock                       clock type
>     --------------------------------------
>     pll-video0                  ccu_nm
>        pll-mipi                 ccu_nkm
>           tcon0                 ccu_mux
>              tcon-data-clock    sun4i_dclk
> 
> The divider between tcon0 and tcon-data-clock is fixed at 4. So, I need
> pll-mipi to run at 432 MHz to get the desired vertical refresh rate.
> When pll-vdeo0 is at 294 MHz this is that rate cannot be matched exactly
> with any combination. The best we can get is 431.2 MHz (n=11, k=2,
> m=15).
> 
> The pinephone has some "vendor" patches (megi kernel) that
>  a. add HDMI
>  b. allow re-setting pll-mipi's rate when pll-video0 changes
> 
> Re: Who needs a more precise clock?
> ===================================
> When plugging in HDMI, pll-video's rate is set to 297 MHz, which - in
> the vendor kernel, not mainline - triggers recalculation of pll-mipi
> (trying to set it to 431.2 MHz). It ends up with a rate of 424.285714
> MHz, because this is the nearest, but less than 431.2 MHz (n=5, k=2,
> m=7). The nearest rate would be 432 MHz.
> 
> So, while analyzing the whole situation that I described above, I found
> out that the NKM clocks are not set to the closest rate and wondered why
> that is. Hence my request for comments.
> 
> Now, one could argue that pll-video0 should be set to 297MHz at boot or
> that pll-mipi should try to set the *requested* rate instead of the
> previous rate when the pll-video0 changes. And I think that both are
> valid or even better approaches than my proposal in this RFC to address
> this specific problem and I'll probably sent patches to discuss this as
> well.

Ideally, clock rate setting code should be immune on "initial" values, set by
bootloader or default values. If it's not, then it should be improved in the
way that it is.

> 
> 
> Re: Why speed up factor calculation?
> ====================================
> I'm not aware that the current implementation of calculating n, k, and m
> poses a bottleneck in any situation. Again, while going through the
> code, I wondered why not save a few CPU cycles by precalculating the
> meaningful combinations. In my opinion, it does not have any side
> effects, so we might as well do it. (There is of course the side effect
> of using a higher rate, but this is unrelated to precalculation as I
> could as well employ a rate comparison that only allows lower rates, or
> only optionally higher rates.)
> 
> > Clocks in general are very regression-prone, so I'd rather be a bit
> > conservative there, and "if it ain't broke, don't fix it".
> 
> Sure, I get that.
> 
> As I stated in my cover letter:
> "The motivation for these proposed changes lies in the current behavior
> of rate selection for NKM clocks, which doesn't observe the
> CLK_SET_RATE_PARENT flag. I.e. it does not select a different rate for
> the parent clock to find the optimal rate."
> 
> I thought that this required this optimization to be implemented, but by
> now, I'm no longer sure. I'll probably continue investigating different
> paths for CLK_SET_RATE_PARENT for NKM clocks and follow up with new
> findings.

Let's leave out any optimizations that are not apparently needed. Most clock
rates are set only once at boot and others, like video clocks, not that often,
so a suboptimal code speed doesn't hurt currently.

Best regards,
Jernej

> 
> Thanks,
>   Frank
> 
> >
> > Maxime
> >
> > [[End of PGP Signed Part]]
> 





^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks
  2023-06-01  5:16   ` Frank Oltmanns
  2023-06-01 19:41     ` Jernej Škrabec
@ 2023-06-02  7:31     ` Maxime Ripard
  2023-06-05 10:31       ` Frank Oltmanns
  1 sibling, 1 reply; 19+ messages in thread
From: Maxime Ripard @ 2023-06-02  7:31 UTC (permalink / raw)
  To: Frank Oltmanns
  Cc: linux-arm-kernel, linux-clk, linux-kernel, linux-sunxi,
	Andre Przywara, Chen-Yu Tsai, Icenowy Zheng, Jernej Skrabec,
	Michael Turquette, Rob Herring, Samuel Holland, Stephen Boyd

[-- Attachment #1: Type: text/plain, Size: 5842 bytes --]

Hi,

On Thu, Jun 01, 2023 at 07:16:45AM +0200, Frank Oltmanns wrote:
> On 2023-05-31 at 15:48:43 +0200, Maxime Ripard <mripard@kernel.org> wrote:
> > [[PGP Signed Part:Undecided]]
> > Hi Frank,
> >
> > On Sat, May 27, 2023 at 03:27:44PM +0200, Frank Oltmanns wrote:
> >> I would like to bring your attention to the current process of setting
> >> the rate of an NKM clock. As it stands, when setting the rate of an
> >> NKM clock, the rate nearest but less than or equal to the requested
> >> rate is found, instead of the nearest rate.
> >
> > Yeah, it's actually pretty common, see clk_mux_determine_rate_flags()
> > for example. Some devices require that we don't overshoot, while some
> > prefer to have the closest rate.
> >
> > Both are fine, and it's a bit context specific which one we should
> > favour. If we were to do anything, it would be to support both and let
> > the clock driver select which behaviour it wants.
> >
> 
> Ok, understood. Thank you for the explanation! Now, I'm wondering if
> anyone would be using such a flag, if I added it.

I guess that's another thing :) If no-one is going to use it, why should
we do it in the first place?

But most likely the display and audio clocks are usually fairly ok with
overshooting a bit, and a closest rate is usually better.

> >> Moreover, ccu_nkm_find_best() is called multiple times (footnote [1])
> >> when setting a rate, each time iterating over all combinations of n,
> >> k, and m.
> >
> > Yeah, that's expected as well.
> 
> I'm wondering though, if iterating over all combinations is set in
> stone, or if some kind of optimization would be in order.

The thing with optimization is that you need to optimize for something.
So you need to show that this code is suboptimal (by whatever metric you
want to optimize for), and that your code is more optimal that it used
to be.

It means identifying a problem, writing benchmarks, and showing that the
new code performs better there.

For example, your code might very well be faster, but it will increase
the kernel image (and thus the RAM usage). One is not more optimal than
the other in absolute, they both are, using a different metric.

And then you have the more social factors that play a huge part too:
readibility, maintainability, etc.

> >> In response to this, I propose the following refinements to optimize the NKM
> >> clock setting:
> >>  a. when finding the best rate use the nearest rate, even if it is greater than
> >>     the requested rate (PATCH 1)
> >>  b. utilize binary search to find the best rate by going through a
> >>     precalculated, ordered list of all meaningful combinations of n, k, and m
> >>     (PATCH 2)
> >
> > One thing you haven't really addressed is why we would be doing this? Is
> > there some clocks that require a more precise clock and don't? Is the
> > factor calculation a bottleneck for some workloads?
> 
> Background
> ==========
> I'm a pinephone user (ccu-sun50i-a64). I'm using U-Boot which sets the
> pll-video0 to 294 MHz on boot. The phone's panel requires DCLK to run at
> 108 MHz to get a nice 60 Hz vertical refresh rate. The clock structure
> is this:
> 
>     clock                       clock type
>     --------------------------------------
>     pll-video0                  ccu_nm
>        pll-mipi                 ccu_nkm
>           tcon0                 ccu_mux
>              tcon-data-clock    sun4i_dclk
> 
> The divider between tcon0 and tcon-data-clock is fixed at 4. So, I need
> pll-mipi to run at 432 MHz to get the desired vertical refresh rate.
> When pll-vdeo0 is at 294 MHz this is that rate cannot be matched exactly
> with any combination. The best we can get is 431.2 MHz (n=11, k=2,
> m=15).
> 
> The pinephone has some "vendor" patches (megi kernel) that
>  a. add HDMI
>  b. allow re-setting pll-mipi's rate when pll-video0 changes
> 
> Re: Who needs a more precise clock?
> ===================================
> When plugging in HDMI, pll-video's rate is set to 297 MHz, which - in
> the vendor kernel, not mainline - triggers recalculation of pll-mipi
> (trying to set it to 431.2 MHz). It ends up with a rate of 424.285714
> MHz, because this is the nearest, but less than 431.2 MHz (n=5, k=2,
> m=7). The nearest rate would be 432 MHz.
> 
> So, while analyzing the whole situation that I described above, I found
> out that the NKM clocks are not set to the closest rate and wondered why
> that is. Hence my request for comments.

It all makes sense, but I'm not sure why it requires a complete rewrite
of the factor calculation algo?

> Now, one could argue that pll-video0 should be set to 297MHz at boot

Not really no, we should strive to be as immune as possible from the
boot state.

> or that pll-mipi should try to set the *requested* rate instead of the
> previous rate when the pll-video0 changes.

It's not clear to me what is the distinction you make here between the
requested rate and the previous rate?

Do you mean that you have two clk_set_rate in sequence, with the first
one on pll-mipi (or one of its child clocks), and the second one on
pll-video0 (but on a different sub-tree than pll-mipi) and thus pll-mipi
has its rate changed by the second, but doesn't match the previous rate
enforced?

If so, that's kind of expected: clk_set_rate doesn't provide any
warranty on for how long the rate is going to stay there. There's two
way to prevent that. Either we call clk_set_rate_exclusive (on the
first) instead, but it will effectively lock a clock subtree and prevent
any rate change which is a pretty big constraint. Or you add a notifier
to adjust to the parent clock rate change and still provide the same
output rate, with a different set of parameters.

Maxime

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks
  2023-06-01 19:41     ` Jernej Škrabec
@ 2023-06-02  7:34       ` Maxime Ripard
  2023-06-05 11:41         ` Frank Oltmanns
  0 siblings, 1 reply; 19+ messages in thread
From: Maxime Ripard @ 2023-06-02  7:34 UTC (permalink / raw)
  To: Jernej Škrabec
  Cc: Frank Oltmanns, linux-arm-kernel, linux-clk, linux-kernel,
	linux-sunxi, Andre Przywara, Chen-Yu Tsai, Icenowy Zheng,
	Michael Turquette, Rob Herring, Samuel Holland, Stephen Boyd

[-- Attachment #1: Type: text/plain, Size: 2046 bytes --]

On Thu, Jun 01, 2023 at 09:41:30PM +0200, Jernej Škrabec wrote:
> Dne četrtek, 01. junij 2023 ob 07:16:45 CEST je Frank Oltmanns napisal(a):
> > Re: Why speed up factor calculation?
> > ====================================
> > I'm not aware that the current implementation of calculating n, k, and m
> > poses a bottleneck in any situation. Again, while going through the
> > code, I wondered why not save a few CPU cycles by precalculating the
> > meaningful combinations. In my opinion, it does not have any side
> > effects, so we might as well do it. (There is of course the side effect
> > of using a higher rate, but this is unrelated to precalculation as I
> > could as well employ a rate comparison that only allows lower rates, or
> > only optionally higher rates.)
> > 
> > > Clocks in general are very regression-prone, so I'd rather be a bit
> > > conservative there, and "if it ain't broke, don't fix it".
> > 
> > Sure, I get that.
> > 
> > As I stated in my cover letter:
> > "The motivation for these proposed changes lies in the current behavior
> > of rate selection for NKM clocks, which doesn't observe the
> > CLK_SET_RATE_PARENT flag. I.e. it does not select a different rate for
> > the parent clock to find the optimal rate."
> > 
> > I thought that this required this optimization to be implemented, but by
> > now, I'm no longer sure. I'll probably continue investigating different
> > paths for CLK_SET_RATE_PARENT for NKM clocks and follow up with new
> > findings.
> 
> Let's leave out any optimizations that are not apparently needed. Most clock
> rates are set only once at boot and others, like video clocks, not that often,
> so a suboptimal code speed doesn't hurt currently.

I'm not even sure we can make that assumption for video clocks. We might
for a panel, but for a more "dynamic" output like HDMI all bets are off
and depending on the monitor, the user settings and the userspace stack
we can definitely expect the video clock to change quite frequently.

Maxime

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks
  2023-06-02  7:31     ` Maxime Ripard
@ 2023-06-05 10:31       ` Frank Oltmanns
  2023-06-06 14:02         ` Maxime Ripard
  0 siblings, 1 reply; 19+ messages in thread
From: Frank Oltmanns @ 2023-06-05 10:31 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: linux-arm-kernel, linux-clk, linux-kernel, linux-sunxi,
	Andre Przywara, Chen-Yu Tsai, Icenowy Zheng, Jernej Skrabec,
	Michael Turquette, Rob Herring, Samuel Holland, Stephen Boyd

Hi Maxime,

On 2023-06-02 at 09:31:59 +0200, Maxime Ripard <mripard@kernel.org> wrote:
> [[PGP Signed Part:Undecided]]
> Hi,
>
> On Thu, Jun 01, 2023 at 07:16:45AM +0200, Frank Oltmanns wrote:
>> On 2023-05-31 at 15:48:43 +0200, Maxime Ripard <mripard@kernel.org> wrote:
>> > [[PGP Signed Part:Undecided]]
>> > Hi Frank,
>> >
>> > On Sat, May 27, 2023 at 03:27:44PM +0200, Frank Oltmanns wrote:
>> >> I would like to bring your attention to the current process of setting
>> >> the rate of an NKM clock. As it stands, when setting the rate of an
>> >> NKM clock, the rate nearest but less than or equal to the requested
>> >> rate is found, instead of the nearest rate.
>> >
>> > Yeah, it's actually pretty common, see clk_mux_determine_rate_flags()
>> > for example. Some devices require that we don't overshoot, while some
>> > prefer to have the closest rate.
>> >
>> > Both are fine, and it's a bit context specific which one we should
>> > favour. If we were to do anything, it would be to support both and let
>> > the clock driver select which behaviour it wants.
>> >
>>
>> Ok, understood. Thank you for the explanation! Now, I'm wondering if
>> anyone would be using such a flag, if I added it.
>
> I guess that's another thing :) If no-one is going to use it, why should
> we do it in the first place?
>
> But most likely the display and audio clocks are usually fairly ok with
> overshooting a bit, and a closest rate is usually better.

Ok, I dived a bit deeper into this, but, as far as I can tell, the
closest rate is not used anywhere in the sunxi-ng ccu driver. So, when
extending, e.g., the NM or NKM clock to support, one must also extend at
least the mux clocks, because they expect rates less than the requested
rate. That seems to be quite the undertaking for only a small gain in
precision.

>> >> Moreover, ccu_nkm_find_best() is called multiple times (footnote [1])
>> >> when setting a rate, each time iterating over all combinations of n,
>> >> k, and m.
>> >
>> > Yeah, that's expected as well.
>>
>> I'm wondering though, if iterating over all combinations is set in
>> stone, or if some kind of optimization would be in order.
>
> The thing with optimization is that you need to optimize for something.
> So you need to show that this code is suboptimal (by whatever metric you
> want to optimize for), and that your code is more optimal that it used
> to be.
>
> It means identifying a problem, writing benchmarks, and showing that the
> new code performs better there.
>
> For example, your code might very well be faster, but it will increase
> the kernel image (and thus the RAM usage). One is not more optimal than
> the other in absolute, they both are, using a different metric.

Sure, I get that. I'll submit a patchset that adds the functionality to
NKM clocks to set the rate of their parents.

With the new patchset, the time for, e.g. setting DCLK increases from
~0.5 ms to a whopping 30 - 37 ms. Those times were taken
unscientifically on my pinephone, i.e. kernel logging and a couple of
re-boots. But I think that still might give an idea of why I thought
about the need to increase performance.

The reason for this massive increase is, that the patch iterates over
all combinations of NKM for pll-mipi, and for each combination it
iterates over all combinations of NM for pll-video0.

Nevertheless, following your and Jernej's advice, I'll submit the
patchset first and then we can discuss if speed optimizations are needed
and what cost is acceptable.

> And then you have the more social factors that play a huge part too:
> readibility, maintainability, etc.
>
>> >> In response to this, I propose the following refinements to optimize the NKM
>> >> clock setting:
>> >>  a. when finding the best rate use the nearest rate, even if it is greater than
>> >>     the requested rate (PATCH 1)
>> >>  b. utilize binary search to find the best rate by going through a
>> >>     precalculated, ordered list of all meaningful combinations of n, k, and m
>> >>     (PATCH 2)
>> >
>> > One thing you haven't really addressed is why we would be doing this? Is
>> > there some clocks that require a more precise clock and don't? Is the
>> > factor calculation a bottleneck for some workloads?
>>
>> Background
>> ==========
>> I'm a pinephone user (ccu-sun50i-a64). I'm using U-Boot which sets the
>> pll-video0 to 294 MHz on boot. The phone's panel requires DCLK to run at
>> 108 MHz to get a nice 60 Hz vertical refresh rate. The clock structure
>> is this:
>>
>>     clock                       clock type
>>     --------------------------------------
>>     pll-video0                  ccu_nm
>>        pll-mipi                 ccu_nkm
>>           tcon0                 ccu_mux
>>              tcon-data-clock    sun4i_dclk
>>
>> The divider between tcon0 and tcon-data-clock is fixed at 4. So, I need
>> pll-mipi to run at 432 MHz to get the desired vertical refresh rate.
>> When pll-vdeo0 is at 294 MHz this is that rate cannot be matched exactly
>> with any combination. The best we can get is 431.2 MHz (n=11, k=2,
>> m=15).
>>
>> The pinephone has some "vendor" patches (megi kernel) that
>>  a. add HDMI
>>  b. allow re-setting pll-mipi's rate when pll-video0 changes
>>
>> Re: Who needs a more precise clock?
>> ===================================
>> When plugging in HDMI, pll-video's rate is set to 297 MHz, which - in
>> the vendor kernel, not mainline - triggers recalculation of pll-mipi
>> (trying to set it to 431.2 MHz). It ends up with a rate of 424.285714
>> MHz, because this is the nearest, but less than 431.2 MHz (n=5, k=2,
>> m=7). The nearest rate would be 432 MHz.
>>
>> So, while analyzing the whole situation that I described above, I found
>> out that the NKM clocks are not set to the closest rate and wondered why
>> that is. Hence my request for comments.
>
> It all makes sense, but I'm not sure why it requires a complete rewrite
> of the factor calculation algo?

You are absolutely right! It does not! Here, I was talking about the
reasons why a more precise clock might be desirable (PATCH 1), which
touches very few lines.

I tried to explain the reasons for the algo change further below in the
"Re: Why speed up factor calculation?" part of my mail that was cut off
in your response.

>
>> Now, one could argue that pll-video0 should be set to 297MHz at boot
>
> Not really no, we should strive to be as immune as possible from the
> boot state.

Agreed.

>> or that pll-mipi should try to set the *requested* rate instead of the
>> previous rate when the pll-video0 changes.
>
> It's not clear to me what is the distinction you make here between the
> requested rate and the previous rate?

This is quite a de-tour from the original discussion, so I'm sorry for
the confusion.

By requested rate I mean the rate that the user (DCLK) requested. But
this is not necessarily the rate that the clock is using in the end,
because of its parent's rate.

So, when the pll-video0 changes rate from 294 MHz to 297MHz (upon
plugging in HDMI), pll-mipi does not know any longer what the requested
rate (let's say 432MHz) was. It only knows the rate it had before the
rate change of pll-video0 (431.2MHz in that case). Now with the new
parent rate, it tries to find a new rate that's close to (but less than)
431.2 MHz instead of 432 MHz. And since with the new clock rate of 297
MHz for pll-video0, it cannot set to 431.2 MHz, so it rounds down again
(to 424.285714 MHz). That means it rounded down twice and is now quite
far from the originally requested 432 MHz.

So, whenever the parent rate changes, we always round down and move
further away from the rate that the user originally requested for our
clock. This could be mitigated by storing the requested rate (i.e. 432
MHz instead of 431.2 MHz). However, I don't know if that's possible. I'm
simply stating my observations. No call for action is implied in that
statement.

> Do you mean that you have two clk_set_rate in sequence, with the first
> one on pll-mipi (or one of its child clocks), and the second one on
> pll-video0 (but on a different sub-tree than pll-mipi) and thus pll-mipi
> has its rate changed by the second, but doesn't match the previous rate
> enforced?
>
> If so, that's kind of expected: clk_set_rate doesn't provide any
> warranty on for how long the rate is going to stay there. There's two
> way to prevent that. Either we call clk_set_rate_exclusive (on the
> first) instead, but it will effectively lock a clock subtree and prevent
> any rate change which is a pretty big constraint.

This is actually the current constraint in mainline. sun40i_tcon locks
pll-video0's tree.

> Or you add a notifier
> to adjust to the parent clock rate change and still provide the same
> output rate, with a different set of parameters.

This is what the "vendor" kernel (megi kernel) tries to do. It patches
the locking away and uses a notifier to react to pll-video0's rate
changes. With the caveat that it does not save the requested rate, which
I tried to explain above.

Anyhow, thank you for this detailed discussion! I really appreciate it!
Let me send my new patchset and see what everyone thinks.

Thanks,
  Frank

>
> Maxime
>
> [[End of PGP Signed Part]]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks
  2023-06-02  7:34       ` Maxime Ripard
@ 2023-06-05 11:41         ` Frank Oltmanns
  0 siblings, 0 replies; 19+ messages in thread
From: Frank Oltmanns @ 2023-06-05 11:41 UTC (permalink / raw)
  To: Jernej Škrabec, Maxime Ripard
  Cc: linux-arm-kernel, linux-clk, linux-kernel, linux-sunxi,
	Andre Przywara, Chen-Yu Tsai, Icenowy Zheng, Michael Turquette,
	Rob Herring, Samuel Holland, Stephen Boyd

Hi Jernej,
hi Maxime,

On 2023-06-02 at 09:34:03 +0200, Maxime Ripard <mripard@kernel.org> wrote:
> [[PGP Signed Part:Undecided]]
> On Thu, Jun 01, 2023 at 09:41:30PM +0200, Jernej Škrabec wrote:
>> Dne četrtek, 01. junij 2023 ob 07:16:45 CEST je Frank Oltmanns napisal(a):
>> > Re: Why speed up factor calculation?
>> > ====================================
>> > I'm not aware that the current implementation of calculating n, k, and m
>> > poses a bottleneck in any situation. Again, while going through the
>> > code, I wondered why not save a few CPU cycles by precalculating the
>> > meaningful combinations. In my opinion, it does not have any side
>> > effects, so we might as well do it. (There is of course the side effect
>> > of using a higher rate, but this is unrelated to precalculation as I
>> > could as well employ a rate comparison that only allows lower rates, or
>> > only optionally higher rates.)
>> >
>> > > Clocks in general are very regression-prone, so I'd rather be a bit
>> > > conservative there, and "if it ain't broke, don't fix it".
>> >
>> > Sure, I get that.
>> >
>> > As I stated in my cover letter:
>> > "The motivation for these proposed changes lies in the current behavior
>> > of rate selection for NKM clocks, which doesn't observe the
>> > CLK_SET_RATE_PARENT flag. I.e. it does not select a different rate for
>> > the parent clock to find the optimal rate."
>> >
>> > I thought that this required this optimization to be implemented, but by
>> > now, I'm no longer sure. I'll probably continue investigating different
>> > paths for CLK_SET_RATE_PARENT for NKM clocks and follow up with new
>> > findings.
>>
>> Let's leave out any optimizations that are not apparently needed. Most clock
>> rates are set only once at boot and others, like video clocks, not that often,
>> so a suboptimal code speed doesn't hurt currently.
>
> I'm not even sure we can make that assumption for video clocks. We might
> for a panel, but for a more "dynamic" output like HDMI all bets are off
> and depending on the monitor, the user settings and the userspace stack
> we can definitely expect the video clock to change quite frequently.

Thank you both for your valuable feedback!

The goal I head in mind was adjusting pll-video0's clock when setting
DCLK on Allwinner A64. And you're both right, I got sidetracked by
premature optimizations.

As I wrote elsewhere in this thread, I will submit a patchset for the
original goal and we can discuss potential needs for optimization there.

Thanks,
  Frank

>
> Maxime
>
> [[End of PGP Signed Part]]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks
  2023-06-05 10:31       ` Frank Oltmanns
@ 2023-06-06 14:02         ` Maxime Ripard
  2023-06-06 20:40           ` Frank Oltmanns
  0 siblings, 1 reply; 19+ messages in thread
From: Maxime Ripard @ 2023-06-06 14:02 UTC (permalink / raw)
  To: Frank Oltmanns
  Cc: linux-arm-kernel, linux-clk, linux-kernel, linux-sunxi,
	Andre Przywara, Chen-Yu Tsai, Icenowy Zheng, Jernej Skrabec,
	Michael Turquette, Rob Herring, Samuel Holland, Stephen Boyd

[-- Attachment #1: Type: text/plain, Size: 4894 bytes --]

On Mon, Jun 05, 2023 at 12:31:41PM +0200, Frank Oltmanns wrote:
> Hi Maxime,
> 
> On 2023-06-02 at 09:31:59 +0200, Maxime Ripard <mripard@kernel.org> wrote:
> > [[PGP Signed Part:Undecided]]
> > Hi,
> >
> > On Thu, Jun 01, 2023 at 07:16:45AM +0200, Frank Oltmanns wrote:
> >> On 2023-05-31 at 15:48:43 +0200, Maxime Ripard <mripard@kernel.org> wrote:
> >> > [[PGP Signed Part:Undecided]]
> >> > Hi Frank,
> >> >
> >> > On Sat, May 27, 2023 at 03:27:44PM +0200, Frank Oltmanns wrote:
> >> >> I would like to bring your attention to the current process of setting
> >> >> the rate of an NKM clock. As it stands, when setting the rate of an
> >> >> NKM clock, the rate nearest but less than or equal to the requested
> >> >> rate is found, instead of the nearest rate.
> >> >
> >> > Yeah, it's actually pretty common, see clk_mux_determine_rate_flags()
> >> > for example. Some devices require that we don't overshoot, while some
> >> > prefer to have the closest rate.
> >> >
> >> > Both are fine, and it's a bit context specific which one we should
> >> > favour. If we were to do anything, it would be to support both and let
> >> > the clock driver select which behaviour it wants.
> >> >
> >>
> >> Ok, understood. Thank you for the explanation! Now, I'm wondering if
> >> anyone would be using such a flag, if I added it.
> >
> > I guess that's another thing :) If no-one is going to use it, why should
> > we do it in the first place?
> >
> > But most likely the display and audio clocks are usually fairly ok with
> > overshooting a bit, and a closest rate is usually better.
> 
> Ok, I dived a bit deeper into this, but, as far as I can tell, the
> closest rate is not used anywhere in the sunxi-ng ccu driver. So, when
> extending, e.g., the NM or NKM clock to support, one must also extend at
> least the mux clocks, because they expect rates less than the requested
> rate. That seems to be quite the undertaking for only a small gain in
> precision.

mux clocks are using __clk_mux_determine_rate which should have the
behaviour you want when CLK_MUX_ROUND_CLOSEST is set.

> >> >> Moreover, ccu_nkm_find_best() is called multiple times (footnote [1])
> >> >> when setting a rate, each time iterating over all combinations of n,
> >> >> k, and m.
> >> >
> >> > Yeah, that's expected as well.
> >>
> >> I'm wondering though, if iterating over all combinations is set in
> >> stone, or if some kind of optimization would be in order.
> >
> > The thing with optimization is that you need to optimize for something.
> > So you need to show that this code is suboptimal (by whatever metric you
> > want to optimize for), and that your code is more optimal that it used
> > to be.
> >
> > It means identifying a problem, writing benchmarks, and showing that the
> > new code performs better there.
> >
> > For example, your code might very well be faster, but it will increase
> > the kernel image (and thus the RAM usage). One is not more optimal than
> > the other in absolute, they both are, using a different metric.
> 
> Sure, I get that. I'll submit a patchset that adds the functionality to
> NKM clocks to set the rate of their parents.
> 
> With the new patchset, the time for, e.g. setting DCLK increases from
> ~0.5 ms to a whopping 30 - 37 ms. Those times were taken
> unscientifically on my pinephone, i.e. kernel logging and a couple of
> re-boots. But I think that still might give an idea of why I thought
> about the need to increase performance.
>
> The reason for this massive increase is, that the patch iterates over
> all combinations of NKM for pll-mipi, and for each combination it
> iterates over all combinations of NM for pll-video0.
> 
> Nevertheless, following your and Jernej's advice, I'll submit the
> patchset first and then we can discuss if speed optimizations are needed
> and what cost is acceptable.

Honestly, for 40ms, it will be a hard sell :)

> >> or that pll-mipi should try to set the *requested* rate instead of the
> >> previous rate when the pll-video0 changes.
> >
> > It's not clear to me what is the distinction you make here between the
> > requested rate and the previous rate?
> 
> This is quite a de-tour from the original discussion, so I'm sorry for
> the confusion.
> 
> By requested rate I mean the rate that the user (DCLK) requested. But
> this is not necessarily the rate that the clock is using in the end,
> because of its parent's rate.
> 
> So, when the pll-video0 changes rate from 294 MHz to 297MHz (upon
> plugging in HDMI), pll-mipi does not know any longer what the requested
> rate (let's say 432MHz) was.

It does, it's struct clk_core's req_rate. It doesn't look like it's
available to clk_hw users, but given the rest of your explanation, I
guess you have a compelling use case to make it available.

Maxime

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks
  2023-06-06 14:02         ` Maxime Ripard
@ 2023-06-06 20:40           ` Frank Oltmanns
  2023-06-07 11:42             ` Maxime Ripard
  0 siblings, 1 reply; 19+ messages in thread
From: Frank Oltmanns @ 2023-06-06 20:40 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: linux-arm-kernel, linux-clk, linux-kernel, linux-sunxi,
	Andre Przywara, Chen-Yu Tsai, Icenowy Zheng, Jernej Skrabec,
	Michael Turquette, Rob Herring, Samuel Holland, Stephen Boyd

Hi Maxime,

On 2023-06-06 at 16:02:58 +0200, Maxime Ripard <mripard@kernel.org> wrote:
> [[PGP Signed Part:Undecided]]
> On Mon, Jun 05, 2023 at 12:31:41PM +0200, Frank Oltmanns wrote:
>> Hi Maxime,
>>
>> On 2023-06-02 at 09:31:59 +0200, Maxime Ripard <mripard@kernel.org> wrote:
>> > [[PGP Signed Part:Undecided]]
>> > Hi,
>> >
>> > On Thu, Jun 01, 2023 at 07:16:45AM +0200, Frank Oltmanns wrote:
>> >> On 2023-05-31 at 15:48:43 +0200, Maxime Ripard <mripard@kernel.org> wrote:
>> >> > [[PGP Signed Part:Undecided]]
>> >> > Hi Frank,
>> >> >
>> >> > On Sat, May 27, 2023 at 03:27:44PM +0200, Frank Oltmanns wrote:
>> >> >> I would like to bring your attention to the current process of setting
>> >> >> the rate of an NKM clock. As it stands, when setting the rate of an
>> >> >> NKM clock, the rate nearest but less than or equal to the requested
>> >> >> rate is found, instead of the nearest rate.
>> >> >
>> >> > Yeah, it's actually pretty common, see clk_mux_determine_rate_flags()
>> >> > for example. Some devices require that we don't overshoot, while some
>> >> > prefer to have the closest rate.
>> >> >
>> >> > Both are fine, and it's a bit context specific which one we should
>> >> > favour. If we were to do anything, it would be to support both and let
>> >> > the clock driver select which behaviour it wants.
>> >> >
>> >>
>> >> Ok, understood. Thank you for the explanation! Now, I'm wondering if
>> >> anyone would be using such a flag, if I added it.
>> >
>> > I guess that's another thing :) If no-one is going to use it, why should
>> > we do it in the first place?
>> >
>> > But most likely the display and audio clocks are usually fairly ok with
>> > overshooting a bit, and a closest rate is usually better.
>>
>> Ok, I dived a bit deeper into this, but, as far as I can tell, the
>> closest rate is not used anywhere in the sunxi-ng ccu driver. So, when
>> extending, e.g., the NM or NKM clock to support, one must also extend at
>> least the mux clocks, because they expect rates less than the requested
>> rate. That seems to be quite the undertaking for only a small gain in
>> precision.
>
> mux clocks are using __clk_mux_determine_rate which should have the
> behaviour you want when CLK_MUX_ROUND_CLOSEST is set.

https://elixir.bootlin.com/linux/v6.3.6/source/drivers/clk/sunxi-ng/ccu-sun50i-a64.c#L539
is one example of a mux clock combined with a divider that is a bit more
complex. I didn't look too deeply, but it seemed to me, that it would
require two separate flags: One for the mux component and one for the
div component. Maybe I'm mistaken, but it seems to me that the concept
of having selected rates always be equal to or less than requested
rates, seems to be deeply ingrained in the sunxi-ng driver. I'm afraid
that I might miss some parts, therefore I abandoned that idea for now
(especially since I have only one board for testing).

>> >> >> Moreover, ccu_nkm_find_best() is called multiple times (footnote [1])
>> >> >> when setting a rate, each time iterating over all combinations of n,
>> >> >> k, and m.
>> >> >
>> >> > Yeah, that's expected as well.
>> >>
>> >> I'm wondering though, if iterating over all combinations is set in
>> >> stone, or if some kind of optimization would be in order.
>> >
>> > The thing with optimization is that you need to optimize for something.
>> > So you need to show that this code is suboptimal (by whatever metric you
>> > want to optimize for), and that your code is more optimal that it used
>> > to be.
>> >
>> > It means identifying a problem, writing benchmarks, and showing that the
>> > new code performs better there.
>> >
>> > For example, your code might very well be faster, but it will increase
>> > the kernel image (and thus the RAM usage). One is not more optimal than
>> > the other in absolute, they both are, using a different metric.
>>
>> Sure, I get that. I'll submit a patchset that adds the functionality to
>> NKM clocks to set the rate of their parents.
>>
>> With the new patchset, the time for, e.g. setting DCLK increases from
>> ~0.5 ms to a whopping 30 - 37 ms. Those times were taken
>> unscientifically on my pinephone, i.e. kernel logging and a couple of
>> re-boots. But I think that still might give an idea of why I thought
>> about the need to increase performance.
>>
>> The reason for this massive increase is, that the patch iterates over
>> all combinations of NKM for pll-mipi, and for each combination it
>> iterates over all combinations of NM for pll-video0.
>>
>> Nevertheless, following your and Jernej's advice, I'll submit the
>> patchset first and then we can discuss if speed optimizations are needed
>> and what cost is acceptable.
>
> Honestly, for 40ms, it will be a hard sell :)

I'm not sure what part you think is the "hard-sell":
 a. the patch itself because 30 to 40 ms is way too much
 b. the optimization, because 30 to 40 ms isn't all that much.
I honestly don't know.

BTW, this is the patchset in case you missed it:
https://lore.kernel.org/lkml/20230605190745.366882-1-frank@oltmanns.dev/

Note, that I have a patch in the works, which is similar to the one in
this thread, but for ccu_nm. Doing a binary search for finding the
parent rate of pll-mipi, i.e., pll-video0, reduces the time from ~30 ms
to less than 2 ms. If combined with only iterating through meaningful
nkm combinations for pll-mipi, this should bring the time under 1 ms
again.

>
>> >> or that pll-mipi should try to set the *requested* rate instead of the
>> >> previous rate when the pll-video0 changes.
>> >
>> > It's not clear to me what is the distinction you make here between the
>> > requested rate and the previous rate?
>>
>> This is quite a de-tour from the original discussion, so I'm sorry for
>> the confusion.
>>
>> By requested rate I mean the rate that the user (DCLK) requested. But
>> this is not necessarily the rate that the clock is using in the end,
>> because of its parent's rate.
>>
>> So, when the pll-video0 changes rate from 294 MHz to 297MHz (upon
>> plugging in HDMI), pll-mipi does not know any longer what the requested
>> rate (let's say 432MHz) was.
>
> It does, it's struct clk_core's req_rate. It doesn't look like it's
> available to clk_hw users, but given the rest of your explanation, I
> guess you have a compelling use case to make it available.

Oh, thank you for making me aware of that! I'll surely look into it.

Thanks,
  Frank

>
> Maxime
>
> [[End of PGP Signed Part]]

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks
  2023-06-06 20:40           ` Frank Oltmanns
@ 2023-06-07 11:42             ` Maxime Ripard
  0 siblings, 0 replies; 19+ messages in thread
From: Maxime Ripard @ 2023-06-07 11:42 UTC (permalink / raw)
  To: Frank Oltmanns
  Cc: linux-arm-kernel, linux-clk, linux-kernel, linux-sunxi,
	Andre Przywara, Chen-Yu Tsai, Icenowy Zheng, Jernej Skrabec,
	Michael Turquette, Rob Herring, Samuel Holland, Stephen Boyd

[-- Attachment #1: Type: text/plain, Size: 6371 bytes --]

On Tue, Jun 06, 2023 at 10:40:34PM +0200, Frank Oltmanns wrote:
> Hi Maxime,
> 
> On 2023-06-06 at 16:02:58 +0200, Maxime Ripard <mripard@kernel.org> wrote:
> > [[PGP Signed Part:Undecided]]
> > On Mon, Jun 05, 2023 at 12:31:41PM +0200, Frank Oltmanns wrote:
> >> Hi Maxime,
> >>
> >> On 2023-06-02 at 09:31:59 +0200, Maxime Ripard <mripard@kernel.org> wrote:
> >> > [[PGP Signed Part:Undecided]]
> >> > Hi,
> >> >
> >> > On Thu, Jun 01, 2023 at 07:16:45AM +0200, Frank Oltmanns wrote:
> >> >> On 2023-05-31 at 15:48:43 +0200, Maxime Ripard <mripard@kernel.org> wrote:
> >> >> > [[PGP Signed Part:Undecided]]
> >> >> > Hi Frank,
> >> >> >
> >> >> > On Sat, May 27, 2023 at 03:27:44PM +0200, Frank Oltmanns wrote:
> >> >> >> I would like to bring your attention to the current process of setting
> >> >> >> the rate of an NKM clock. As it stands, when setting the rate of an
> >> >> >> NKM clock, the rate nearest but less than or equal to the requested
> >> >> >> rate is found, instead of the nearest rate.
> >> >> >
> >> >> > Yeah, it's actually pretty common, see clk_mux_determine_rate_flags()
> >> >> > for example. Some devices require that we don't overshoot, while some
> >> >> > prefer to have the closest rate.
> >> >> >
> >> >> > Both are fine, and it's a bit context specific which one we should
> >> >> > favour. If we were to do anything, it would be to support both and let
> >> >> > the clock driver select which behaviour it wants.
> >> >> >
> >> >>
> >> >> Ok, understood. Thank you for the explanation! Now, I'm wondering if
> >> >> anyone would be using such a flag, if I added it.
> >> >
> >> > I guess that's another thing :) If no-one is going to use it, why should
> >> > we do it in the first place?
> >> >
> >> > But most likely the display and audio clocks are usually fairly ok with
> >> > overshooting a bit, and a closest rate is usually better.
> >>
> >> Ok, I dived a bit deeper into this, but, as far as I can tell, the
> >> closest rate is not used anywhere in the sunxi-ng ccu driver. So, when
> >> extending, e.g., the NM or NKM clock to support, one must also extend at
> >> least the mux clocks, because they expect rates less than the requested
> >> rate. That seems to be quite the undertaking for only a small gain in
> >> precision.
> >
> > mux clocks are using __clk_mux_determine_rate which should have the
> > behaviour you want when CLK_MUX_ROUND_CLOSEST is set.
> 
> https://elixir.bootlin.com/linux/v6.3.6/source/drivers/clk/sunxi-ng/ccu-sun50i-a64.c#L539
> is one example of a mux clock combined with a divider that is a bit more
> complex. I didn't look too deeply, but it seemed to me, that it would
> require two separate flags: One for the mux component and one for the
> div component. Maybe I'm mistaken, but it seems to me that the concept
> of having selected rates always be equal to or less than requested
> rates, seems to be deeply ingrained in the sunxi-ng driver. I'm afraid
> that I might miss some parts, therefore I abandoned that idea for now
> (especially since I have only one board for testing).

All the implementations (afaik) of the "composite" clocks we have that
combine a mux and dividers eventually use ccu_mux_helper_determine_rate
and divider_round_rate_parent. The last one can use
CLK_DIVIDER_ROUND_CLOSEST, and you covered the first one in your patch.
Which one did you leave out?

> >> >> >> Moreover, ccu_nkm_find_best() is called multiple times (footnote [1])
> >> >> >> when setting a rate, each time iterating over all combinations of n,
> >> >> >> k, and m.
> >> >> >
> >> >> > Yeah, that's expected as well.
> >> >>
> >> >> I'm wondering though, if iterating over all combinations is set in
> >> >> stone, or if some kind of optimization would be in order.
> >> >
> >> > The thing with optimization is that you need to optimize for something.
> >> > So you need to show that this code is suboptimal (by whatever metric you
> >> > want to optimize for), and that your code is more optimal that it used
> >> > to be.
> >> >
> >> > It means identifying a problem, writing benchmarks, and showing that the
> >> > new code performs better there.
> >> >
> >> > For example, your code might very well be faster, but it will increase
> >> > the kernel image (and thus the RAM usage). One is not more optimal than
> >> > the other in absolute, they both are, using a different metric.
> >>
> >> Sure, I get that. I'll submit a patchset that adds the functionality to
> >> NKM clocks to set the rate of their parents.
> >>
> >> With the new patchset, the time for, e.g. setting DCLK increases from
> >> ~0.5 ms to a whopping 30 - 37 ms. Those times were taken
> >> unscientifically on my pinephone, i.e. kernel logging and a couple of
> >> re-boots. But I think that still might give an idea of why I thought
> >> about the need to increase performance.
> >>
> >> The reason for this massive increase is, that the patch iterates over
> >> all combinations of NKM for pll-mipi, and for each combination it
> >> iterates over all combinations of NM for pll-video0.
> >>
> >> Nevertheless, following your and Jernej's advice, I'll submit the
> >> patchset first and then we can discuss if speed optimizations are needed
> >> and what cost is acceptable.
> >
> > Honestly, for 40ms, it will be a hard sell :)
> 
> I'm not sure what part you think is the "hard-sell":
>  a. the patch itself because 30 to 40 ms is way too much
>  b. the optimization, because 30 to 40 ms isn't all that much.
> I honestly don't know.
> 
> BTW, this is the patchset in case you missed it:
> https://lore.kernel.org/lkml/20230605190745.366882-1-frank@oltmanns.dev/
> 
> Note, that I have a patch in the works, which is similar to the one in
> this thread, but for ccu_nm. Doing a binary search for finding the
> parent rate of pll-mipi, i.e., pll-video0, reduces the time from ~30 ms
> to less than 2 ms. If combined with only iterating through meaningful
> nkm combinations for pll-mipi, this should bring the time under 1 ms
> again.

My point is that:

 * it's regression prone, so quite risky

 * but if the endgoal is to gain 40ms once, at boot, without any use
   case to back that up, there's basically no reward.

a high risk, low reward patch is what's hard to sell.

Maxime

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 228 bytes --]

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2023-06-07 11:42 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-05-27 13:27 [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks Frank Oltmanns
2023-05-27 13:27 ` [RFC PATCH 1/3] clk: sunxi-ng: nkm: Minimize difference when finding rate Frank Oltmanns
2023-05-27 13:27 ` [RFC PATCH 2/3] clk: sunxi-ng: Implement precalculated NKM rate selection Frank Oltmanns
2023-05-27 23:19   ` Julian Calaby
2023-05-28  9:12     ` Frank Oltmanns
2023-05-28 15:32       ` Julian Calaby
2023-05-28 17:12         ` Frank Oltmanns
2023-05-28 14:11   ` Frank Oltmanns
2023-05-27 13:27 ` [RFC PATCH 3/3] clk: sunxi-ng: sun50i-a64: Precalculate NKM combinations for pll-mipi Frank Oltmanns
2023-05-31 13:48 ` [RFC PATCH 0/3] clk: sunxi-ng: Optimize rate selection for NKM clocks Maxime Ripard
2023-06-01  5:16   ` Frank Oltmanns
2023-06-01 19:41     ` Jernej Škrabec
2023-06-02  7:34       ` Maxime Ripard
2023-06-05 11:41         ` Frank Oltmanns
2023-06-02  7:31     ` Maxime Ripard
2023-06-05 10:31       ` Frank Oltmanns
2023-06-06 14:02         ` Maxime Ripard
2023-06-06 20:40           ` Frank Oltmanns
2023-06-07 11:42             ` Maxime Ripard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).