All of lore.kernel.org
 help / color / mirror / Atom feed
* why does aptina_pll_calculate insist on exact division?
@ 2018-08-14  6:35 Helmut Grohne
  2018-08-14  7:30 ` Laurent Pinchart
  0 siblings, 1 reply; 11+ messages in thread
From: Helmut Grohne @ 2018-08-14  6:35 UTC (permalink / raw)
  To: Laurent Pinchart, linux-media

Hi,

I tried using the aptina_pll_calculate for a "new" imager and ran into
problems. After filling out aptina_pll_limits from the data sheet, I was
having a hard time finding a valid pix_clock. Most of the ones I tried
are rejected by aptina_pll_calculate for various reasons. In particular,
no pix_clock close to pix_clock_max is allowed.

Why does the calculation method insist on exact division and avoiding
fractional numbers?

I'm using an ext_clock of 50 MHz. This clock is derived from a 33 MHz
clock and the 50 MHz is not attained exactly. Rather it ends up being
more like 49.999976 Hz. This raises the question, what value I should
put into ext_clock (or the corresponding device tree property). Should I
use the requested frequency or the actual frequency? Worse, depending on
the precise value of the ext_clock, aptina_pll_calculate may or may not
be able to compute pll parameters.

On the other hand, the manufacturer provided configuration tool happily
computes pll parameters that result in close, but not exactly, the
requested pix_clock. In particular, the pll parameters do not
necessarily result in a whole number. It appears to merely approximate
the requested frequency.

Can you explain where the requirement to avoid fractional numbers comes
from? Would it be reasonable to use a different algorithm that avoids
this requirement?

Helmut

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: why does aptina_pll_calculate insist on exact division?
  2018-08-14  6:35 why does aptina_pll_calculate insist on exact division? Helmut Grohne
@ 2018-08-14  7:30 ` Laurent Pinchart
  2018-08-14  8:40   ` Helmut Grohne
  2018-08-14  8:48   ` why does aptina_pll_calculate insist on exact division? Sakari Ailus
  0 siblings, 2 replies; 11+ messages in thread
From: Laurent Pinchart @ 2018-08-14  7:30 UTC (permalink / raw)
  To: Helmut Grohne; +Cc: linux-media, sakari.ailus

Hi Helmut,

(CC'ing Sakari Ailus who is our current PLL expert after spending so much time 
on the SMIA PLL code)

On Tuesday, 14 August 2018 09:35:40 EEST Helmut Grohne wrote:
> Hi,
> 
> I tried using the aptina_pll_calculate for a "new" imager and ran into
> problems. After filling out aptina_pll_limits from the data sheet, I was
> having a hard time finding a valid pix_clock. Most of the ones I tried
> are rejected by aptina_pll_calculate for various reasons. In particular,
> no pix_clock close to pix_clock_max is allowed.

How do you mean ? The only place where pix_clock_max is used is in the 
following code:

        if (pll->pix_clock == 0 || pll->pix_clock > limits->pix_clock_max) {
                dev_err(dev, "pll: invalid pixel clock frequency.\n");
                return -EINVAL;
        }

aptina_pll_calculate() rejects a request pixel clock value higher than the 
pixel clock frequency higher limit, which is also given by the caller. That 
shouldn't happen, it would be a bug in the caller.

> Why does the calculation method insist on exact division and avoiding
> fractional numbers?

I'm not sure what you mean by avoiding fractional numbers. Could you please 
elaborate ? Please keep in mind that I last touched the code 6 years, so my 
memory isn't exactly fresh.

If you mean using floating point operations to calculate PLL parameters, 
remember that the code runs in the Linux kernel, where floating point isn't 
allowed. You would thus have to implement the algorithm using fixed-point 
calculation.

> I'm using an ext_clock of 50 MHz. This clock is derived from a 33 MHz
> clock and the 50 MHz is not attained exactly. Rather it ends up being
> more like 49.999976 Hz. This raises the question, what value I should
> put into ext_clock (or the corresponding device tree property). Should I
> use the requested frequency or the actual frequency?

There's no such thing as an exact frequency anyway, as it's a physical value. 
I'd got for 50 MHz for simplicity.

> Worse, depending on the precise value of the ext_clock, aptina_pll_calculate
> may or may not be able to compute pll parameters.
> 
> On the other hand, the manufacturer provided configuration tool happily
> computes pll parameters that result in close, but not exactly, the
> requested pix_clock. In particular, the pll parameters do not
> necessarily result in a whole number. It appears to merely approximate
> the requested frequency.

aptina_pll_calculate() also approximates the requested frequency, but as it 
appears from your test, fails in some cases. That's certainly an issue in the 
code and should be fixed. Feel free to convince the manufacturer to release 
their PLL parameters computation code under the GPL ;-)

> Can you explain where the requirement to avoid fractional numbers comes
> from? Would it be reasonable to use a different algorithm that avoids
> this requirement?

Again, please elaborate on what you mean by avoiding fractional numbers. I 
would certainly be open to a different algorithm (or possibly fixes to the 
existing code), as long as it fulfills the requirements behind the current 
implementation. In particular the code should compute the optimum PLL 
parameters when multiple options are possible, and its complexity should be 
lower than O(n^2) (ideally O(1), if not possible O(n)).

-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: why does aptina_pll_calculate insist on exact division?
  2018-08-14  7:30 ` Laurent Pinchart
@ 2018-08-14  8:40   ` Helmut Grohne
  2018-08-23  7:52     ` [PATCH] media: aptina-pll: allow approximating the requested pix_clock Helmut Grohne
  2018-08-14  8:48   ` why does aptina_pll_calculate insist on exact division? Sakari Ailus
  1 sibling, 1 reply; 11+ messages in thread
From: Helmut Grohne @ 2018-08-14  8:40 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: linux-media, sakari.ailus

Hi,

Thank you for the quick and helpful answer.

On Tue, Aug 14, 2018 at 09:30:14AM +0200, Laurent Pinchart wrote:
> How do you mean ? The only place where pix_clock_max is used is in the 
> following code:
> 
>         if (pll->pix_clock == 0 || pll->pix_clock > limits->pix_clock_max) {
>                 dev_err(dev, "pll: invalid pixel clock frequency.\n");
>                 return -EINVAL;
>         }
> 
> aptina_pll_calculate() rejects a request pixel clock value higher than the 
> pixel clock frequency higher limit, which is also given by the caller. That 
> shouldn't happen, it would be a bug in the caller.

Of course, I am trying values lower than pix_clock_max. For a number of
imagers, that pix_clock_max is 74.25 MHz. It seems that any pix_clock of
at least 72 MHz is rejected here.

> I'm not sure what you mean by avoiding fractional numbers. Could you please 
> elaborate ? Please keep in mind that I last touched the code 6 years, so my 
> memory isn't exactly fresh.

The first thing the code does is computing the gcd of pix_clock and
ext_clock. Immediately, it conludes that m must be a multiple of
pix_clock / gcd(pix_clock, ext_clock). Varying either clock value
slightly causes large jumps in the computed gcd value (in particular, it
will be 1 whenever either clock happens to be a prime number).

> If you mean using floating point operations to calculate PLL parameters, 
> remember that the code runs in the Linux kernel, where floating point isn't 
> allowed. You would thus have to implement the algorithm using fixed-point 
> calculation.

I'm not after using floating points. In a sense, we are already
fixed-point calculation and the precision is 1 Hz. Rounding errors in
that range look ok to me.

> There's no such thing as an exact frequency anyway, as it's a physical value. 
> I'd got for 50 MHz for simplicity.

That's exactly my point. The exact value should not matter. However, for
the present aptina_pll_calculate, the exact value matters a lot. Were
you to use 49999991 Hz as ext_clock (which happens to be prime, but
reasonably close), aptina_pll_calculate fails entirely as m is deemed to
be a multiple of the pix_clock in Hz. No imager allows for such large m
values and thus aptina_pll_calculate rejects any such configuration with
-EINVAL.

I'm arguing that rather than failing to compute pll parameters, it
should compromise on exactness. Presently, aptina_pll_calculate ensures
that whenever it is successful, the assertion pix_clock = ext_clock / n
* m / p1 holds exactly and all intermediate values are whole numbers.
I'm arguing that having it hold exactly reduces utility of
aptina_pll_calculate, because frequencies are not exact in the real
world. There is no need to have whole numbered frequencies.

> aptina_pll_calculate() also approximates the requested frequency, but as it 
> appears from your test, fails in some cases. That's certainly an issue in the 
> code and should be fixed. Feel free to convince the manufacturer to release 
> their PLL parameters computation code under the GPL ;-)

We both know that the exercise of extracting code from manufacturers is
futile.

However you appear to imply that aptina_pll_calculate should approximate
the requested frequency. That's not what it does today. That's a useful
answer to me already and I'll be looking into doing the work of coming
up with an alternative lifting the requirement.

> Again, please elaborate on what you mean by avoiding fractional numbers. I 
> would certainly be open to a different algorithm (or possibly fixes to the 
> existing code), as long as it fulfills the requirements behind the current 
> implementation. In particular the code should compute the optimum PLL 
> parameters when multiple options are possible, and its complexity should be 
> lower than O(n^2) (ideally O(1), if not possible O(n)).

Beware though that discussing complexities requires more precision as to
what "n" means here. The code interprets it as n = p1_max - p1_min (not
accounting for the gcd computation), which is not the usual
interpretation. What you really want is that it completes in a
reasonable amount of time on slow, embedded devices for any input.

Once you lift the exactness requirement, you optimize multiple aspects
simultaneously. The present code maximizes P1, but we also want to
minimize the difference between the requested pix_clock and the
resulting pix_clock. There has to be some kind of trade-off here. The
trade-off chosen by the present code is to always have that difference
be 0. Once non-zero differences are allowed, optimum is no longer
well-defined.

So could you go into more detail as to what "optimum PLL parameters"
mean to you?

Helmut

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: why does aptina_pll_calculate insist on exact division?
  2018-08-14  7:30 ` Laurent Pinchart
  2018-08-14  8:40   ` Helmut Grohne
@ 2018-08-14  8:48   ` Sakari Ailus
  1 sibling, 0 replies; 11+ messages in thread
From: Sakari Ailus @ 2018-08-14  8:48 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: Helmut Grohne, linux-media

Hi,

On Tue, Aug 14, 2018 at 10:30:14AM +0300, Laurent Pinchart wrote:
> Hi Helmut,
> 
> (CC'ing Sakari Ailus who is our current PLL expert after spending so much time 
> on the SMIA PLL code)
> 
> On Tuesday, 14 August 2018 09:35:40 EEST Helmut Grohne wrote:
> > Hi,
> > 
> > I tried using the aptina_pll_calculate for a "new" imager and ran into
> > problems. After filling out aptina_pll_limits from the data sheet, I was
> > having a hard time finding a valid pix_clock. Most of the ones I tried
> > are rejected by aptina_pll_calculate for various reasons. In particular,
> > no pix_clock close to pix_clock_max is allowed.

On many systems where such sensors are being used, you generally need to
use known-good link frequencies that have no EMI issues. Therefore the PLL
calculator uses this input, as well as the bits per pixel etc. platform
specific inputs and hardware limits to come up with the pixel rate. At
least for smiapp PLL calculator this is the highest possible rate, taking
the constraints into account.

> 
> How do you mean ? The only place where pix_clock_max is used is in the 
> following code:
> 
>         if (pll->pix_clock == 0 || pll->pix_clock > limits->pix_clock_max) {
>                 dev_err(dev, "pll: invalid pixel clock frequency.\n");
>                 return -EINVAL;
>         }
> 
> aptina_pll_calculate() rejects a request pixel clock value higher than the 
> pixel clock frequency higher limit, which is also given by the caller. That 
> shouldn't happen, it would be a bug in the caller.
> 
> > Why does the calculation method insist on exact division and avoiding
> > fractional numbers?
> 
> I'm not sure what you mean by avoiding fractional numbers. Could you please 
> elaborate ? Please keep in mind that I last touched the code 6 years, so my 
> memory isn't exactly fresh.
> 
> If you mean using floating point operations to calculate PLL parameters, 
> remember that the code runs in the Linux kernel, where floating point isn't 
> allowed. You would thus have to implement the algorithm using fixed-point 
> calculation.
> 
> > I'm using an ext_clock of 50 MHz. This clock is derived from a 33 MHz
> > clock and the 50 MHz is not attained exactly. Rather it ends up being
> > more like 49.999976 Hz. This raises the question, what value I should
> > put into ext_clock (or the corresponding device tree property). Should I
> > use the requested frequency or the actual frequency?
> 
> There's no such thing as an exact frequency anyway, as it's a physical value. 
> I'd got for 50 MHz for simplicity.
> 
> > Worse, depending on the precise value of the ext_clock, aptina_pll_calculate
> > may or may not be able to compute pll parameters.
> > 
> > On the other hand, the manufacturer provided configuration tool happily
> > computes pll parameters that result in close, but not exactly, the
> > requested pix_clock. In particular, the pll parameters do not
> > necessarily result in a whole number. It appears to merely approximate
> > the requested frequency.
> 
> aptina_pll_calculate() also approximates the requested frequency, but as it 
> appears from your test, fails in some cases. That's certainly an issue in the 
> code and should be fixed. Feel free to convince the manufacturer to release 
> their PLL parameters computation code under the GPL ;-)
> 
> > Can you explain where the requirement to avoid fractional numbers comes
> > from? Would it be reasonable to use a different algorithm that avoids
> > this requirement?
> 
> Again, please elaborate on what you mean by avoiding fractional numbers. I 
> would certainly be open to a different algorithm (or possibly fixes to the 
> existing code), as long as it fulfills the requirements behind the current 
> implementation. In particular the code should compute the optimum PLL 
> parameters when multiple options are possible, and its complexity should be 
> lower than O(n^2) (ideally O(1), if not possible O(n)).
> 
> -- 
> Regards,
> 
> Laurent Pinchart
> 
> 
> 

-- 
Sakari Ailus
e-mail: sakari.ailus@iki.fi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PATCH] media: aptina-pll: allow approximating the requested pix_clock
  2018-08-14  8:40   ` Helmut Grohne
@ 2018-08-23  7:52     ` Helmut Grohne
  2018-08-23 11:12       ` Laurent Pinchart
  2018-08-23 11:30       ` Sakari Ailus
  0 siblings, 2 replies; 11+ messages in thread
From: Helmut Grohne @ 2018-08-23  7:52 UTC (permalink / raw)
  To: Laurent Pinchart, Mauro Carvalho Chehab, linux-media, Sakari Ailus

Clock frequencies are not exact values, but rather imprecise, physical
properties. The present pll computation however, treats them as exact.
It tries to compute parameters that attain the requested pix_clock
exactly. Failing that, it gives up.

The new implementation approximates the requested pix_clock and reports
the actual value resulting from the computed parameters. If the
requested pix_clock can be attained, the original behaviour of
maximizing p1 is retained.

Experiments with the algorithm in userspace on an arm device show that
the computational cost is negligible compared to executing a binary for
all practical inputs.

Signed-off-by: Helmut Grohne <helmut.grohne@intenta.de>
---
 drivers/media/i2c/aptina-pll.c | 263 ++++++++++++++++++++++++-----------------
 drivers/media/i2c/mt9m032.c    |  15 ++-
 2 files changed, 165 insertions(+), 113 deletions(-)

I tried using aptina_pll_calculate in a driver for a similar image sensor.
Using it, I was unable to find a pix_clock value such that the computation was
successful. Most of the practical parameters result in a non-integer pix_clock,
something that struct pll cannot represent.

The driver is not ready for submission at this point, but the changes to
aptina_pll_calculate are reasonable on their own, which is why I submit them
separately now.

Helmut

diff --git a/drivers/media/i2c/aptina-pll.c b/drivers/media/i2c/aptina-pll.c
index 224ae4e4cf8b..249018772b2b 100644
--- a/drivers/media/i2c/aptina-pll.c
+++ b/drivers/media/i2c/aptina-pll.c
@@ -2,6 +2,7 @@
  * Aptina Sensor PLL Configuration
  *
  * Copyright (C) 2012 Laurent Pinchart <laurent.pinchart@ideasonboard.com>
+ * Copyright (C) 2018 Intenta GmbH
  *
  * This program is free software; you can redistribute it and/or
  * modify it under the terms of the GNU General Public License
@@ -14,23 +15,84 @@
  */
 
 #include <linux/device.h>
-#include <linux/gcd.h>
 #include <linux/kernel.h>
-#include <linux/lcm.h>
+#include <linux/math64.h>
 #include <linux/module.h>
 
 #include "aptina-pll.h"
 
+/* A variant of mult_frac that works even when (x % denom) * numer produces a
+ * 32bit overflow.
+ */
+#define mult_frac_exact(x, numer, denom) \
+	((u32)div_u64(mul_u32_u32((x), (numer)), (denom)))
+
+static unsigned int absdiff(unsigned int x, unsigned int y)
+{
+	return x > y ? x - y : y - x;
+}
+
+/* Find n_min <= *np <= n_max and d_min <= *dp <= d_max such that *np / *dp
+ * approximates n_target / d_target.
+ */
+static int approximate_fraction(unsigned int n_min, unsigned int n_max,
+				unsigned int d_min, unsigned int d_max,
+				unsigned int n_target, unsigned int d_target,
+				unsigned int *np, unsigned int *dp)
+{
+	unsigned int d, best_err = UINT_MAX;
+
+	d_min = max(d_min, mult_frac_exact(n_min, d_target, n_target));
+	d_max = min(d_max, mult_frac_exact(n_max, d_target, n_target) + 1);
+	if (d_min > d_max)
+		return -EINVAL;
+
+	for (d = d_min; d < d_max; ++d) {
+		unsigned int n = mult_frac_exact(d, n_target, d_target);
+
+		if (n >= n_min) {
+			unsigned int err = absdiff(
+				n_target, mult_frac_exact(n, d_target, d));
+
+			if (err < best_err) {
+				best_err = err;
+				*np = n;
+				*dp = d;
+			}
+			if (err == 0)
+				return 0;
+		}
+		++n;
+		if (n <= n_max) {
+			unsigned int err = absdiff(
+				n_target, mult_frac_exact(n, d_target, d));
+
+			if (err < best_err) {
+				best_err = err;
+				*np = n;
+				*dp = d;
+			}
+		}
+	}
+	return best_err == UINT_MAX ? -EINVAL : 0;
+}
+
+/* Find parameters n, m, p1 such that:
+ *   n_min <= n <= n_max
+ *   m_min <= m <= m_max
+ *   p1_min <= p1 <= p1_max, even
+ *   int_clock_min <= ext_clock / n <= int_clock_max
+ *   out_clock_min <= ext_clock / n * m <= out_clock_max
+ *   pix_clock = ext_clock / n * m / p1 (approximately)
+ *   p1 is maximized
+ * Report the achieved pix_clock (input/output parameter).
+ */
 int aptina_pll_calculate(struct device *dev,
 			 const struct aptina_pll_limits *limits,
 			 struct aptina_pll *pll)
 {
-	unsigned int mf_min;
-	unsigned int mf_max;
-	unsigned int p1_min;
-	unsigned int p1_max;
-	unsigned int p1;
-	unsigned int div;
+	unsigned int n_min, n_max, m_min, m_max, p1_min, p1_max, p1;
+	unsigned int clock_err = UINT_MAX;
 
 	dev_dbg(dev, "PLL: ext clock %u pix clock %u\n",
 		pll->ext_clock, pll->pix_clock);
@@ -46,120 +108,103 @@ int aptina_pll_calculate(struct device *dev,
 		return -EINVAL;
 	}
 
-	/* Compute the multiplier M and combined N*P1 divisor. */
-	div = gcd(pll->pix_clock, pll->ext_clock);
-	pll->m = pll->pix_clock / div;
-	div = pll->ext_clock / div;
-
-	/* We now have the smallest M and N*P1 values that will result in the
-	 * desired pixel clock frequency, but they might be out of the valid
-	 * range. Compute the factor by which we should multiply them given the
-	 * following constraints:
-	 *
-	 * - minimum/maximum multiplier
-	 * - minimum/maximum multiplier output clock frequency assuming the
-	 *   minimum/maximum N value
-	 * - minimum/maximum combined N*P1 divisor
-	 */
-	mf_min = DIV_ROUND_UP(limits->m_min, pll->m);
-	mf_min = max(mf_min, limits->out_clock_min /
-		     (pll->ext_clock / limits->n_min * pll->m));
-	mf_min = max(mf_min, limits->n_min * limits->p1_min / div);
-	mf_max = limits->m_max / pll->m;
-	mf_max = min(mf_max, limits->out_clock_max /
-		    (pll->ext_clock / limits->n_max * pll->m));
-	mf_max = min(mf_max, DIV_ROUND_UP(limits->n_max * limits->p1_max, div));
-
-	dev_dbg(dev, "pll: mf min %u max %u\n", mf_min, mf_max);
-	if (mf_min > mf_max) {
-		dev_err(dev, "pll: no valid combined N*P1 divisor.\n");
+	/* int_clock_min <= ext_clock / N <= int_clock_max */
+	n_min = max(limits->n_min,
+		    DIV_ROUND_UP(pll->ext_clock, limits->int_clock_max));
+	n_max = min(limits->n_max,
+		    pll->ext_clock / limits->int_clock_min);
+
+	if (n_min > n_max) {
+		dev_err(dev,
+			"pll: no divisor N results in a valid int_clock.\n");
+		return -EINVAL;
+	}
+
+	/* out_clock_min <= ext_clock / N * M <= out_clock_max */
+	m_min = max(limits->m_min,
+		    mult_frac(limits->out_clock_min, n_min, pll->ext_clock));
+	m_max = min(limits->m_max,
+		    mult_frac(limits->out_clock_max, n_max, pll->ext_clock));
+	if (m_min > m_max) {
+		dev_err(dev,
+			"pll: no multiplier M results in a valid out_clock.\n");
+		return -EINVAL;
+	}
+
+	/* Using limits of m, we can further shrink the range of n. */
+	n_min = max(n_min,
+		    mult_frac(pll->ext_clock, m_min, limits->out_clock_max));
+	n_max = max(n_max,
+		    mult_frac(pll->ext_clock, m_max, limits->out_clock_min));
+	if (n_min > n_max) {
+		dev_err(dev,
+			"pll: no divisor N results in a valid out_clock.\n");
 		return -EINVAL;
 	}
 
-	/*
-	 * We're looking for the highest acceptable P1 value for which a
-	 * multiplier factor MF exists that fulfills the following conditions:
-	 *
-	 * 1. p1 is in the [p1_min, p1_max] range given by the limits and is
-	 *    even
-	 * 2. mf is in the [mf_min, mf_max] range computed above
-	 * 3. div * mf is a multiple of p1, in order to compute
-	 *	n = div * mf / p1
-	 *	m = pll->m * mf
-	 * 4. the internal clock frequency, given by ext_clock / n, is in the
-	 *    [int_clock_min, int_clock_max] range given by the limits
-	 * 5. the output clock frequency, given by ext_clock / n * m, is in the
-	 *    [out_clock_min, out_clock_max] range given by the limits
-	 *
-	 * The first naive approach is to iterate over all p1 values acceptable
-	 * according to (1) and all mf values acceptable according to (2), and
-	 * stop at the first combination that fulfills (3), (4) and (5). This
-	 * has a O(n^2) complexity.
-	 *
-	 * Instead of iterating over all mf values in the [mf_min, mf_max] range
-	 * we can compute the mf increment between two acceptable values
-	 * according to (3) with
-	 *
-	 *	mf_inc = p1 / gcd(div, p1)			(6)
-	 *
-	 * and round the minimum up to the nearest multiple of mf_inc. This will
-	 * restrict the number of mf values to be checked.
-	 *
-	 * Furthermore, conditions (4) and (5) only restrict the range of
-	 * acceptable p1 and mf values by modifying the minimum and maximum
-	 * limits. (5) can be expressed as
-	 *
-	 *	ext_clock / (div * mf / p1) * m * mf >= out_clock_min
-	 *	ext_clock / (div * mf / p1) * m * mf <= out_clock_max
-	 *
-	 * or
-	 *
-	 *	p1 >= out_clock_min * div / (ext_clock * m)	(7)
-	 *	p1 <= out_clock_max * div / (ext_clock * m)
-	 *
-	 * Similarly, (4) can be expressed as
-	 *
-	 *	mf >= ext_clock * p1 / (int_clock_max * div)	(8)
-	 *	mf <= ext_clock * p1 / (int_clock_min * div)
-	 *
-	 * We can thus iterate over the restricted p1 range defined by the
-	 * combination of (1) and (7), and then compute the restricted mf range
-	 * defined by the combination of (2), (6) and (8). If the resulting mf
-	 * range is not empty, any value in the mf range is acceptable. We thus
-	 * select the mf lwoer bound and the corresponding p1 value.
-	 */
-	if (limits->p1_min == 0) {
-		dev_err(dev, "pll: P1 minimum value must be >0.\n");
+	dev_dbg(dev, "pll: %u <= N <= %u\n", n_min, n_max);
+	dev_dbg(dev, "pll: %u <= M <= %u\n", m_min, m_max);
+
+	/* out_clock_min <= pix_clock * P1 <= out_clock_max */
+	p1_min = max(limits->p1_min,
+		     DIV_ROUND_UP(limits->out_clock_min, pll->pix_clock));
+	p1_max = min(limits->p1_max,
+		     limits->out_clock_max / pll->pix_clock);
+	/* pix_clock = ext_clock / N * M / P1 */
+	p1_min = max(p1_min, DIV_ROUND_UP(pll->ext_clock * m_min,
+					  pll->pix_clock * n_max));
+	p1_max = min(p1_max,
+		     pll->ext_clock * m_max / (pll->pix_clock * n_min));
+	if (p1_min > p1_max) {
+		dev_err(dev, "pll: no valid P1 divisor.\n");
 		return -EINVAL;
 	}
 
-	p1_min = max(limits->p1_min, DIV_ROUND_UP(limits->out_clock_min * div,
-		     pll->ext_clock * pll->m));
-	p1_max = min(limits->p1_max, limits->out_clock_max * div /
-		     (pll->ext_clock * pll->m));
+	dev_dbg(dev, "pll: %u <= P1 <= %u\n", p1_min, p1_max);
 
 	for (p1 = p1_max & ~1; p1 >= p1_min; p1 -= 2) {
-		unsigned int mf_inc = p1 / gcd(div, p1);
-		unsigned int mf_high;
-		unsigned int mf_low;
+		unsigned int n = 0, m = UINT_MAX, out_clock, err;
+		const unsigned int target_out_clock = pll->pix_clock * p1;
 
-		mf_low = roundup(max(mf_min, DIV_ROUND_UP(pll->ext_clock * p1,
-					limits->int_clock_max * div)), mf_inc);
-		mf_high = min(mf_max, pll->ext_clock * p1 /
-			      (limits->int_clock_min * div));
+		/* target_out_clock = ext_clock / N * M */
+		if (approximate_fraction(n_min, n_max, m_min, m_max,
+					 pll->ext_clock, target_out_clock,
+					 &n, &m) < 0)
+			continue;
 
-		if (mf_low > mf_high)
+		/* We must check all conditions due to possible rounding
+		 * errors:
+		 *   int_clock_min <= ext_clock / N <= int_clock_max
+		 *   out_clock_min <= ext_clock / N * M <= out_clock_max
+		 */
+		out_clock = mult_frac(pll->ext_clock, m, n);
+		if (pll->ext_clock < limits->int_clock_min * n ||
+		    pll->ext_clock > limits->int_clock_max * n ||
+		    out_clock < limits->out_clock_min ||
+		    out_clock > limits->out_clock_max)
 			continue;
 
-		pll->n = div * mf_low / p1;
-		pll->m *= mf_low;
-		pll->p1 = p1;
-		dev_dbg(dev, "PLL: N %u M %u P1 %u\n", pll->n, pll->m, pll->p1);
-		return 0;
+		err = absdiff(out_clock / p1, pll->pix_clock);
+		if (err < clock_err) {
+			pll->n = n;
+			pll->m = m;
+			pll->p1 = p1;
+			clock_err = err;
+		}
+		if (err == 0) {
+			dev_dbg(dev, "pll: N %u M %u P1 %u exact\n",
+				pll->n, pll->m, pll->p1);
+			return 0;
+		}
 	}
-
-	dev_err(dev, "pll: no valid N and P1 divisors found.\n");
-	return -EINVAL;
+	if (clock_err == UINT_MAX) {
+		dev_err(dev, "pll: no valid parameters found.\n");
+		return -EINVAL;
+	}
+	pll->pix_clock = mult_frac(pll->ext_clock, pll->m, pll->n * pll->p1);
+	dev_dbg(dev, "pll: N %u M %u P1 %u pix_clock %u Hz error %u Hz\n",
+		pll->n, pll->m, pll->p1, pll->pix_clock, clock_err);
+	return 0;
 }
 EXPORT_SYMBOL_GPL(aptina_pll_calculate);
 
diff --git a/drivers/media/i2c/mt9m032.c b/drivers/media/i2c/mt9m032.c
index 6a9e068462fd..31410f40b197 100644
--- a/drivers/media/i2c/mt9m032.c
+++ b/drivers/media/i2c/mt9m032.c
@@ -285,7 +285,7 @@ static int mt9m032_setup_pll(struct mt9m032 *sensor)
 	if (ret < 0)
 		return ret;
 
-	sensor->pix_clock = pdata->pix_clock;
+	sensor->pix_clock = pll.pix_clock;
 
 	ret = mt9m032_write(client, MT9M032_PLL_CONFIG1,
 			    (pll.m << MT9M032_PLL_CONFIG1_MUL_SHIFT) |
@@ -711,6 +711,7 @@ static int mt9m032_probe(struct i2c_client *client,
 	struct mt9m032_platform_data *pdata = client->dev.platform_data;
 	struct i2c_adapter *adapter = client->adapter;
 	struct mt9m032 *sensor;
+	struct v4l2_ctrl *pixel_rate_ctrl;
 	int chip_version;
 	int ret;
 
@@ -780,9 +781,10 @@ static int mt9m032_probe(struct i2c_client *client,
 			  V4L2_CID_EXPOSURE, MT9M032_SHUTTER_WIDTH_MIN,
 			  MT9M032_SHUTTER_WIDTH_MAX, 1,
 			  MT9M032_SHUTTER_WIDTH_DEF);
-	v4l2_ctrl_new_std(&sensor->ctrls, &mt9m032_ctrl_ops,
-			  V4L2_CID_PIXEL_RATE, pdata->pix_clock,
-			  pdata->pix_clock, 1, pdata->pix_clock);
+	pixel_rate_ctrl = v4l2_ctrl_new_std(&sensor->ctrls, &mt9m032_ctrl_ops,
+					    V4L2_CID_PIXEL_RATE,
+					    pdata->pix_clock, pdata->pix_clock,
+					    1, pdata->pix_clock);
 
 	if (sensor->ctrls.error) {
 		ret = sensor->ctrls.error;
@@ -810,6 +812,11 @@ static int mt9m032_probe(struct i2c_client *client,
 		goto error_entity;
 	usleep_range(10000, 11000);
 
+	ret = __v4l2_ctrl_modify_range(pixel_rate_ctrl, sensor->pix_clock,
+				       sensor->pix_clock, 1, sensor->pix_clock);
+	if (ret < 0)
+		goto error_entity;
+
 	ret = v4l2_ctrl_handler_setup(&sensor->ctrls);
 	if (ret < 0)
 		goto error_entity;
-- 
2.11.0

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] media: aptina-pll: allow approximating the requested pix_clock
  2018-08-23  7:52     ` [PATCH] media: aptina-pll: allow approximating the requested pix_clock Helmut Grohne
@ 2018-08-23 11:12       ` Laurent Pinchart
  2018-08-24 12:05         ` Helmut Grohne
  2018-08-23 11:30       ` Sakari Ailus
  1 sibling, 1 reply; 11+ messages in thread
From: Laurent Pinchart @ 2018-08-23 11:12 UTC (permalink / raw)
  To: Helmut Grohne; +Cc: Mauro Carvalho Chehab, linux-media, Sakari Ailus

Hi Helmut,

Thank you for the patch.

On Thursday, 23 August 2018 10:52:09 EEST Helmut Grohne wrote:
> Clock frequencies are not exact values, but rather imprecise, physical
> properties. The present pll computation however, treats them as exact.
> It tries to compute parameters that attain the requested pix_clock
> exactly. Failing that, it gives up.
> 
> The new implementation approximates the requested pix_clock and reports
> the actual value resulting from the computed parameters. If the
> requested pix_clock can be attained, the original behaviour of
> maximizing p1 is retained.
> 
> Experiments with the algorithm in userspace on an arm device show that
> the computational cost is negligible compared to executing a binary for
> all practical inputs.

Could you please share numbers, ideally when run in kernel space ?

> Signed-off-by: Helmut Grohne <helmut.grohne@intenta.de>
> ---
>  drivers/media/i2c/aptina-pll.c | 263 ++++++++++++++++++++++---------------
>  drivers/media/i2c/mt9m032.c    |  15 ++-
>  2 files changed, 165 insertions(+), 113 deletions(-)
> 
> I tried using aptina_pll_calculate in a driver for a similar image sensor.
> Using it, I was unable to find a pix_clock value such that the computation
> was successful. Most of the practical parameters result in a non-integer
> pix_clock, something that struct pll cannot represent.
> 
> The driver is not ready for submission at this point, but the changes to
> aptina_pll_calculate are reasonable on their own, which is why I submit them
> separately now.

This patch is very hard to review as you rewrite the whole, removing all the 
documentation for the existing algorithm, without documenting the new one. 
There's also no example of a failing case with the existing code that works 
with yours.

Could you please document the algorithm in details (especially explaining the 
background ideas), and share at least one use case, with with numbers for all 
the input and output parameters ?

> diff --git a/drivers/media/i2c/aptina-pll.c b/drivers/media/i2c/aptina-pll.c
> index 224ae4e4cf8b..249018772b2b 100644
> --- a/drivers/media/i2c/aptina-pll.c
> +++ b/drivers/media/i2c/aptina-pll.c
> @@ -2,6 +2,7 @@
>   * Aptina Sensor PLL Configuration
>   *
>   * Copyright (C) 2012 Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> + * Copyright (C) 2018 Intenta GmbH
>   *
>   * This program is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU General Public License
> @@ -14,23 +15,84 @@
>   */
> 
>  #include <linux/device.h>
> -#include <linux/gcd.h>
>  #include <linux/kernel.h>
> -#include <linux/lcm.h>
> +#include <linux/math64.h>
>  #include <linux/module.h>
> 
>  #include "aptina-pll.h"
> 
> +/* A variant of mult_frac that works even when (x % denom) * numer produces
> a + * 32bit overflow.
> + */
> +#define mult_frac_exact(x, numer, denom) \
> +	((u32)div_u64(mul_u32_u32((x), (numer)), (denom)))
> +
> +static unsigned int absdiff(unsigned int x, unsigned int y)
> +{
> +	return x > y ? x - y : y - x;
> +}
> +
> +/* Find n_min <= *np <= n_max and d_min <= *dp <= d_max such that *np / *dp
> + * approximates n_target / d_target.
> + */
> +static int approximate_fraction(unsigned int n_min, unsigned int n_max,
> +				unsigned int d_min, unsigned int d_max,
> +				unsigned int n_target, unsigned int d_target,
> +				unsigned int *np, unsigned int *dp)
> +{
> +	unsigned int d, best_err = UINT_MAX;
> +
> +	d_min = max(d_min, mult_frac_exact(n_min, d_target, n_target));
> +	d_max = min(d_max, mult_frac_exact(n_max, d_target, n_target) + 1);
> +	if (d_min > d_max)
> +		return -EINVAL;
> +
> +	for (d = d_min; d < d_max; ++d) {
> +		unsigned int n = mult_frac_exact(d, n_target, d_target);
> +
> +		if (n >= n_min) {
> +			unsigned int err = absdiff(
> +				n_target, mult_frac_exact(n, d_target, d));
> +
> +			if (err < best_err) {
> +				best_err = err;
> +				*np = n;
> +				*dp = d;
> +			}
> +			if (err == 0)
> +				return 0;
> +		}
> +		++n;
> +		if (n <= n_max) {
> +			unsigned int err = absdiff(
> +				n_target, mult_frac_exact(n, d_target, d));
> +
> +			if (err < best_err) {
> +				best_err = err;
> +				*np = n;
> +				*dp = d;
> +			}
> +		}
> +	}
> +	return best_err == UINT_MAX ? -EINVAL : 0;
> +}
> +
> +/* Find parameters n, m, p1 such that:
> + *   n_min <= n <= n_max
> + *   m_min <= m <= m_max
> + *   p1_min <= p1 <= p1_max, even
> + *   int_clock_min <= ext_clock / n <= int_clock_max
> + *   out_clock_min <= ext_clock / n * m <= out_clock_max
> + *   pix_clock = ext_clock / n * m / p1 (approximately)
> + *   p1 is maximized
> + * Report the achieved pix_clock (input/output parameter).
> + */
>  int aptina_pll_calculate(struct device *dev,
>  			 const struct aptina_pll_limits *limits,
>  			 struct aptina_pll *pll)
>  {
> -	unsigned int mf_min;
> -	unsigned int mf_max;
> -	unsigned int p1_min;
> -	unsigned int p1_max;
> -	unsigned int p1;
> -	unsigned int div;
> +	unsigned int n_min, n_max, m_min, m_max, p1_min, p1_max, p1;
> +	unsigned int clock_err = UINT_MAX;
> 
>  	dev_dbg(dev, "PLL: ext clock %u pix clock %u\n",
>  		pll->ext_clock, pll->pix_clock);
> @@ -46,120 +108,103 @@ int aptina_pll_calculate(struct device *dev,
>  		return -EINVAL;
>  	}
> 
> -	/* Compute the multiplier M and combined N*P1 divisor. */
> -	div = gcd(pll->pix_clock, pll->ext_clock);
> -	pll->m = pll->pix_clock / div;
> -	div = pll->ext_clock / div;
> -
> -	/* We now have the smallest M and N*P1 values that will result in the
> -	 * desired pixel clock frequency, but they might be out of the valid
> -	 * range. Compute the factor by which we should multiply them given the
> -	 * following constraints:
> -	 *
> -	 * - minimum/maximum multiplier
> -	 * - minimum/maximum multiplier output clock frequency assuming the
> -	 *   minimum/maximum N value
> -	 * - minimum/maximum combined N*P1 divisor
> -	 */
> -	mf_min = DIV_ROUND_UP(limits->m_min, pll->m);
> -	mf_min = max(mf_min, limits->out_clock_min /
> -		     (pll->ext_clock / limits->n_min * pll->m));
> -	mf_min = max(mf_min, limits->n_min * limits->p1_min / div);
> -	mf_max = limits->m_max / pll->m;
> -	mf_max = min(mf_max, limits->out_clock_max /
> -		    (pll->ext_clock / limits->n_max * pll->m));
> -	mf_max = min(mf_max, DIV_ROUND_UP(limits->n_max * limits->p1_max, div));
> -
> -	dev_dbg(dev, "pll: mf min %u max %u\n", mf_min, mf_max);
> -	if (mf_min > mf_max) {
> -		dev_err(dev, "pll: no valid combined N*P1 divisor.\n");
> +	/* int_clock_min <= ext_clock / N <= int_clock_max */
> +	n_min = max(limits->n_min,
> +		    DIV_ROUND_UP(pll->ext_clock, limits->int_clock_max));
> +	n_max = min(limits->n_max,
> +		    pll->ext_clock / limits->int_clock_min);
> +
> +	if (n_min > n_max) {
> +		dev_err(dev,
> +			"pll: no divisor N results in a valid int_clock.\n");
> +		return -EINVAL;
> +	}
> +
> +	/* out_clock_min <= ext_clock / N * M <= out_clock_max */
> +	m_min = max(limits->m_min,
> +		    mult_frac(limits->out_clock_min, n_min, pll->ext_clock));
> +	m_max = min(limits->m_max,
> +		    mult_frac(limits->out_clock_max, n_max, pll->ext_clock));
> +	if (m_min > m_max) {
> +		dev_err(dev,
> +			"pll: no multiplier M results in a valid out_clock.\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Using limits of m, we can further shrink the range of n. */
> +	n_min = max(n_min,
> +		    mult_frac(pll->ext_clock, m_min, limits->out_clock_max));
> +	n_max = max(n_max,
> +		    mult_frac(pll->ext_clock, m_max, limits->out_clock_min));
> +	if (n_min > n_max) {
> +		dev_err(dev,
> +			"pll: no divisor N results in a valid out_clock.\n");
>  		return -EINVAL;
>  	}
> 
> -	/*
> -	 * We're looking for the highest acceptable P1 value for which a
> -	 * multiplier factor MF exists that fulfills the following conditions:
> -	 *
> -	 * 1. p1 is in the [p1_min, p1_max] range given by the limits and is
> -	 *    even
> -	 * 2. mf is in the [mf_min, mf_max] range computed above
> -	 * 3. div * mf is a multiple of p1, in order to compute
> -	 *	n = div * mf / p1
> -	 *	m = pll->m * mf
> -	 * 4. the internal clock frequency, given by ext_clock / n, is in the
> -	 *    [int_clock_min, int_clock_max] range given by the limits
> -	 * 5. the output clock frequency, given by ext_clock / n * m, is in the
> -	 *    [out_clock_min, out_clock_max] range given by the limits
> -	 *
> -	 * The first naive approach is to iterate over all p1 values acceptable
> -	 * according to (1) and all mf values acceptable according to (2), and
> -	 * stop at the first combination that fulfills (3), (4) and (5). This
> -	 * has a O(n^2) complexity.
> -	 *
> -	 * Instead of iterating over all mf values in the [mf_min, mf_max] range
> -	 * we can compute the mf increment between two acceptable values
> -	 * according to (3) with
> -	 *
> -	 *	mf_inc = p1 / gcd(div, p1)			(6)
> -	 *
> -	 * and round the minimum up to the nearest multiple of mf_inc. This will
> -	 * restrict the number of mf values to be checked.
> -	 *
> -	 * Furthermore, conditions (4) and (5) only restrict the range of
> -	 * acceptable p1 and mf values by modifying the minimum and maximum
> -	 * limits. (5) can be expressed as
> -	 *
> -	 *	ext_clock / (div * mf / p1) * m * mf >= out_clock_min
> -	 *	ext_clock / (div * mf / p1) * m * mf <= out_clock_max
> -	 *
> -	 * or
> -	 *
> -	 *	p1 >= out_clock_min * div / (ext_clock * m)	(7)
> -	 *	p1 <= out_clock_max * div / (ext_clock * m)
> -	 *
> -	 * Similarly, (4) can be expressed as
> -	 *
> -	 *	mf >= ext_clock * p1 / (int_clock_max * div)	(8)
> -	 *	mf <= ext_clock * p1 / (int_clock_min * div)
> -	 *
> -	 * We can thus iterate over the restricted p1 range defined by the
> -	 * combination of (1) and (7), and then compute the restricted mf range
> -	 * defined by the combination of (2), (6) and (8). If the resulting mf
> -	 * range is not empty, any value in the mf range is acceptable. We thus
> -	 * select the mf lwoer bound and the corresponding p1 value.
> -	 */
> -	if (limits->p1_min == 0) {
> -		dev_err(dev, "pll: P1 minimum value must be >0.\n");
> +	dev_dbg(dev, "pll: %u <= N <= %u\n", n_min, n_max);
> +	dev_dbg(dev, "pll: %u <= M <= %u\n", m_min, m_max);
> +
> +	/* out_clock_min <= pix_clock * P1 <= out_clock_max */
> +	p1_min = max(limits->p1_min,
> +		     DIV_ROUND_UP(limits->out_clock_min, pll->pix_clock));
> +	p1_max = min(limits->p1_max,
> +		     limits->out_clock_max / pll->pix_clock);
> +	/* pix_clock = ext_clock / N * M / P1 */
> +	p1_min = max(p1_min, DIV_ROUND_UP(pll->ext_clock * m_min,
> +					  pll->pix_clock * n_max));
> +	p1_max = min(p1_max,
> +		     pll->ext_clock * m_max / (pll->pix_clock * n_min));
> +	if (p1_min > p1_max) {
> +		dev_err(dev, "pll: no valid P1 divisor.\n");
>  		return -EINVAL;
>  	}
> 
> -	p1_min = max(limits->p1_min, DIV_ROUND_UP(limits->out_clock_min * div,
> -		     pll->ext_clock * pll->m));
> -	p1_max = min(limits->p1_max, limits->out_clock_max * div /
> -		     (pll->ext_clock * pll->m));
> +	dev_dbg(dev, "pll: %u <= P1 <= %u\n", p1_min, p1_max);
> 
>  	for (p1 = p1_max & ~1; p1 >= p1_min; p1 -= 2) {
> -		unsigned int mf_inc = p1 / gcd(div, p1);
> -		unsigned int mf_high;
> -		unsigned int mf_low;
> +		unsigned int n = 0, m = UINT_MAX, out_clock, err;
> +		const unsigned int target_out_clock = pll->pix_clock * p1;
> 
> -		mf_low = roundup(max(mf_min, DIV_ROUND_UP(pll->ext_clock * p1,
> -					limits->int_clock_max * div)), mf_inc);
> -		mf_high = min(mf_max, pll->ext_clock * p1 /
> -			      (limits->int_clock_min * div));
> +		/* target_out_clock = ext_clock / N * M */
> +		if (approximate_fraction(n_min, n_max, m_min, m_max,
> +					 pll->ext_clock, target_out_clock,
> +					 &n, &m) < 0)
> +			continue;
> 
> -		if (mf_low > mf_high)
> +		/* We must check all conditions due to possible rounding
> +		 * errors:
> +		 *   int_clock_min <= ext_clock / N <= int_clock_max
> +		 *   out_clock_min <= ext_clock / N * M <= out_clock_max
> +		 */
> +		out_clock = mult_frac(pll->ext_clock, m, n);
> +		if (pll->ext_clock < limits->int_clock_min * n ||
> +		    pll->ext_clock > limits->int_clock_max * n ||
> +		    out_clock < limits->out_clock_min ||
> +		    out_clock > limits->out_clock_max)
>  			continue;
> 
> -		pll->n = div * mf_low / p1;
> -		pll->m *= mf_low;
> -		pll->p1 = p1;
> -		dev_dbg(dev, "PLL: N %u M %u P1 %u\n", pll->n, pll->m, pll->p1);
> -		return 0;
> +		err = absdiff(out_clock / p1, pll->pix_clock);
> +		if (err < clock_err) {
> +			pll->n = n;
> +			pll->m = m;
> +			pll->p1 = p1;
> +			clock_err = err;
> +		}
> +		if (err == 0) {
> +			dev_dbg(dev, "pll: N %u M %u P1 %u exact\n",
> +				pll->n, pll->m, pll->p1);
> +			return 0;
> +		}
>  	}
> -
> -	dev_err(dev, "pll: no valid N and P1 divisors found.\n");
> -	return -EINVAL;
> +	if (clock_err == UINT_MAX) {
> +		dev_err(dev, "pll: no valid parameters found.\n");
> +		return -EINVAL;
> +	}
> +	pll->pix_clock = mult_frac(pll->ext_clock, pll->m, pll->n * pll->p1);
> +	dev_dbg(dev, "pll: N %u M %u P1 %u pix_clock %u Hz error %u Hz\n",
> +		pll->n, pll->m, pll->p1, pll->pix_clock, clock_err);
> +	return 0;
>  }
>  EXPORT_SYMBOL_GPL(aptina_pll_calculate);
> 
> diff --git a/drivers/media/i2c/mt9m032.c b/drivers/media/i2c/mt9m032.c
> index 6a9e068462fd..31410f40b197 100644
> --- a/drivers/media/i2c/mt9m032.c
> +++ b/drivers/media/i2c/mt9m032.c
> @@ -285,7 +285,7 @@ static int mt9m032_setup_pll(struct mt9m032 *sensor)
>  	if (ret < 0)
>  		return ret;
> 
> -	sensor->pix_clock = pdata->pix_clock;
> +	sensor->pix_clock = pll.pix_clock;
> 
>  	ret = mt9m032_write(client, MT9M032_PLL_CONFIG1,
>  			    (pll.m << MT9M032_PLL_CONFIG1_MUL_SHIFT) |
> @@ -711,6 +711,7 @@ static int mt9m032_probe(struct i2c_client *client,
>  	struct mt9m032_platform_data *pdata = client->dev.platform_data;
>  	struct i2c_adapter *adapter = client->adapter;
>  	struct mt9m032 *sensor;
> +	struct v4l2_ctrl *pixel_rate_ctrl;
>  	int chip_version;
>  	int ret;
> 
> @@ -780,9 +781,10 @@ static int mt9m032_probe(struct i2c_client *client,
>  			  V4L2_CID_EXPOSURE, MT9M032_SHUTTER_WIDTH_MIN,
>  			  MT9M032_SHUTTER_WIDTH_MAX, 1,
>  			  MT9M032_SHUTTER_WIDTH_DEF);
> -	v4l2_ctrl_new_std(&sensor->ctrls, &mt9m032_ctrl_ops,
> -			  V4L2_CID_PIXEL_RATE, pdata->pix_clock,
> -			  pdata->pix_clock, 1, pdata->pix_clock);
> +	pixel_rate_ctrl = v4l2_ctrl_new_std(&sensor->ctrls, &mt9m032_ctrl_ops,
> +					    V4L2_CID_PIXEL_RATE,
> +					    pdata->pix_clock, pdata->pix_clock,
> +					    1, pdata->pix_clock);
> 
>  	if (sensor->ctrls.error) {
>  		ret = sensor->ctrls.error;
> @@ -810,6 +812,11 @@ static int mt9m032_probe(struct i2c_client *client,
>  		goto error_entity;
>  	usleep_range(10000, 11000);
> 
> +	ret = __v4l2_ctrl_modify_range(pixel_rate_ctrl, sensor->pix_clock,
> +				       sensor->pix_clock, 1, sensor->pix_clock);
> +	if (ret < 0)
> +		goto error_entity;
> +
>  	ret = v4l2_ctrl_handler_setup(&sensor->ctrls);
>  	if (ret < 0)
>  		goto error_entity;


-- 
Regards,

Laurent Pinchart

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] media: aptina-pll: allow approximating the requested pix_clock
  2018-08-23  7:52     ` [PATCH] media: aptina-pll: allow approximating the requested pix_clock Helmut Grohne
  2018-08-23 11:12       ` Laurent Pinchart
@ 2018-08-23 11:30       ` Sakari Ailus
  2018-08-24 12:05         ` Helmut Grohne
  1 sibling, 1 reply; 11+ messages in thread
From: Sakari Ailus @ 2018-08-23 11:30 UTC (permalink / raw)
  To: Helmut Grohne; +Cc: Laurent Pinchart, Mauro Carvalho Chehab, linux-media

Hi Helmut,

On Thu, Aug 23, 2018 at 09:52:09AM +0200, Helmut Grohne wrote:
> Clock frequencies are not exact values, but rather imprecise, physical
> properties. The present pll computation however, treats them as exact.
> It tries to compute parameters that attain the requested pix_clock
> exactly. Failing that, it gives up.
> 
> The new implementation approximates the requested pix_clock and reports
> the actual value resulting from the computed parameters. If the
> requested pix_clock can be attained, the original behaviour of
> maximizing p1 is retained.
> 
> Experiments with the algorithm in userspace on an arm device show that
> the computational cost is negligible compared to executing a binary for
> all practical inputs.
> 
> Signed-off-by: Helmut Grohne <helmut.grohne@intenta.de>
> ---
>  drivers/media/i2c/aptina-pll.c | 263 ++++++++++++++++++++++++-----------------
>  drivers/media/i2c/mt9m032.c    |  15 ++-
>  2 files changed, 165 insertions(+), 113 deletions(-)
> 
> I tried using aptina_pll_calculate in a driver for a similar image sensor.
> Using it, I was unable to find a pix_clock value such that the computation was
> successful. Most of the practical parameters result in a non-integer pix_clock,
> something that struct pll cannot represent.

Knowing the formula, the limits as well as the external clock frequency, it
should be relatively straightforward to come up with a functional pixel
clock value. Was there a practical problem in coming up with such a value?

You can't pick a value at random; it needs to be one that actually works.
The purpose of having an exact frequency is that the typical systems where
these devices are being used, as I explained earlier, is that having a
random frequency, albeit with which the sensor could possibly work, the
other devices in the system may not do so.

The importantce of the calculator is more apparent for devices that use
serial busses with a variable number of lanes etc. where the pixel format
affects the pixel rate whereas the link frequency stays unchanged.

> 
> The driver is not ready for submission at this point, but the changes to
> aptina_pll_calculate are reasonable on their own, which is why I submit them
> separately now.
> 
> Helmut
> 
> diff --git a/drivers/media/i2c/aptina-pll.c b/drivers/media/i2c/aptina-pll.c
> index 224ae4e4cf8b..249018772b2b 100644
> --- a/drivers/media/i2c/aptina-pll.c
> +++ b/drivers/media/i2c/aptina-pll.c
> @@ -2,6 +2,7 @@
>   * Aptina Sensor PLL Configuration
>   *
>   * Copyright (C) 2012 Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> + * Copyright (C) 2018 Intenta GmbH
>   *
>   * This program is free software; you can redistribute it and/or
>   * modify it under the terms of the GNU General Public License
> @@ -14,23 +15,84 @@
>   */
>  
>  #include <linux/device.h>
> -#include <linux/gcd.h>
>  #include <linux/kernel.h>
> -#include <linux/lcm.h>
> +#include <linux/math64.h>
>  #include <linux/module.h>
>  
>  #include "aptina-pll.h"
>  
> +/* A variant of mult_frac that works even when (x % denom) * numer produces a
> + * 32bit overflow.
> + */
> +#define mult_frac_exact(x, numer, denom) \
> +	((u32)div_u64(mul_u32_u32((x), (numer)), (denom)))
> +
> +static unsigned int absdiff(unsigned int x, unsigned int y)
> +{
> +	return x > y ? x - y : y - x;
> +}
> +
> +/* Find n_min <= *np <= n_max and d_min <= *dp <= d_max such that *np / *dp
> + * approximates n_target / d_target.
> + */
> +static int approximate_fraction(unsigned int n_min, unsigned int n_max,
> +				unsigned int d_min, unsigned int d_max,
> +				unsigned int n_target, unsigned int d_target,
> +				unsigned int *np, unsigned int *dp)
> +{
> +	unsigned int d, best_err = UINT_MAX;
> +
> +	d_min = max(d_min, mult_frac_exact(n_min, d_target, n_target));
> +	d_max = min(d_max, mult_frac_exact(n_max, d_target, n_target) + 1);
> +	if (d_min > d_max)
> +		return -EINVAL;
> +
> +	for (d = d_min; d < d_max; ++d) {
> +		unsigned int n = mult_frac_exact(d, n_target, d_target);
> +
> +		if (n >= n_min) {
> +			unsigned int err = absdiff(
> +				n_target, mult_frac_exact(n, d_target, d));
> +
> +			if (err < best_err) {
> +				best_err = err;
> +				*np = n;
> +				*dp = d;
> +			}
> +			if (err == 0)
> +				return 0;
> +		}
> +		++n;
> +		if (n <= n_max) {
> +			unsigned int err = absdiff(
> +				n_target, mult_frac_exact(n, d_target, d));
> +
> +			if (err < best_err) {
> +				best_err = err;
> +				*np = n;
> +				*dp = d;
> +			}
> +		}
> +	}
> +	return best_err == UINT_MAX ? -EINVAL : 0;
> +}
> +
> +/* Find parameters n, m, p1 such that:
> + *   n_min <= n <= n_max
> + *   m_min <= m <= m_max
> + *   p1_min <= p1 <= p1_max, even
> + *   int_clock_min <= ext_clock / n <= int_clock_max
> + *   out_clock_min <= ext_clock / n * m <= out_clock_max
> + *   pix_clock = ext_clock / n * m / p1 (approximately)
> + *   p1 is maximized
> + * Report the achieved pix_clock (input/output parameter).
> + */
>  int aptina_pll_calculate(struct device *dev,
>  			 const struct aptina_pll_limits *limits,
>  			 struct aptina_pll *pll)
>  {
> -	unsigned int mf_min;
> -	unsigned int mf_max;
> -	unsigned int p1_min;
> -	unsigned int p1_max;
> -	unsigned int p1;
> -	unsigned int div;
> +	unsigned int n_min, n_max, m_min, m_max, p1_min, p1_max, p1;
> +	unsigned int clock_err = UINT_MAX;
>  
>  	dev_dbg(dev, "PLL: ext clock %u pix clock %u\n",
>  		pll->ext_clock, pll->pix_clock);
> @@ -46,120 +108,103 @@ int aptina_pll_calculate(struct device *dev,
>  		return -EINVAL;
>  	}
>  
> -	/* Compute the multiplier M and combined N*P1 divisor. */
> -	div = gcd(pll->pix_clock, pll->ext_clock);
> -	pll->m = pll->pix_clock / div;
> -	div = pll->ext_clock / div;
> -
> -	/* We now have the smallest M and N*P1 values that will result in the
> -	 * desired pixel clock frequency, but they might be out of the valid
> -	 * range. Compute the factor by which we should multiply them given the
> -	 * following constraints:
> -	 *
> -	 * - minimum/maximum multiplier
> -	 * - minimum/maximum multiplier output clock frequency assuming the
> -	 *   minimum/maximum N value
> -	 * - minimum/maximum combined N*P1 divisor
> -	 */
> -	mf_min = DIV_ROUND_UP(limits->m_min, pll->m);
> -	mf_min = max(mf_min, limits->out_clock_min /
> -		     (pll->ext_clock / limits->n_min * pll->m));
> -	mf_min = max(mf_min, limits->n_min * limits->p1_min / div);
> -	mf_max = limits->m_max / pll->m;
> -	mf_max = min(mf_max, limits->out_clock_max /
> -		    (pll->ext_clock / limits->n_max * pll->m));
> -	mf_max = min(mf_max, DIV_ROUND_UP(limits->n_max * limits->p1_max, div));
> -
> -	dev_dbg(dev, "pll: mf min %u max %u\n", mf_min, mf_max);
> -	if (mf_min > mf_max) {
> -		dev_err(dev, "pll: no valid combined N*P1 divisor.\n");
> +	/* int_clock_min <= ext_clock / N <= int_clock_max */
> +	n_min = max(limits->n_min,
> +		    DIV_ROUND_UP(pll->ext_clock, limits->int_clock_max));
> +	n_max = min(limits->n_max,
> +		    pll->ext_clock / limits->int_clock_min);
> +
> +	if (n_min > n_max) {
> +		dev_err(dev,
> +			"pll: no divisor N results in a valid int_clock.\n");
> +		return -EINVAL;
> +	}
> +
> +	/* out_clock_min <= ext_clock / N * M <= out_clock_max */
> +	m_min = max(limits->m_min,
> +		    mult_frac(limits->out_clock_min, n_min, pll->ext_clock));
> +	m_max = min(limits->m_max,
> +		    mult_frac(limits->out_clock_max, n_max, pll->ext_clock));
> +	if (m_min > m_max) {
> +		dev_err(dev,
> +			"pll: no multiplier M results in a valid out_clock.\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Using limits of m, we can further shrink the range of n. */
> +	n_min = max(n_min,
> +		    mult_frac(pll->ext_clock, m_min, limits->out_clock_max));
> +	n_max = max(n_max,
> +		    mult_frac(pll->ext_clock, m_max, limits->out_clock_min));
> +	if (n_min > n_max) {
> +		dev_err(dev,
> +			"pll: no divisor N results in a valid out_clock.\n");
>  		return -EINVAL;
>  	}
>  
> -	/*
> -	 * We're looking for the highest acceptable P1 value for which a
> -	 * multiplier factor MF exists that fulfills the following conditions:
> -	 *
> -	 * 1. p1 is in the [p1_min, p1_max] range given by the limits and is
> -	 *    even
> -	 * 2. mf is in the [mf_min, mf_max] range computed above
> -	 * 3. div * mf is a multiple of p1, in order to compute
> -	 *	n = div * mf / p1
> -	 *	m = pll->m * mf
> -	 * 4. the internal clock frequency, given by ext_clock / n, is in the
> -	 *    [int_clock_min, int_clock_max] range given by the limits
> -	 * 5. the output clock frequency, given by ext_clock / n * m, is in the
> -	 *    [out_clock_min, out_clock_max] range given by the limits
> -	 *
> -	 * The first naive approach is to iterate over all p1 values acceptable
> -	 * according to (1) and all mf values acceptable according to (2), and
> -	 * stop at the first combination that fulfills (3), (4) and (5). This
> -	 * has a O(n^2) complexity.
> -	 *
> -	 * Instead of iterating over all mf values in the [mf_min, mf_max] range
> -	 * we can compute the mf increment between two acceptable values
> -	 * according to (3) with
> -	 *
> -	 *	mf_inc = p1 / gcd(div, p1)			(6)
> -	 *
> -	 * and round the minimum up to the nearest multiple of mf_inc. This will
> -	 * restrict the number of mf values to be checked.
> -	 *
> -	 * Furthermore, conditions (4) and (5) only restrict the range of
> -	 * acceptable p1 and mf values by modifying the minimum and maximum
> -	 * limits. (5) can be expressed as
> -	 *
> -	 *	ext_clock / (div * mf / p1) * m * mf >= out_clock_min
> -	 *	ext_clock / (div * mf / p1) * m * mf <= out_clock_max
> -	 *
> -	 * or
> -	 *
> -	 *	p1 >= out_clock_min * div / (ext_clock * m)	(7)
> -	 *	p1 <= out_clock_max * div / (ext_clock * m)
> -	 *
> -	 * Similarly, (4) can be expressed as
> -	 *
> -	 *	mf >= ext_clock * p1 / (int_clock_max * div)	(8)
> -	 *	mf <= ext_clock * p1 / (int_clock_min * div)
> -	 *
> -	 * We can thus iterate over the restricted p1 range defined by the
> -	 * combination of (1) and (7), and then compute the restricted mf range
> -	 * defined by the combination of (2), (6) and (8). If the resulting mf
> -	 * range is not empty, any value in the mf range is acceptable. We thus
> -	 * select the mf lwoer bound and the corresponding p1 value.
> -	 */
> -	if (limits->p1_min == 0) {
> -		dev_err(dev, "pll: P1 minimum value must be >0.\n");
> +	dev_dbg(dev, "pll: %u <= N <= %u\n", n_min, n_max);
> +	dev_dbg(dev, "pll: %u <= M <= %u\n", m_min, m_max);
> +
> +	/* out_clock_min <= pix_clock * P1 <= out_clock_max */
> +	p1_min = max(limits->p1_min,
> +		     DIV_ROUND_UP(limits->out_clock_min, pll->pix_clock));
> +	p1_max = min(limits->p1_max,
> +		     limits->out_clock_max / pll->pix_clock);
> +	/* pix_clock = ext_clock / N * M / P1 */
> +	p1_min = max(p1_min, DIV_ROUND_UP(pll->ext_clock * m_min,
> +					  pll->pix_clock * n_max));
> +	p1_max = min(p1_max,
> +		     pll->ext_clock * m_max / (pll->pix_clock * n_min));
> +	if (p1_min > p1_max) {
> +		dev_err(dev, "pll: no valid P1 divisor.\n");
>  		return -EINVAL;
>  	}
>  
> -	p1_min = max(limits->p1_min, DIV_ROUND_UP(limits->out_clock_min * div,
> -		     pll->ext_clock * pll->m));
> -	p1_max = min(limits->p1_max, limits->out_clock_max * div /
> -		     (pll->ext_clock * pll->m));
> +	dev_dbg(dev, "pll: %u <= P1 <= %u\n", p1_min, p1_max);
>  
>  	for (p1 = p1_max & ~1; p1 >= p1_min; p1 -= 2) {
> -		unsigned int mf_inc = p1 / gcd(div, p1);
> -		unsigned int mf_high;
> -		unsigned int mf_low;
> +		unsigned int n = 0, m = UINT_MAX, out_clock, err;
> +		const unsigned int target_out_clock = pll->pix_clock * p1;
>  
> -		mf_low = roundup(max(mf_min, DIV_ROUND_UP(pll->ext_clock * p1,
> -					limits->int_clock_max * div)), mf_inc);
> -		mf_high = min(mf_max, pll->ext_clock * p1 /
> -			      (limits->int_clock_min * div));
> +		/* target_out_clock = ext_clock / N * M */
> +		if (approximate_fraction(n_min, n_max, m_min, m_max,
> +					 pll->ext_clock, target_out_clock,
> +					 &n, &m) < 0)
> +			continue;
>  
> -		if (mf_low > mf_high)
> +		/* We must check all conditions due to possible rounding
> +		 * errors:
> +		 *   int_clock_min <= ext_clock / N <= int_clock_max
> +		 *   out_clock_min <= ext_clock / N * M <= out_clock_max
> +		 */
> +		out_clock = mult_frac(pll->ext_clock, m, n);
> +		if (pll->ext_clock < limits->int_clock_min * n ||
> +		    pll->ext_clock > limits->int_clock_max * n ||
> +		    out_clock < limits->out_clock_min ||
> +		    out_clock > limits->out_clock_max)
>  			continue;
>  
> -		pll->n = div * mf_low / p1;
> -		pll->m *= mf_low;
> -		pll->p1 = p1;
> -		dev_dbg(dev, "PLL: N %u M %u P1 %u\n", pll->n, pll->m, pll->p1);
> -		return 0;
> +		err = absdiff(out_clock / p1, pll->pix_clock);
> +		if (err < clock_err) {
> +			pll->n = n;
> +			pll->m = m;
> +			pll->p1 = p1;
> +			clock_err = err;
> +		}
> +		if (err == 0) {
> +			dev_dbg(dev, "pll: N %u M %u P1 %u exact\n",
> +				pll->n, pll->m, pll->p1);
> +			return 0;
> +		}
>  	}
> -
> -	dev_err(dev, "pll: no valid N and P1 divisors found.\n");
> -	return -EINVAL;
> +	if (clock_err == UINT_MAX) {
> +		dev_err(dev, "pll: no valid parameters found.\n");
> +		return -EINVAL;
> +	}
> +	pll->pix_clock = mult_frac(pll->ext_clock, pll->m, pll->n * pll->p1);
> +	dev_dbg(dev, "pll: N %u M %u P1 %u pix_clock %u Hz error %u Hz\n",
> +		pll->n, pll->m, pll->p1, pll->pix_clock, clock_err);
> +	return 0;
>  }
>  EXPORT_SYMBOL_GPL(aptina_pll_calculate);
>  
> diff --git a/drivers/media/i2c/mt9m032.c b/drivers/media/i2c/mt9m032.c
> index 6a9e068462fd..31410f40b197 100644
> --- a/drivers/media/i2c/mt9m032.c
> +++ b/drivers/media/i2c/mt9m032.c
> @@ -285,7 +285,7 @@ static int mt9m032_setup_pll(struct mt9m032 *sensor)
>  	if (ret < 0)
>  		return ret;
>  
> -	sensor->pix_clock = pdata->pix_clock;
> +	sensor->pix_clock = pll.pix_clock;
>  
>  	ret = mt9m032_write(client, MT9M032_PLL_CONFIG1,
>  			    (pll.m << MT9M032_PLL_CONFIG1_MUL_SHIFT) |
> @@ -711,6 +711,7 @@ static int mt9m032_probe(struct i2c_client *client,
>  	struct mt9m032_platform_data *pdata = client->dev.platform_data;
>  	struct i2c_adapter *adapter = client->adapter;
>  	struct mt9m032 *sensor;
> +	struct v4l2_ctrl *pixel_rate_ctrl;
>  	int chip_version;
>  	int ret;
>  
> @@ -780,9 +781,10 @@ static int mt9m032_probe(struct i2c_client *client,
>  			  V4L2_CID_EXPOSURE, MT9M032_SHUTTER_WIDTH_MIN,
>  			  MT9M032_SHUTTER_WIDTH_MAX, 1,
>  			  MT9M032_SHUTTER_WIDTH_DEF);
> -	v4l2_ctrl_new_std(&sensor->ctrls, &mt9m032_ctrl_ops,
> -			  V4L2_CID_PIXEL_RATE, pdata->pix_clock,
> -			  pdata->pix_clock, 1, pdata->pix_clock);
> +	pixel_rate_ctrl = v4l2_ctrl_new_std(&sensor->ctrls, &mt9m032_ctrl_ops,
> +					    V4L2_CID_PIXEL_RATE,
> +					    pdata->pix_clock, pdata->pix_clock,
> +					    1, pdata->pix_clock);
>  
>  	if (sensor->ctrls.error) {
>  		ret = sensor->ctrls.error;
> @@ -810,6 +812,11 @@ static int mt9m032_probe(struct i2c_client *client,
>  		goto error_entity;
>  	usleep_range(10000, 11000);
>  
> +	ret = __v4l2_ctrl_modify_range(pixel_rate_ctrl, sensor->pix_clock,
> +				       sensor->pix_clock, 1, sensor->pix_clock);
> +	if (ret < 0)
> +		goto error_entity;
> +
>  	ret = v4l2_ctrl_handler_setup(&sensor->ctrls);
>  	if (ret < 0)
>  		goto error_entity;
> -- 
> 2.11.0
> 

-- 
Sakari Ailus
e-mail: sakari.ailus@iki.fi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] media: aptina-pll: allow approximating the requested pix_clock
  2018-08-23 11:12       ` Laurent Pinchart
@ 2018-08-24 12:05         ` Helmut Grohne
  2018-08-25 11:32           ` Sakari Ailus
  0 siblings, 1 reply; 11+ messages in thread
From: Helmut Grohne @ 2018-08-24 12:05 UTC (permalink / raw)
  To: Laurent Pinchart; +Cc: Mauro Carvalho Chehab, linux-media, Sakari Ailus

Hi Laurent,

Thank you for taking the time to reply to my patch and to my earlier
questions.

On Thu, Aug 23, 2018 at 01:12:15PM +0200, Laurent Pinchart wrote:
> Could you please share numbers, ideally when run in kernel space ?

Can you explain the benefits of profiling this inside the kernel rather
than outside? Doing it outside is much simpler, so I'd like to
understand your reasons. The next question is what kind of system to
profile on. I guess doing it on an X86 will not help. Typical users will
use some arm CPU and integer division on arm is slower than on x86. Is a
Cortex A9 ok?

If you can provide more relevant inputs, that'd also help. I was able to
derive an example input from the dt bindings for mt9p031 (ext=6MHz,
pix=96MHz) and also used random inputs for testing. Getting more
real-world inputs would help in producing a useful benchmark.

> This patch is very hard to review as you rewrite the whole, removing all the 
> documentation for the existing algorithm, without documenting the new one. 

That's not entirely fair. Unlike the original algorithm, I've split it
to multiple functions and unlike the original algorithm, I've documented
pre- and post-conditions. In a sense, that's actually more
documentation. At least, you now see what the functions are supposed to
be doing.

Inside the alogrithm, I've often writting which precondition
(in)equation I used in which particular step.

The variable naming choice makes it very clear that the first step is
reducing the value ranges for n, m, and p1 as much as possible before
proceeding to an actual parameter computation.

There was little choice in removing much of the old algorithm as the
approach of using gcd is numerically unstable. I actually kept
everything until the first gcd computation.

> There's also no example of a failing case with the existing code that works 
> with yours.

That is an unfortunate omission. I should have thought of this and
should have given it upfront. I'm sorry.

Take for instance MT9M024. The data sheet
(http://www.mouser.com/ds/2/308/MT9M024-D-606228.pdf) allows deducing
the following limits:

	const struct aptina_pll_limits mt9m024_limits = {
		.ext_clock_min = 6000000,
		.ext_clock_max = 50000000,
		.int_clock_min = 2000000,
		.int_clock_max = 24000000,
		.out_clock_min = 384000000,
		.out_clock_max = 768000000,
		.pix_clock_max = 74250000,
		.n_min = 1,
		.n_max = 63,
		.m_min = 32,
		.m_max = 255,
		.p1_min = 4,
		.p1_max = 16,
	};

Now if you choose ext_clock and pix_clock maximal within the given
limits, the existing aptina_pll_calculate gives up. Lowering the
pix_clock does not help either. Even down to 73 MHz, it is unable to
find any pll configuration.

The new algorithm finds a solution (n=11, m=98, p1=6) with 7.5 KHz
error. Incidentally, that solution is close to the one given by the
vendor tool (n=22, m=196, p1=6).

> Could you please document the algorithm in details (especially explaining the 
> background ideas), and share at least one use case, with with numbers for all 
> the input and output parameters ?

I'll try to improve the documentation in the next version. Summarizing
the idea is something I can do, but beyond that I don't see much to add
beyond prose.

Ahead of posting a V2, let me suggest this:

/* The first part of the algorithm reduces the given aptina_pll_limits
 * for n, m and p1 using the (in)equalities and the concrete values for
 * ext_clock and pix_clock in order to reduce the search space.
 *
 * The main loop iterates over all remaining possible p1 values and
 * computes the necessary out_clock frequency. The ext_clock / out_clock
 * ratio is then approximated with n / m within their respective bounds.
 * For each parameter choice, the preconditions must be rechecked,
 * because integer rounding errors may result in violating some of the
 * preconditions. The parameter set with the least frequency error is
 * returned.
 */

Is this what you are looking for?

Helmut

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] media: aptina-pll: allow approximating the requested pix_clock
  2018-08-23 11:30       ` Sakari Ailus
@ 2018-08-24 12:05         ` Helmut Grohne
  0 siblings, 0 replies; 11+ messages in thread
From: Helmut Grohne @ 2018-08-24 12:05 UTC (permalink / raw)
  To: Sakari Ailus; +Cc: Laurent Pinchart, Mauro Carvalho Chehab, linux-media

Hi Sakari,

Thank you for taking the time to look into the issue.

On Thu, Aug 23, 2018 at 01:30:12PM +0200, Sakari Ailus wrote:
> Knowing the formula, the limits as well as the external clock frequency, it
> should be relatively straightforward to come up with a functional pixel
> clock value. Was there a practical problem in coming up with such a value?

I've added a concrete example to my other reply and should have done
that with posting the initial question. I'm sorry for not having done it
earlier.

> You can't pick a value at random; it needs to be one that actually works.
> The purpose of having an exact frequency is that the typical systems where
> these devices are being used, as I explained earlier, is that having a
> random frequency, albeit with which the sensor could possibly work, the
> other devices in the system may not do so.

We already refuted the concept of an "exact frequency". In nature, it
simply isn't exact. Every board will have a slightly different
frequency no matter how precise you calculate it.

I'm not after random frequencies in any case. The goal of course is to
approximate the requested frequency as good as possible. In particular,
when the requested pix_clock allows using a parameter set that attains
the frequency exactly, that parameter set will be emitted rather than
some other approximation. Only for parameters where the old algorithm
returns -EINVAL there should be an observable difference.

Using an exact frequency is difficult here. How am I supposed to write
e.g. 74242424.24242424... Hz into the device tree?

Helmut

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] media: aptina-pll: allow approximating the requested pix_clock
  2018-08-24 12:05         ` Helmut Grohne
@ 2018-08-25 11:32           ` Sakari Ailus
  2018-08-27  8:44             ` Helmut Grohne
  0 siblings, 1 reply; 11+ messages in thread
From: Sakari Ailus @ 2018-08-25 11:32 UTC (permalink / raw)
  To: Helmut Grohne; +Cc: Laurent Pinchart, Mauro Carvalho Chehab, linux-media

On Fri, Aug 24, 2018 at 02:05:17PM +0200, Helmut Grohne wrote:
> Hi Laurent,
> 
> Thank you for taking the time to reply to my patch and to my earlier
> questions.
> 
> On Thu, Aug 23, 2018 at 01:12:15PM +0200, Laurent Pinchart wrote:
> > Could you please share numbers, ideally when run in kernel space ?
> 
> Can you explain the benefits of profiling this inside the kernel rather
> than outside? Doing it outside is much simpler, so I'd like to
> understand your reasons. The next question is what kind of system to
> profile on. I guess doing it on an X86 will not help. Typical users will
> use some arm CPU and integer division on arm is slower than on x86. Is a
> Cortex A9 ok?
> 
> If you can provide more relevant inputs, that'd also help. I was able to
> derive an example input from the dt bindings for mt9p031 (ext=6MHz,
> pix=96MHz) and also used random inputs for testing. Getting more
> real-world inputs would help in producing a useful benchmark.
> 
> > This patch is very hard to review as you rewrite the whole, removing all the 
> > documentation for the existing algorithm, without documenting the new one. 
> 
> That's not entirely fair. Unlike the original algorithm, I've split it
> to multiple functions and unlike the original algorithm, I've documented
> pre- and post-conditions. In a sense, that's actually more
> documentation. At least, you now see what the functions are supposed to
> be doing.
> 
> Inside the alogrithm, I've often writting which precondition
> (in)equation I used in which particular step.
> 
> The variable naming choice makes it very clear that the first step is
> reducing the value ranges for n, m, and p1 as much as possible before
> proceeding to an actual parameter computation.
> 
> There was little choice in removing much of the old algorithm as the
> approach of using gcd is numerically unstable. I actually kept
> everything until the first gcd computation.
> 
> > There's also no example of a failing case with the existing code that works 
> > with yours.
> 
> That is an unfortunate omission. I should have thought of this and
> should have given it upfront. I'm sorry.
> 
> Take for instance MT9M024. The data sheet
> (http://www.mouser.com/ds/2/308/MT9M024-D-606228.pdf) allows deducing
> the following limits:
> 
> 	const struct aptina_pll_limits mt9m024_limits = {
> 		.ext_clock_min = 6000000,
> 		.ext_clock_max = 50000000,
> 		.int_clock_min = 2000000,
> 		.int_clock_max = 24000000,
> 		.out_clock_min = 384000000,
> 		.out_clock_max = 768000000,
> 		.pix_clock_max = 74250000,
> 		.n_min = 1,
> 		.n_max = 63,
> 		.m_min = 32,
> 		.m_max = 255,
> 		.p1_min = 4,
> 		.p1_max = 16,
> 	};
> 
> Now if you choose ext_clock and pix_clock maximal within the given
> limits, the existing aptina_pll_calculate gives up. Lowering the
> pix_clock does not help either. Even down to 73 MHz, it is unable to
> find any pll configuration.
> 
> The new algorithm finds a solution (n=11, m=98, p1=6) with 7.5 KHz
> error. Incidentally, that solution is close to the one given by the
> vendor tool (n=22, m=196, p1=6).

These values don't seem valid for 6 MHz --- the frequency after the PLL is
less than 384 MHz. Did you use a different external clock frequency?

> 
> > Could you please document the algorithm in details (especially explaining the 
> > background ideas), and share at least one use case, with with numbers for all 
> > the input and output parameters ?
> 
> I'll try to improve the documentation in the next version. Summarizing
> the idea is something I can do, but beyond that I don't see much to add
> beyond prose.
> 
> Ahead of posting a V2, let me suggest this:
> 
> /* The first part of the algorithm reduces the given aptina_pll_limits
>  * for n, m and p1 using the (in)equalities and the concrete values for
>  * ext_clock and pix_clock in order to reduce the search space.
>  *
>  * The main loop iterates over all remaining possible p1 values and
>  * computes the necessary out_clock frequency. The ext_clock / out_clock
>  * ratio is then approximated with n / m within their respective bounds.
>  * For each parameter choice, the preconditions must be rechecked,
>  * because integer rounding errors may result in violating some of the
>  * preconditions. The parameter set with the least frequency error is
>  * returned.
>  */
> 
> Is this what you are looking for?
> 
> Helmut

-- 
Sakari Ailus
e-mail: sakari.ailus@iki.fi

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] media: aptina-pll: allow approximating the requested pix_clock
  2018-08-25 11:32           ` Sakari Ailus
@ 2018-08-27  8:44             ` Helmut Grohne
  0 siblings, 0 replies; 11+ messages in thread
From: Helmut Grohne @ 2018-08-27  8:44 UTC (permalink / raw)
  To: Sakari Ailus; +Cc: Laurent Pinchart, Mauro Carvalho Chehab, linux-media

On Sat, Aug 25, 2018 at 01:32:47PM +0200, Sakari Ailus wrote:
> On Fri, Aug 24, 2018 at 02:05:17PM +0200, Helmut Grohne wrote:
> > Take for instance MT9M024. The data sheet
> > (http://www.mouser.com/ds/2/308/MT9M024-D-606228.pdf) allows deducing
> > the following limits:
> > 
> > 	const struct aptina_pll_limits mt9m024_limits = {
> > 		.ext_clock_min = 6000000,
> > 		.ext_clock_max = 50000000,
> > 		.int_clock_min = 2000000,
> > 		.int_clock_max = 24000000,
> > 		.out_clock_min = 384000000,
> > 		.out_clock_max = 768000000,
> > 		.pix_clock_max = 74250000,
> > 		.n_min = 1,
> > 		.n_max = 63,
> > 		.m_min = 32,
> > 		.m_max = 255,
> > 		.p1_min = 4,
> > 		.p1_max = 16,
> > 	};
> > 
> > Now if you choose ext_clock and pix_clock maximal within the given
> > limits, the existing aptina_pll_calculate gives up. Lowering the
> > pix_clock does not help either. Even down to 73 MHz, it is unable to
> > find any pll configuration.
> > 
> > The new algorithm finds a solution (n=11, m=98, p1=6) with 7.5 KHz
> > error. Incidentally, that solution is close to the one given by the
> > vendor tool (n=22, m=196, p1=6).
> 
> These values don't seem valid for 6 MHz --- the frequency after the PLL is
> less than 384 MHz. Did you use a different external clock frequency?

I wrote that I used the maximal external clock frequency, which is 50
MHz. For that value, the output clock is within the requested bounds.
Are you implying that the chosen pll parameters should be valid for all
possible external clocks simultaneously?

Helmut

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2018-08-27 12:32 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-14  6:35 why does aptina_pll_calculate insist on exact division? Helmut Grohne
2018-08-14  7:30 ` Laurent Pinchart
2018-08-14  8:40   ` Helmut Grohne
2018-08-23  7:52     ` [PATCH] media: aptina-pll: allow approximating the requested pix_clock Helmut Grohne
2018-08-23 11:12       ` Laurent Pinchart
2018-08-24 12:05         ` Helmut Grohne
2018-08-25 11:32           ` Sakari Ailus
2018-08-27  8:44             ` Helmut Grohne
2018-08-23 11:30       ` Sakari Ailus
2018-08-24 12:05         ` Helmut Grohne
2018-08-14  8:48   ` why does aptina_pll_calculate insist on exact division? Sakari Ailus

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.