linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v6 0/3] spi: dts: sun4i: Add support for wait time between word transmissions
@ 2015-12-26 15:53 Marcus Weseloh
  2015-12-26 15:53 ` [PATCH v6 1/3] spi: dts: Add new device property to specifcy a " Marcus Weseloh
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Marcus Weseloh @ 2015-12-26 15:53 UTC (permalink / raw)
  To: linux-sunxi
  Cc: Chen-Yu Tsai, devicetree, Ian Campbell, Kumar Gala,
	linux-arm-kernel, linux-kernel, linux-spi, Marcus Weseloh,
	Mark Brown, Mark Rutland, Maxime Ripard, Pawel Moll, Rob Herring

Hi all,

This is the sixth version of the patch set that adds a new property
"spi-word-wait-ns" to the spi-bus binding to allow SPI slave devices to set
a wait time between the transmission of words. It modifies the spi_device
struct and slave device probing to read and store the new property.

Also modifies the sun4i SPI master driver to make use of the new property.
This specific SPI controller needs 3 clock cycles to set up the delay, which
makes the minimum non-zero wait time on this hardware 4 clock cycles.

It now also fixes multiple problems in the sun4i clock calculation: 
 - The A10/A20 datasheet contains the formula AHB_CLK / (2^(n+1)) to calculate
   SPI_CLK from CDR1, but this formula is wrong. The actual formula -
   determined by analyzing the actual waveforms on a A20 SoC - is AHB_CLK /
   (2^n).

 - The divisor calculations for CDR1 and CDR2 both rounded to the nearest
   integer. This could lead to a transfer speed that is higher than the
   requested speed. This patch changes both calculations to always round down.

 - The mclk frequency was only ever increased, never decreased. This could
   lead to unpredictable transfer speeds, depending on the order in which
   transfers with different speeds where serviced by the SPI driver.


Changes from v1:
 * renamed the property for more clarity
 * wait time is set in nanoseconds instead of number of clock cycles
 * transparently handle the 3 setup clock cycles

Changes from v2:
 * fixed typo in comment
 * moved parameter to spi-bus binding, dropping the vendor prefix
 * changed commit summary and description to reflect the changes

Changes from v3:
 * remove reference to "hardware" in comments and description, as the wait
   time could also be implemented in software
 * read and set property value in spi core

Changes from v4:
 * log with dev_dbg instead of dev_info
 * split patch into two separate ones for SPI-core and sun4i parts

Changes from v5:
 * Add Maxime's Reviewed-by and Rob's Acked-by to the SPI core patch
 * Add patch to fix clock calculation
 * Use actual SPI speed instead of tfr->speed_hz for wait time calculation


Marcus Weseloh (3):
  spi: dts: Add new device property to specifcy a wait time between word
    transmissions
  spi: sun4i: Fix clock calculations to be predictable and never exceed
    the requested rate
  spi: sun4i: Add support for wait time between word transmissions

 Documentation/devicetree/bindings/spi/spi-bus.txt |  2 +
 drivers/spi/spi-sun4i.c                           | 51 ++++++++++++++++++-----
 drivers/spi/spi.c                                 |  2 +
 include/linux/spi/spi.h                           |  2 +
 4 files changed, 46 insertions(+), 11 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v6 1/3] spi: dts: Add new device property to specifcy a wait time between word transmissions
  2015-12-26 15:53 [PATCH v6 0/3] spi: dts: sun4i: Add support for wait time between word transmissions Marcus Weseloh
@ 2015-12-26 15:53 ` Marcus Weseloh
  2015-12-26 15:53 ` [PATCH v6 2/3] spi: sun4i: Fix clock calculations to be predictable and never exceed the requested rate Marcus Weseloh
  2015-12-26 15:53 ` [PATCH v6 3/3] spi: sun4i: Add support for wait time between word transmissions Marcus Weseloh
  2 siblings, 0 replies; 15+ messages in thread
From: Marcus Weseloh @ 2015-12-26 15:53 UTC (permalink / raw)
  To: linux-sunxi
  Cc: Chen-Yu Tsai, devicetree, Ian Campbell, Kumar Gala,
	linux-arm-kernel, linux-kernel, linux-spi, Marcus Weseloh,
	Mark Brown, Mark Rutland, Maxime Ripard, Pawel Moll, Rob Herring

Adds a new property "spi-word-wait-ns" to the spi-bus binding that allows
SPI slave devices to set a wait time between the transmission of words.

Signed-off-by: Marcus Weseloh <mweseloh42@gmail.com>
Reviewed-by: Maxime Ripard <maxime.ripard@free-electrons.com>
Acked-by: Rob Herring <robh@kernel.org>
---
 Documentation/devicetree/bindings/spi/spi-bus.txt | 2 ++
 drivers/spi/spi.c                                 | 2 ++
 include/linux/spi/spi.h                           | 2 ++
 3 files changed, 6 insertions(+)

diff --git a/Documentation/devicetree/bindings/spi/spi-bus.txt b/Documentation/devicetree/bindings/spi/spi-bus.txt
index bbaa857..434d321 100644
--- a/Documentation/devicetree/bindings/spi/spi-bus.txt
+++ b/Documentation/devicetree/bindings/spi/spi-bus.txt
@@ -61,6 +61,8 @@ contain the following properties.
                       used for MOSI. Defaults to 1 if not present.
 - spi-rx-bus-width - (optional) The bus width(number of data wires) that
                       used for MISO. Defaults to 1 if not present.
+- spi-word-wait-ns - (optional) Delay between transmission of words
+                      in nanoseconds
 
 Some SPI controllers and devices support Dual and Quad SPI transfer mode.
 It allows data in the SPI system to be transferred in 2 wires(DUAL) or 4 wires(QUAD).
diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
index 2b0a8ec..186373b 100644
--- a/drivers/spi/spi.c
+++ b/drivers/spi/spi.c
@@ -1467,6 +1467,8 @@ of_register_spi_device(struct spi_master *master, struct device_node *nc)
 	if (of_find_property(nc, "spi-lsb-first", NULL))
 		spi->mode |= SPI_LSB_FIRST;
 
+	of_property_read_u32(nc, "spi-word-wait-ns", &spi->word_wait_ns);
+
 	/* Device DUAL/QUAD mode */
 	if (!of_property_read_u32(nc, "spi-tx-bus-width", &value)) {
 		switch (value) {
diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
index cce80e6..ea3037f 100644
--- a/include/linux/spi/spi.h
+++ b/include/linux/spi/spi.h
@@ -118,6 +118,7 @@ void spi_statistics_add_transfer_stats(struct spi_statistics *stats,
  *	for driver coldplugging, and in uevents used for hotplugging
  * @cs_gpio: gpio number of the chipselect line (optional, -ENOENT when
  *	when not using a GPIO line)
+ * @word_wait_ns: A wait time between word transfers in nanoseconds
  *
  * @statistics: statistics for the spi_device
  *
@@ -158,6 +159,7 @@ struct spi_device {
 	void			*controller_data;
 	char			modalias[SPI_NAME_SIZE];
 	int			cs_gpio;	/* chip select gpio */
+	u32			word_wait_ns;	/* wait time between words */
 
 	/* the statistics */
 	struct spi_statistics	statistics;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v6 2/3] spi: sun4i: Fix clock calculations to be predictable and never exceed the requested rate
  2015-12-26 15:53 [PATCH v6 0/3] spi: dts: sun4i: Add support for wait time between word transmissions Marcus Weseloh
  2015-12-26 15:53 ` [PATCH v6 1/3] spi: dts: Add new device property to specifcy a " Marcus Weseloh
@ 2015-12-26 15:53 ` Marcus Weseloh
  2015-12-27 21:09   ` Maxime Ripard
  2015-12-26 15:53 ` [PATCH v6 3/3] spi: sun4i: Add support for wait time between word transmissions Marcus Weseloh
  2 siblings, 1 reply; 15+ messages in thread
From: Marcus Weseloh @ 2015-12-26 15:53 UTC (permalink / raw)
  To: linux-sunxi
  Cc: Chen-Yu Tsai, devicetree, Ian Campbell, Kumar Gala,
	linux-arm-kernel, linux-kernel, linux-spi, Marcus Weseloh,
	Mark Brown, Mark Rutland, Maxime Ripard, Pawel Moll, Rob Herring

This patch fixes multiple problems with the current clock calculations:

1. The A10/A20 datasheet contains the formula AHB_CLK / (2^(n+1)) to
calculate SPI_CLK from CDR1, but this formula is wrong. The actual
formula - determined by analyzing the actual waveforms - is
AHB_CLK / (2^n).

2. The divisor calculations for CDR1 and CDR2 both rounded to the
nearest integer. This could lead to a transfer speed that is higher than
the requested speed. This patch changes both calculations to always
round down.

3. The mclk frequency was only ever increased, never decreased. This could
lead to unpredictable transfer speeds, depending on the order in which
transfers with different speeds where serviced by the SPI driver.

Signed-off-by: Marcus Weseloh <mweseloh42@gmail.com>
---
 drivers/spi/spi-sun4i.c | 26 ++++++++++++++++----------
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/drivers/spi/spi-sun4i.c b/drivers/spi/spi-sun4i.c
index f60a6d6..d67e142 100644
--- a/drivers/spi/spi-sun4i.c
+++ b/drivers/spi/spi-sun4i.c
@@ -79,6 +79,9 @@ struct sun4i_spi {
 	struct clk		*hclk;
 	struct clk		*mclk;
 
+	int			cur_max_speed;
+	int			cur_mclk;
+
 	struct completion	done;
 
 	const u8		*tx_buf;
@@ -227,11 +230,17 @@ static int sun4i_spi_transfer_one(struct spi_master *master,
 
 	sun4i_spi_write(sspi, SUN4I_CTL_REG, reg);
 
-	/* Ensure that we have a parent clock fast enough */
+	/*
+	 * Ensure that the parent clock is set to twice the max speed
+	 * of the spi device (possibly rounded up by the clk driver)
+	 */
 	mclk_rate = clk_get_rate(sspi->mclk);
-	if (mclk_rate < (2 * tfr->speed_hz)) {
-		clk_set_rate(sspi->mclk, 2 * tfr->speed_hz);
+	if (spi->max_speed_hz != sspi->cur_max_speed ||
+	    mclk_rate != sspi->cur_mclk) {
+		clk_set_rate(sspi->mclk, 2 * spi->max_speed_hz);
 		mclk_rate = clk_get_rate(sspi->mclk);
+		sspi->cur_mclk = mclk_rate;
+		sspi->cur_max_speed = spi->max_speed_hz;
 	}
 
 	/*
@@ -239,7 +248,7 @@ static int sun4i_spi_transfer_one(struct spi_master *master,
 	 *
 	 * We have two choices there. Either we can use the clock
 	 * divide rate 1, which is calculated thanks to this formula:
-	 * SPI_CLK = MOD_CLK / (2 ^ (cdr + 1))
+	 * SPI_CLK = MOD_CLK / (2 ^ cdr)
 	 * Or we can use CDR2, which is calculated with the formula:
 	 * SPI_CLK = MOD_CLK / (2 * (cdr + 1))
 	 * Wether we use the former or the latter is set through the
@@ -248,14 +257,11 @@ static int sun4i_spi_transfer_one(struct spi_master *master,
 	 * First try CDR2, and if we can't reach the expected
 	 * frequency, fall back to CDR1.
 	 */
-	div = mclk_rate / (2 * tfr->speed_hz);
-	if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
-		if (div > 0)
-			div--;
-
+	div = DIV_ROUND_UP(mclk_rate, 2 * tfr->speed_hz) - 1;
+	if (div <= SUN4I_CLK_CTL_CDR2_MASK) {
 		reg = SUN4I_CLK_CTL_CDR2(div) | SUN4I_CLK_CTL_DRS;
 	} else {
-		div = ilog2(mclk_rate) - ilog2(tfr->speed_hz);
+		div = ilog2(roundup_pow_of_two(mclk_rate / tfr->speed_hz));
 		reg = SUN4I_CLK_CTL_CDR1(div);
 	}
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v6 3/3] spi: sun4i: Add support for wait time between word transmissions
  2015-12-26 15:53 [PATCH v6 0/3] spi: dts: sun4i: Add support for wait time between word transmissions Marcus Weseloh
  2015-12-26 15:53 ` [PATCH v6 1/3] spi: dts: Add new device property to specifcy a " Marcus Weseloh
  2015-12-26 15:53 ` [PATCH v6 2/3] spi: sun4i: Fix clock calculations to be predictable and never exceed the requested rate Marcus Weseloh
@ 2015-12-26 15:53 ` Marcus Weseloh
  2015-12-27 21:12   ` Maxime Ripard
  2 siblings, 1 reply; 15+ messages in thread
From: Marcus Weseloh @ 2015-12-26 15:53 UTC (permalink / raw)
  To: linux-sunxi
  Cc: Chen-Yu Tsai, devicetree, Ian Campbell, Kumar Gala,
	linux-arm-kernel, linux-kernel, linux-spi, Marcus Weseloh,
	Mark Brown, Mark Rutland, Maxime Ripard, Pawel Moll, Rob Herring

Modifies the sun4i SPI master driver to make use of the
"spi-word-wait-ns" property. This specific SPI controller needs 3 clock
cycles to set up the delay, which makes the minimum non-zero wait time
on this hardware 4 clock cycles.

Signed-off-by: Marcus Weseloh <mweseloh42@gmail.com>
---
 drivers/spi/spi-sun4i.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/drivers/spi/spi-sun4i.c b/drivers/spi/spi-sun4i.c
index d67e142..4d6f77c 100644
--- a/drivers/spi/spi-sun4i.c
+++ b/drivers/spi/spi-sun4i.c
@@ -176,6 +176,9 @@ static int sun4i_spi_transfer_one(struct spi_master *master,
 	unsigned int tx_len = 0;
 	int ret = 0;
 	u32 reg;
+	int wait_clk = 0;
+	int clk_ns = 0;
+	unsigned int speed_hz;
 
 	/* We don't support transfer larger than the FIFO */
 	if (tfr->len > SUN4I_FIFO_DEPTH)
@@ -260,13 +263,34 @@ static int sun4i_spi_transfer_one(struct spi_master *master,
 	div = DIV_ROUND_UP(mclk_rate, 2 * tfr->speed_hz) - 1;
 	if (div <= SUN4I_CLK_CTL_CDR2_MASK) {
 		reg = SUN4I_CLK_CTL_CDR2(div) | SUN4I_CLK_CTL_DRS;
+		speed_hz = mclk_rate / (2 * (div + 1));
 	} else {
 		div = ilog2(roundup_pow_of_two(mclk_rate / tfr->speed_hz));
 		reg = SUN4I_CLK_CTL_CDR1(div);
+		speed_hz = mclk_rate / (1 << div);
 	}
 
 	sun4i_spi_write(sspi, SUN4I_CLK_CTL_REG, reg);
 
+	/*
+	 * Setup wait time between words.
+	 *
+	 * Wait time is set in SPI_CLK cycles. The SPI hardware needs 3
+	 * additional cycles to setup the wait counter, so the minimum delay
+	 * time is 4 cycles.
+	 */
+	if (spi->word_wait_ns) {
+		clk_ns = DIV_ROUND_UP(1000000000, speed_hz);
+		wait_clk = DIV_ROUND_UP(spi->word_wait_ns, clk_ns) - 3;
+		if (wait_clk < 1) {
+			wait_clk = 1;
+			dev_dbg(&spi->dev,
+				"using minimum of 4 word wait cycles (%uns)",
+				4 * clk_ns);
+		}
+	}
+	sun4i_spi_write(sspi, SUN4I_WAIT_REG, (u16)wait_clk);
+
 	/* Setup the transfer now... */
 	if (sspi->tx_buf)
 		tx_len = tfr->len;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v6 2/3] spi: sun4i: Fix clock calculations to be predictable and never exceed the requested rate
  2015-12-26 15:53 ` [PATCH v6 2/3] spi: sun4i: Fix clock calculations to be predictable and never exceed the requested rate Marcus Weseloh
@ 2015-12-27 21:09   ` Maxime Ripard
  2015-12-27 23:29     ` Marcus Weseloh
  0 siblings, 1 reply; 15+ messages in thread
From: Maxime Ripard @ 2015-12-27 21:09 UTC (permalink / raw)
  To: Marcus Weseloh
  Cc: linux-sunxi, Chen-Yu Tsai, devicetree, Ian Campbell, Kumar Gala,
	linux-arm-kernel, linux-kernel, linux-spi, Mark Brown,
	Mark Rutland, Pawel Moll, Rob Herring

[-- Attachment #1: Type: text/plain, Size: 3416 bytes --]

On Sat, Dec 26, 2015 at 04:53:05PM +0100, Marcus Weseloh wrote:
> This patch fixes multiple problems with the current clock calculations:
> 
> 1. The A10/A20 datasheet contains the formula AHB_CLK / (2^(n+1)) to
> calculate SPI_CLK from CDR1, but this formula is wrong. The actual
> formula - determined by analyzing the actual waveforms - is
> AHB_CLK / (2^n).
> 
> 2. The divisor calculations for CDR1 and CDR2 both rounded to the
> nearest integer. This could lead to a transfer speed that is higher than
> the requested speed. This patch changes both calculations to always
> round down.
> 
> 3. The mclk frequency was only ever increased, never decreased. This could
> lead to unpredictable transfer speeds, depending on the order in which
> transfers with different speeds where serviced by the SPI driver.

These 3 should probably be separate patches (and be Cc'd to stable

> Signed-off-by: Marcus Weseloh <mweseloh42@gmail.com>
> ---
>  drivers/spi/spi-sun4i.c | 26 ++++++++++++++++----------
>  1 file changed, 16 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/spi/spi-sun4i.c b/drivers/spi/spi-sun4i.c
> index f60a6d6..d67e142 100644
> --- a/drivers/spi/spi-sun4i.c
> +++ b/drivers/spi/spi-sun4i.c
> @@ -79,6 +79,9 @@ struct sun4i_spi {
>  	struct clk		*hclk;
>  	struct clk		*mclk;
>  
> +	int			cur_max_speed;
> +	int			cur_mclk;
> +
>  	struct completion	done;
>  
>  	const u8		*tx_buf;
> @@ -227,11 +230,17 @@ static int sun4i_spi_transfer_one(struct spi_master *master,
>  
>  	sun4i_spi_write(sspi, SUN4I_CTL_REG, reg);
>  
> -	/* Ensure that we have a parent clock fast enough */
> +	/*
> +	 * Ensure that the parent clock is set to twice the max speed
> +	 * of the spi device (possibly rounded up by the clk driver)
> +	 */
>  	mclk_rate = clk_get_rate(sspi->mclk);
> -	if (mclk_rate < (2 * tfr->speed_hz)) {
> -		clk_set_rate(sspi->mclk, 2 * tfr->speed_hz);
> +	if (spi->max_speed_hz != sspi->cur_max_speed ||
> +	    mclk_rate != sspi->cur_mclk) {

Do you need to cache the values? As far as I'm aware, you end up doing
the computation all the time anyway.

> +		clk_set_rate(sspi->mclk, 2 * spi->max_speed_hz);
>  		mclk_rate = clk_get_rate(sspi->mclk);
> +		sspi->cur_mclk = mclk_rate;
> +		sspi->cur_max_speed = spi->max_speed_hz;
>  	}
>  
>  	/*
> @@ -239,7 +248,7 @@ static int sun4i_spi_transfer_one(struct spi_master *master,
>  	 *
>  	 * We have two choices there. Either we can use the clock
>  	 * divide rate 1, which is calculated thanks to this formula:
> -	 * SPI_CLK = MOD_CLK / (2 ^ (cdr + 1))
> +	 * SPI_CLK = MOD_CLK / (2 ^ cdr)
>
>  	 * Or we can use CDR2, which is calculated with the formula:
>  	 * SPI_CLK = MOD_CLK / (2 * (cdr + 1))
>  	 * Wether we use the former or the latter is set through the
> @@ -248,14 +257,11 @@ static int sun4i_spi_transfer_one(struct spi_master *master,
>  	 * First try CDR2, and if we can't reach the expected
>  	 * frequency, fall back to CDR1.
>  	 */
> -	div = mclk_rate / (2 * tfr->speed_hz);
> -	if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
> -		if (div > 0)
> -			div--;
> -
> +	div = DIV_ROUND_UP(mclk_rate, 2 * tfr->speed_hz) - 1;

Isn't it exactly the same thing as mclk_rate / (2 * tfr->speed_hz) ?

Thanks,
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6 3/3] spi: sun4i: Add support for wait time between word transmissions
  2015-12-26 15:53 ` [PATCH v6 3/3] spi: sun4i: Add support for wait time between word transmissions Marcus Weseloh
@ 2015-12-27 21:12   ` Maxime Ripard
  0 siblings, 0 replies; 15+ messages in thread
From: Maxime Ripard @ 2015-12-27 21:12 UTC (permalink / raw)
  To: Marcus Weseloh
  Cc: linux-sunxi, Chen-Yu Tsai, devicetree, Ian Campbell, Kumar Gala,
	linux-arm-kernel, linux-kernel, linux-spi, Mark Brown,
	Mark Rutland, Pawel Moll, Rob Herring

[-- Attachment #1: Type: text/plain, Size: 563 bytes --]

On Sat, Dec 26, 2015 at 04:53:06PM +0100, Marcus Weseloh wrote:
> Modifies the sun4i SPI master driver to make use of the
> "spi-word-wait-ns" property. This specific SPI controller needs 3 clock
> cycles to set up the delay, which makes the minimum non-zero wait time
> on this hardware 4 clock cycles.
> 
> Signed-off-by: Marcus Weseloh <mweseloh42@gmail.com>

Acked-by: Maxime Ripard <maxime.ripard@free-electrons.com>

Thanks!
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6 2/3] spi: sun4i: Fix clock calculations to be predictable and never exceed the requested rate
  2015-12-27 21:09   ` Maxime Ripard
@ 2015-12-27 23:29     ` Marcus Weseloh
  2015-12-28 16:22       ` Marcus Weseloh
  2016-01-10 18:14       ` Maxime Ripard
  0 siblings, 2 replies; 15+ messages in thread
From: Marcus Weseloh @ 2015-12-27 23:29 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: linux-sunxi, Chen-Yu Tsai, devicetree, Ian Campbell, Kumar Gala,
	Mailing List, Arm, linux-kernel, linux-spi, Mark Brown,
	Mark Rutland, Pawel Moll, Rob Herring

Hi,

2015-12-27 22:09 GMT+01:00 Maxime Ripard <maxime.ripard@free-electrons.com>:
> On Sat, Dec 26, 2015 at 04:53:05PM +0100, Marcus Weseloh wrote:
>> This patch fixes multiple problems with the current clock calculations:
>>
>> 1. The A10/A20 datasheet contains the formula AHB_CLK / (2^(n+1)) to
>> calculate SPI_CLK from CDR1, but this formula is wrong. The actual
>> formula - determined by analyzing the actual waveforms - is
>> AHB_CLK / (2^n).
>>
>> 2. The divisor calculations for CDR1 and CDR2 both rounded to the
>> nearest integer. This could lead to a transfer speed that is higher than
>> the requested speed. This patch changes both calculations to always
>> round down.
>>
>> 3. The mclk frequency was only ever increased, never decreased. This could
>> lead to unpredictable transfer speeds, depending on the order in which
>> transfers with different speeds where serviced by the SPI driver.
>
> These 3 should probably be separate patches (and be Cc'd to stable

Will do. But I have the feeling that at least 1. and 2. should be in
the same patch as they touch the same lines of code. Do you think that
would be ok?
And before CC'ing stable, I would love to have someone with access to
A10 hardware and a scope (or even a bus pirate) check that the A10 SPI
controller does indeed have the same "bug". I strongly think so, but
would sleep better if it could be confirmed.

[...]
>> -     /* Ensure that we have a parent clock fast enough */
>> +     /*
>> +      * Ensure that the parent clock is set to twice the max speed
>> +      * of the spi device (possibly rounded up by the clk driver)
>> +      */
>>       mclk_rate = clk_get_rate(sspi->mclk);
>> -     if (mclk_rate < (2 * tfr->speed_hz)) {
>> -             clk_set_rate(sspi->mclk, 2 * tfr->speed_hz);
>> +     if (spi->max_speed_hz != sspi->cur_max_speed ||
>> +         mclk_rate != sspi->cur_mclk) {
>
> Do you need to cache the values? As far as I'm aware, you end up doing
> the computation all the time anyway.

By caching the values we optimize the case when a single SPI slave
device (or even multiple slave devices with the same max_speed) are
used multiple times in a row. In that case, we're saving two calls:
clk_set_rate and clk_get_rate. I was unsure about how expensive the
clk_* calls were, so I thought it would be safer use caching. But
maybe it's not worth the extra code?

Oh, and I just noticed a mistake in the comment: the clock driver
rounds up _or_ down, so I should drop the "up".

[...]
>> -     div = mclk_rate / (2 * tfr->speed_hz);
>> -     if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
>> -             if (div > 0)
>> -                     div--;
>> -
>> +     div = DIV_ROUND_UP(mclk_rate, 2 * tfr->speed_hz) - 1;
>
> Isn't it exactly the same thing as mclk_rate / (2 * tfr->speed_hz) ?

It is quite often, but not in all cases. The plain division rounds to
the nearest integer, so it rounds down sometimes. Consider the
following case: we have a slow SPI device with a spi-max-frequency of
50kHz. Our clock driver can't find a clock as slow as 100kHz, so it
sets mclk to 214,285Hz.

Using the old calculation we get: 214,285 / (2 * 50,000) = 2, so div =
1 as the old code subtracts 1 two lines further down
The new calculation results in: DIV_ROUND_UP(214,285, 2 * 50,000)  =
3, so div = 2 if we add the -1
We end up with a SPI_CLK of 53,571Hz using the old calculation and
35,714Hz using the new one. The old SPI_CLK is obviously closer to the
requested speed, but nevertheless it exceeds the requested limit and
should not have been chosen.

Thanks for the review!

Cheers,

   Marcus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6 2/3] spi: sun4i: Fix clock calculations to be predictable and never exceed the requested rate
  2015-12-27 23:29     ` Marcus Weseloh
@ 2015-12-28 16:22       ` Marcus Weseloh
  2016-01-10 19:44         ` Maxime Ripard
  2016-01-10 18:14       ` Maxime Ripard
  1 sibling, 1 reply; 15+ messages in thread
From: Marcus Weseloh @ 2015-12-28 16:22 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: linux-sunxi, Chen-Yu Tsai, devicetree, Ian Campbell, Kumar Gala,
	Mailing List, Arm, linux-kernel, linux-spi, Mark Brown,
	Mark Rutland, Pawel Moll, Rob Herring

Hi again,

2015-12-28 0:29 GMT+01:00 Marcus Weseloh <mweseloh42@gmail.com>:
> 2015-12-27 22:09 GMT+01:00 Maxime Ripard <maxime.ripard@free-electrons.com>:
[...]
> [...]
>>> -     /* Ensure that we have a parent clock fast enough */
>>> +     /*
>>> +      * Ensure that the parent clock is set to twice the max speed
>>> +      * of the spi device (possibly rounded up by the clk driver)
>>> +      */
>>>       mclk_rate = clk_get_rate(sspi->mclk);
>>> -     if (mclk_rate < (2 * tfr->speed_hz)) {
>>> -             clk_set_rate(sspi->mclk, 2 * tfr->speed_hz);
>>> +     if (spi->max_speed_hz != sspi->cur_max_speed ||
>>> +         mclk_rate != sspi->cur_mclk) {
>>
>> Do you need to cache the values? As far as I'm aware, you end up doing
>> the computation all the time anyway.
>
> By caching the values we optimize the case when a single SPI slave
> device (or even multiple slave devices with the same max_speed) are
> used multiple times in a row. In that case, we're saving two calls:
> clk_set_rate and clk_get_rate. I was unsure about how expensive the
> clk_* calls were, so I thought it would be safer use caching. But
> maybe it's not worth the extra code?
>
> Oh, and I just noticed a mistake in the comment: the clock driver
> rounds up _or_ down, so I should drop the "up".

As I'm looking further into this, I think the comment should read
"possibly rounded down", as the clk framework is expected to set a
frequency that is less or equal to the requested frequency. So the
effect I was seeing (that I got a frequency higher than the requested
one) is actually a bug!? Further details below.

> [...]
>>> -     div = mclk_rate / (2 * tfr->speed_hz);
>>> -     if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
>>> -             if (div > 0)
>>> -                     div--;
>>> -
>>> +     div = DIV_ROUND_UP(mclk_rate, 2 * tfr->speed_hz) - 1;
>>
>> Isn't it exactly the same thing as mclk_rate / (2 * tfr->speed_hz) ?
>
> It is quite often, but not in all cases. The plain division rounds to
> the nearest integer, so it rounds down sometimes. Consider the
> following case: we have a slow SPI device with a spi-max-frequency of
> 50kHz. Our clock driver can't find a clock as slow as 100kHz, so it
> sets mclk to 214,285Hz.

That clk_set_rate sets a higher frequency than requested seems to be a
problem in itself. I had a look at clk/sunxi/clk-mod0.c and noticed a
few small problems there. Will post an RFC patch in a couple of
minutes.

That doesn't invalidate any of the fixes proposed in this patch,
though. They are still needed, as I see it. But after fixing
clk-mod0.c, we need to make further changes to the spi-sun4i clock
selection, as clk_set_rate could now return -EINVAL. Will amend this
patch as well after receiving feedback on the (soon to come) mod0 clk
patch.

Cheers,

  Marcus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6 2/3] spi: sun4i: Fix clock calculations to be predictable and never exceed the requested rate
  2015-12-27 23:29     ` Marcus Weseloh
  2015-12-28 16:22       ` Marcus Weseloh
@ 2016-01-10 18:14       ` Maxime Ripard
  2016-01-10 21:11         ` Marcus Weseloh
  1 sibling, 1 reply; 15+ messages in thread
From: Maxime Ripard @ 2016-01-10 18:14 UTC (permalink / raw)
  To: Marcus Weseloh
  Cc: linux-sunxi, Chen-Yu Tsai, devicetree, Ian Campbell, Kumar Gala,
	Mailing List, Arm, linux-kernel, linux-spi, Mark Brown,
	Mark Rutland, Pawel Moll, Rob Herring

[-- Attachment #1: Type: text/plain, Size: 4357 bytes --]

Hi,

On Mon, Dec 28, 2015 at 12:29:16AM +0100, Marcus Weseloh wrote:
> Hi,
> 
> 2015-12-27 22:09 GMT+01:00 Maxime Ripard <maxime.ripard@free-electrons.com>:
> > On Sat, Dec 26, 2015 at 04:53:05PM +0100, Marcus Weseloh wrote:
> >> This patch fixes multiple problems with the current clock calculations:
> >>
> >> 1. The A10/A20 datasheet contains the formula AHB_CLK / (2^(n+1)) to
> >> calculate SPI_CLK from CDR1, but this formula is wrong. The actual
> >> formula - determined by analyzing the actual waveforms - is
> >> AHB_CLK / (2^n).
> >>
> >> 2. The divisor calculations for CDR1 and CDR2 both rounded to the
> >> nearest integer. This could lead to a transfer speed that is higher than
> >> the requested speed. This patch changes both calculations to always
> >> round down.
> >>
> >> 3. The mclk frequency was only ever increased, never decreased. This could
> >> lead to unpredictable transfer speeds, depending on the order in which
> >> transfers with different speeds where serviced by the SPI driver.
> >
> > These 3 should probably be separate patches (and be Cc'd to stable
> 
> Will do. But I have the feeling that at least 1. and 2. should be in
> the same patch as they touch the same lines of code. Do you think that
> would be ok?

It can also be two subsequent patches that are part of the same serie.

> And before CC'ing stable, I would love to have someone with access to
> A10 hardware and a scope (or even a bus pirate) check that the A10 SPI
> controller does indeed have the same "bug". I strongly think so, but
> would sleep better if it could be confirmed.

We never noticed any significant difference between the two. By now,
if there was any, we probably would be aware of it. And if there's
any, we can always send a subsequent patch.

> >> -     /* Ensure that we have a parent clock fast enough */
> >> +     /*
> >> +      * Ensure that the parent clock is set to twice the max speed
> >> +      * of the spi device (possibly rounded up by the clk driver)
> >> +      */
> >>       mclk_rate = clk_get_rate(sspi->mclk);
> >> -     if (mclk_rate < (2 * tfr->speed_hz)) {
> >> -             clk_set_rate(sspi->mclk, 2 * tfr->speed_hz);
> >> +     if (spi->max_speed_hz != sspi->cur_max_speed ||
> >> +         mclk_rate != sspi->cur_mclk) {
> >
> > Do you need to cache the values? As far as I'm aware, you end up doing
> > the computation all the time anyway.
> 
> By caching the values we optimize the case when a single SPI slave
> device (or even multiple slave devices with the same max_speed) are
> used multiple times in a row. In that case, we're saving two calls:
> clk_set_rate and clk_get_rate. I was unsure about how expensive the
> clk_* calls were, so I thought it would be safer use caching. But
> maybe it's not worth the extra code?

Unless you can point that there's a significant performance
difference, I'm not sure it's worth it.

> Oh, and I just noticed a mistake in the comment: the clock driver
> rounds up _or_ down, so I should drop the "up".
> 
> [...]
> >> -     div = mclk_rate / (2 * tfr->speed_hz);
> >> -     if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
> >> -             if (div > 0)
> >> -                     div--;
> >> -
> >> +     div = DIV_ROUND_UP(mclk_rate, 2 * tfr->speed_hz) - 1;
> >
> > Isn't it exactly the same thing as mclk_rate / (2 * tfr->speed_hz) ?
> 
> It is quite often, but not in all cases. The plain division rounds to
> the nearest integer, so it rounds down sometimes. Consider the
> following case: we have a slow SPI device with a spi-max-frequency of
> 50kHz. Our clock driver can't find a clock as slow as 100kHz, so it
> sets mclk to 214,285Hz.
> 
> Using the old calculation we get: 214,285 / (2 * 50,000) = 2, so div =
> 1 as the old code subtracts 1 two lines further down
> The new calculation results in: DIV_ROUND_UP(214,285, 2 * 50,000)  =
> 3, so div = 2 if we add the -1

Except that you have that extra - 1 after your DIV_ROUND_UP
calculation in the line you add.  so you have to remove 1 from that
line above, and then 1 again when we set the register, which ends up
being the exact same thing, or am I missing something?

Thanks!
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6 2/3] spi: sun4i: Fix clock calculations to be predictable and never exceed the requested rate
  2015-12-28 16:22       ` Marcus Weseloh
@ 2016-01-10 19:44         ` Maxime Ripard
  2016-01-10 21:37           ` Marcus Weseloh
  0 siblings, 1 reply; 15+ messages in thread
From: Maxime Ripard @ 2016-01-10 19:44 UTC (permalink / raw)
  To: Marcus Weseloh
  Cc: linux-sunxi, Chen-Yu Tsai, devicetree, Ian Campbell, Kumar Gala,
	Mailing List, Arm, linux-kernel, linux-spi, Mark Brown,
	Mark Rutland, Pawel Moll, Rob Herring

[-- Attachment #1: Type: text/plain, Size: 3137 bytes --]

On Mon, Dec 28, 2015 at 05:22:35PM +0100, Marcus Weseloh wrote:
> Hi again,
> 
> 2015-12-28 0:29 GMT+01:00 Marcus Weseloh <mweseloh42@gmail.com>:
> > 2015-12-27 22:09 GMT+01:00 Maxime Ripard <maxime.ripard@free-electrons.com>:
> [...]
> > [...]
> >>> -     /* Ensure that we have a parent clock fast enough */
> >>> +     /*
> >>> +      * Ensure that the parent clock is set to twice the max speed
> >>> +      * of the spi device (possibly rounded up by the clk driver)
> >>> +      */
> >>>       mclk_rate = clk_get_rate(sspi->mclk);
> >>> -     if (mclk_rate < (2 * tfr->speed_hz)) {
> >>> -             clk_set_rate(sspi->mclk, 2 * tfr->speed_hz);
> >>> +     if (spi->max_speed_hz != sspi->cur_max_speed ||
> >>> +         mclk_rate != sspi->cur_mclk) {
> >>
> >> Do you need to cache the values? As far as I'm aware, you end up doing
> >> the computation all the time anyway.
> >
> > By caching the values we optimize the case when a single SPI slave
> > device (or even multiple slave devices with the same max_speed) are
> > used multiple times in a row. In that case, we're saving two calls:
> > clk_set_rate and clk_get_rate. I was unsure about how expensive the
> > clk_* calls were, so I thought it would be safer use caching. But
> > maybe it's not worth the extra code?
> >
> > Oh, and I just noticed a mistake in the comment: the clock driver
> > rounds up _or_ down, so I should drop the "up".
> 
> As I'm looking further into this, I think the comment should read
> "possibly rounded down", as the clk framework is expected to set a
> frequency that is less or equal to the requested frequency.

AFAIK, there's no such expectation in the clock framework. You
treating this from a maximum frequency perspective, but it would be
the exact opposite if you were talking about a minimum frequency, as
might be the case for other consumers.

> So the effect I was seeing (that I got a frequency higher than the
> requested one) is actually a bug!? Further details below.
> 
> > [...]
> >>> -     div = mclk_rate / (2 * tfr->speed_hz);
> >>> -     if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
> >>> -             if (div > 0)
> >>> -                     div--;
> >>> -
> >>> +     div = DIV_ROUND_UP(mclk_rate, 2 * tfr->speed_hz) - 1;
> >>
> >> Isn't it exactly the same thing as mclk_rate / (2 * tfr->speed_hz) ?
> >
> > It is quite often, but not in all cases. The plain division rounds to
> > the nearest integer, so it rounds down sometimes. Consider the
> > following case: we have a slow SPI device with a spi-max-frequency of
> > 50kHz. Our clock driver can't find a clock as slow as 100kHz, so it
> > sets mclk to 214,285Hz.
> 
> That clk_set_rate sets a higher frequency than requested seems to be a
> problem in itself. I had a look at clk/sunxi/clk-mod0.c and noticed a
> few small problems there. Will post an RFC patch in a couple of
> minutes.

Did you post these patches already? I think I missed them if that's
the case.

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6 2/3] spi: sun4i: Fix clock calculations to be predictable and never exceed the requested rate
  2016-01-10 18:14       ` Maxime Ripard
@ 2016-01-10 21:11         ` Marcus Weseloh
  2016-01-17 18:51           ` Maxime Ripard
  0 siblings, 1 reply; 15+ messages in thread
From: Marcus Weseloh @ 2016-01-10 21:11 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: linux-sunxi, Chen-Yu Tsai, devicetree, Ian Campbell, Kumar Gala,
	Mailing List, Arm, linux-kernel, linux-spi, Mark Brown,
	Mark Rutland, Pawel Moll, Rob Herring

Hi,

2016-01-10 19:14 GMT+01:00 Maxime Ripard <maxime.ripard@free-electrons.com>:
> On Mon, Dec 28, 2015 at 12:29:16AM +0100, Marcus Weseloh wrote:
>> 2015-12-27 22:09 GMT+01:00 Maxime Ripard <maxime.ripard@free-electrons.com>:
>> > On Sat, Dec 26, 2015 at 04:53:05PM +0100, Marcus Weseloh wrote:

>> >> This patch fixes multiple problems with the current clock calculations:
[...]
>> > These 3 should probably be separate patches (and be Cc'd to stable
>>
>> Will do. But I have the feeling that at least 1. and 2. should be in
>> the same patch as they touch the same lines of code. Do you think that
>> would be ok?
>
> It can also be two subsequent patches that are part of the same serie.

OK, will prepare three separate patches in a series for the fixes.

>> And before CC'ing stable, I would love to have someone with access to
>> A10 hardware and a scope (or even a bus pirate) check that the A10 SPI
>> controller does indeed have the same "bug". I strongly think so, but
>> would sleep better if it could be confirmed.
>
> We never noticed any significant difference between the two. By now,
> if there was any, we probably would be aware of it. And if there's
> any, we can always send a subsequent patch.

That's good to know and makes life much easier, thanks!

>> >> -     /* Ensure that we have a parent clock fast enough */
>> >> +     /*
>> >> +      * Ensure that the parent clock is set to twice the max speed
>> >> +      * of the spi device (possibly rounded up by the clk driver)
>> >> +      */
>> >>       mclk_rate = clk_get_rate(sspi->mclk);
>> >> -     if (mclk_rate < (2 * tfr->speed_hz)) {
>> >> -             clk_set_rate(sspi->mclk, 2 * tfr->speed_hz);
>> >> +     if (spi->max_speed_hz != sspi->cur_max_speed ||
>> >> +         mclk_rate != sspi->cur_mclk) {
>> >
>> > Do you need to cache the values? As far as I'm aware, you end up doing
>> > the computation all the time anyway.
>>
>> By caching the values we optimize the case when a single SPI slave
>> device (or even multiple slave devices with the same max_speed) are
>> used multiple times in a row. In that case, we're saving two calls:
>> clk_set_rate and clk_get_rate. I was unsure about how expensive the
>> clk_* calls were, so I thought it would be safer use caching. But
>> maybe it's not worth the extra code?
>
> Unless you can point that there's a significant performance
> difference, I'm not sure it's worth it.

I did actually notice a significant transfer latency when a new mod0
clock frequency is set, probably due to the __delay in
drivers/clk/sunxi/clk-factors.c (line 147). So my feeling is that the
caching is worth it... at least for the case when there are two slave
devices with different transfer speeds accessing the same SPI module.

>> [...]
>> >> -     div = mclk_rate / (2 * tfr->speed_hz);
>> >> -     if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
>> >> -             if (div > 0)
>> >> -                     div--;
>> >> -
>> >> +     div = DIV_ROUND_UP(mclk_rate, 2 * tfr->speed_hz) - 1;
>> >
>> > Isn't it exactly the same thing as mclk_rate / (2 * tfr->speed_hz) ?
>>
>> It is quite often, but not in all cases. The plain division rounds to
>> the nearest integer, so it rounds down sometimes. Consider the
>> following case: we have a slow SPI device with a spi-max-frequency of
>> 50kHz. Our clock driver can't find a clock as slow as 100kHz, so it
>> sets mclk to 214,285Hz.
>>
>> Using the old calculation we get: 214,285 / (2 * 50,000) = 2, so div =
>> 1 as the old code subtracts 1 two lines further down
>> The new calculation results in: DIV_ROUND_UP(214,285, 2 * 50,000)  =
>> 3, so div = 2 if we add the -1
>
> Except that you have that extra - 1 after your DIV_ROUND_UP
> calculation in the line you add.  so you have to remove 1 from that
> line above, and then 1 again when we set the register, which ends up
> being the exact same thing, or am I missing something?

The -1 after the DIV_ROUND_UP is already the -1 that is needed to set
the register. There's no need for another -1 after that (and there
isn't one in the code).

Cheers,

  Marcus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6 2/3] spi: sun4i: Fix clock calculations to be predictable and never exceed the requested rate
  2016-01-10 19:44         ` Maxime Ripard
@ 2016-01-10 21:37           ` Marcus Weseloh
  0 siblings, 0 replies; 15+ messages in thread
From: Marcus Weseloh @ 2016-01-10 21:37 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: linux-sunxi, Chen-Yu Tsai, devicetree, Ian Campbell, Kumar Gala,
	Mailing List, Arm, linux-kernel, linux-spi, Mark Brown,
	Mark Rutland, Pawel Moll, Rob Herring

Hi,

2016-01-10 20:44 GMT+01:00 Maxime Ripard <maxime.ripard@free-electrons.com>:
> On Mon, Dec 28, 2015 at 05:22:35PM +0100, Marcus Weseloh wrote:
>> 2015-12-28 0:29 GMT+01:00 Marcus Weseloh <mweseloh42@gmail.com>:
>> > 2015-12-27 22:09 GMT+01:00 Maxime Ripard <maxime.ripard@free-electrons.com>:
[...]
>> > Oh, and I just noticed a mistake in the comment: the clock driver
>> > rounds up _or_ down, so I should drop the "up".
>>
>> As I'm looking further into this, I think the comment should read
>> "possibly rounded down", as the clk framework is expected to set a
>> frequency that is less or equal to the requested frequency.
>
> AFAIK, there's no such expectation in the clock framework. You
> treating this from a maximum frequency perspective, but it would be
> the exact opposite if you were talking about a minimum frequency, as
> might be the case for other consumers.

I was looking though the clk driver source and found this comment in
drivers/clk/sunxi/clk-factors.c (Line 89):

/* find the parent that can help provide the fastest rate <= rate */

And that comments seems to be correct, because that is what the code
is doing (and the comment is copied from the core clk framework). So
it checks every parent to see if that parent could be used to get a
clock rate that is less or equal to the requested clock. If it can't
find one, i.e. all parents can only supply a larger clock, then
-EINVAL is returned. Or am I missing something here?

Up to now this driver would never return -EINVAL, because of the
problems in the mod0 clock driver I mentioned. But the intended effect
seems to be actual rate <= requested rate. And it seems like clk
drivers on other platforms do this as well.

[...]
>> That clk_set_rate sets a higher frequency than requested seems to be a
>> problem in itself. I had a look at clk/sunxi/clk-mod0.c and noticed a
>> few small problems there. Will post an RFC patch in a couple of
>> minutes.
>
> Did you post these patches already? I think I missed them if that's
> the case.

Yes, I've posted the patch to the clk-mod0.c here:
https://groups.google.com/d/msg/linux-sunxi/BQWhOHGqliI/fMtirUFsBgAJ

Cheers,

  Marcus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6 2/3] spi: sun4i: Fix clock calculations to be predictable and never exceed the requested rate
  2016-01-10 21:11         ` Marcus Weseloh
@ 2016-01-17 18:51           ` Maxime Ripard
  2016-01-18  9:40             ` Marcus Weseloh
  0 siblings, 1 reply; 15+ messages in thread
From: Maxime Ripard @ 2016-01-17 18:51 UTC (permalink / raw)
  To: Marcus Weseloh
  Cc: linux-sunxi, Chen-Yu Tsai, devicetree, Ian Campbell, Kumar Gala,
	Mailing List, Arm, linux-kernel, linux-spi, Mark Brown,
	Mark Rutland, Pawel Moll, Rob Herring

[-- Attachment #1: Type: text/plain, Size: 3639 bytes --]

Hi,

On Sun, Jan 10, 2016 at 10:11:11PM +0100, Marcus Weseloh wrote:
> >> >> -     /* Ensure that we have a parent clock fast enough */
> >> >> +     /*
> >> >> +      * Ensure that the parent clock is set to twice the max speed
> >> >> +      * of the spi device (possibly rounded up by the clk driver)
> >> >> +      */
> >> >>       mclk_rate = clk_get_rate(sspi->mclk);
> >> >> -     if (mclk_rate < (2 * tfr->speed_hz)) {
> >> >> -             clk_set_rate(sspi->mclk, 2 * tfr->speed_hz);
> >> >> +     if (spi->max_speed_hz != sspi->cur_max_speed ||
> >> >> +         mclk_rate != sspi->cur_mclk) {
> >> >
> >> > Do you need to cache the values? As far as I'm aware, you end up doing
> >> > the computation all the time anyway.
> >>
> >> By caching the values we optimize the case when a single SPI slave
> >> device (or even multiple slave devices with the same max_speed) are
> >> used multiple times in a row. In that case, we're saving two calls:
> >> clk_set_rate and clk_get_rate. I was unsure about how expensive the
> >> clk_* calls were, so I thought it would be safer use caching. But
> >> maybe it's not worth the extra code?
> >
> > Unless you can point that there's a significant performance
> > difference, I'm not sure it's worth it.
> 
> I did actually notice a significant transfer latency when a new mod0
> clock frequency is set, probably due to the __delay in
> drivers/clk/sunxi/clk-factors.c (line 147). So my feeling is that the
> caching is worth it... at least for the case when there are two slave
> devices with different transfer speeds accessing the same SPI module.

I'm not sure we even need that delay in the first place, at least not
for all the clocks using factor.

AFAIK, the only case where it's useful is for PLL, and all of them
have a bit you can busy-wait on that will let you know when the clock
has stabilized.

> >> [...]
> >> >> -     div = mclk_rate / (2 * tfr->speed_hz);
> >> >> -     if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
> >> >> -             if (div > 0)
> >> >> -                     div--;
> >> >> -
> >> >> +     div = DIV_ROUND_UP(mclk_rate, 2 * tfr->speed_hz) - 1;
> >> >
> >> > Isn't it exactly the same thing as mclk_rate / (2 * tfr->speed_hz) ?
> >>
> >> It is quite often, but not in all cases. The plain division rounds to
> >> the nearest integer, so it rounds down sometimes. Consider the
> >> following case: we have a slow SPI device with a spi-max-frequency of
> >> 50kHz. Our clock driver can't find a clock as slow as 100kHz, so it
> >> sets mclk to 214,285Hz.
> >>
> >> Using the old calculation we get: 214,285 / (2 * 50,000) = 2, so div =
> >> 1 as the old code subtracts 1 two lines further down
> >> The new calculation results in: DIV_ROUND_UP(214,285, 2 * 50,000)  =
> >> 3, so div = 2 if we add the -1
> >
> > Except that you have that extra - 1 after your DIV_ROUND_UP
> > calculation in the line you add.  so you have to remove 1 from that
> > line above, and then 1 again when we set the register, which ends up
> > being the exact same thing, or am I missing something?
> 
> The -1 after the DIV_ROUND_UP is already the -1 that is needed to set
> the register. There's no need for another -1 after that (and there
> isn't one in the code).

I was probably hallucinating :) My bad.

That being said, I still have a hard time figuring out what would not
be solved simply by removing the div--, which seems to match what your
commit log says.

Thanks!
Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6 2/3] spi: sun4i: Fix clock calculations to be predictable and never exceed the requested rate
  2016-01-17 18:51           ` Maxime Ripard
@ 2016-01-18  9:40             ` Marcus Weseloh
  2016-01-26 21:25               ` Maxime Ripard
  0 siblings, 1 reply; 15+ messages in thread
From: Marcus Weseloh @ 2016-01-18  9:40 UTC (permalink / raw)
  To: Maxime Ripard
  Cc: linux-sunxi, Chen-Yu Tsai, devicetree, Ian Campbell, Kumar Gala,
	Mailing List, Arm, linux-kernel, linux-spi, Mark Brown,
	Mark Rutland, Pawel Moll, Rob Herring

Hi,

2016-01-17 19:51 GMT+01:00 Maxime Ripard <maxime.ripard@free-electrons.com>:
> On Sun, Jan 10, 2016 at 10:11:11PM +0100, Marcus Weseloh wrote:
>> >> >> -     /* Ensure that we have a parent clock fast enough */
>> >> >> +     /*
>> >> >> +      * Ensure that the parent clock is set to twice the max speed
>> >> >> +      * of the spi device (possibly rounded up by the clk driver)
>> >> >> +      */
>> >> >>       mclk_rate = clk_get_rate(sspi->mclk);
>> >> >> -     if (mclk_rate < (2 * tfr->speed_hz)) {
>> >> >> -             clk_set_rate(sspi->mclk, 2 * tfr->speed_hz);
>> >> >> +     if (spi->max_speed_hz != sspi->cur_max_speed ||
>> >> >> +         mclk_rate != sspi->cur_mclk) {
>> >> >
>> >> > Do you need to cache the values? As far as I'm aware, you end up doing
>> >> > the computation all the time anyway.
>> >>
>> >> By caching the values we optimize the case when a single SPI slave
>> >> device (or even multiple slave devices with the same max_speed) are
>> >> used multiple times in a row. In that case, we're saving two calls:
>> >> clk_set_rate and clk_get_rate. I was unsure about how expensive the
>> >> clk_* calls were, so I thought it would be safer use caching. But
>> >> maybe it's not worth the extra code?
>> >
>> > Unless you can point that there's a significant performance
>> > difference, I'm not sure it's worth it.
>>
>> I did actually notice a significant transfer latency when a new mod0
>> clock frequency is set, probably due to the __delay in
>> drivers/clk/sunxi/clk-factors.c (line 147). So my feeling is that the
>> caching is worth it... at least for the case when there are two slave
>> devices with different transfer speeds accessing the same SPI module.
>
> I'm not sure we even need that delay in the first place, at least not
> for all the clocks using factor.
>
> AFAIK, the only case where it's useful is for PLL, and all of them
> have a bit you can busy-wait on that will let you know when the clock
> has stabilized.

OK, I didn't know that. Does that mean you would like me to drop the
caching from this patch and prepare a patch to remove the __delay from
clk-factors?

>> >> [...]
>> >> >> -     div = mclk_rate / (2 * tfr->speed_hz);
>> >> >> -     if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
>> >> >> -             if (div > 0)
>> >> >> -                     div--;
>> >> >> -
>> >> >> +     div = DIV_ROUND_UP(mclk_rate, 2 * tfr->speed_hz) - 1;
>> >> >
>> >> > Isn't it exactly the same thing as mclk_rate / (2 * tfr->speed_hz) ?
>> >>
>> >> It is quite often, but not in all cases. The plain division rounds to
>> >> the nearest integer, so it rounds down sometimes. Consider the
>> >> following case: we have a slow SPI device with a spi-max-frequency of
>> >> 50kHz. Our clock driver can't find a clock as slow as 100kHz, so it
>> >> sets mclk to 214,285Hz.
>> >>
>> >> Using the old calculation we get: 214,285 / (2 * 50,000) = 2, so div =
>> >> 1 as the old code subtracts 1 two lines further down
>> >> The new calculation results in: DIV_ROUND_UP(214,285, 2 * 50,000)  =
>> >> 3, so div = 2 if we add the -1
>> >
>> > Except that you have that extra - 1 after your DIV_ROUND_UP
>> > calculation in the line you add.  so you have to remove 1 from that
>> > line above, and then 1 again when we set the register, which ends up
>> > being the exact same thing, or am I missing something?
>>
>> The -1 after the DIV_ROUND_UP is already the -1 that is needed to set
>> the register. There's no need for another -1 after that (and there
>> isn't one in the code).
>
> I was probably hallucinating :) My bad.
>
> That being said, I still have a hard time figuring out what would not
> be solved simply by removing the div--, which seems to match what your
> commit log says.

I'm probably not doing a good job explaining the change. Let me try it
with a few examples. Consider the following three approaches:

A (original, unpatched version):
========================
div = mclk_rate / (2 * tfr->speed_hz);
if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
        if (div > 0)
               div--;
} else {
...
}

B (original version, but with div-- removed):
=================================
div = mclk_rate / (2 * tfr->speed_hz);
if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
   ...
} else {
...
}

C (version after this patch):
=====================
div = DIV_ROUND_UP(mclk_rate, 2 * tfr->speed_hz) - 1;
if (div <= SUN4I_CLK_CTL_CDR2_MASK) {
   ...
} else {
   ...
}

And now the following values for mclk, tfr->speed and the resulting
div and SPI_CLK

tfr->speed_hz = 50,000
mclk = 214,285
A: div = 1, SPI_CLK = 53,571(!)
B: div = 2, SPI_CLK = 35,714
C: div = 2, SPI_CLK = 35,714

tfr->speed_hz = 50,000
mclk = 200,000
A: div = 1, SPI_CLK = 50,000
B: div = 2, SPI_CLK = 33,333(!)
C: div = 1, SPI_CLK = 50,000

tfr->speed_hz = 650,000
mclk = 1,6000,000
A: div = 11, SPI_CLK = 666,667(!)
B: div = 12, SPI_CLK = 615,385
C: div = 12, SPI_CLK = 615,385

tfr->speed_hz = 1,000,000
mclk = 2,000,000
A: div = 0, SPI_CLK = 1,000,000
B: div = 1, SPI_CLK = 500,000(!)
C: div = 0, SPI_CLK = 1,000,000

I hope that makes it a little bit easier to understand the change. Of
course, the change only makes sense if you agree that the acutal SPI
transfer speed should never exceed the requested speed. Which I think
is the right approach... but maybe you think otherwise?

Thanks for taking the time to look at this so carefully!

Marcus

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v6 2/3] spi: sun4i: Fix clock calculations to be predictable and never exceed the requested rate
  2016-01-18  9:40             ` Marcus Weseloh
@ 2016-01-26 21:25               ` Maxime Ripard
  0 siblings, 0 replies; 15+ messages in thread
From: Maxime Ripard @ 2016-01-26 21:25 UTC (permalink / raw)
  To: Marcus Weseloh
  Cc: linux-sunxi, Chen-Yu Tsai, devicetree, Ian Campbell, Kumar Gala,
	Mailing List, Arm, linux-kernel, linux-spi, Mark Brown,
	Mark Rutland, Pawel Moll, Rob Herring

[-- Attachment #1: Type: text/plain, Size: 6490 bytes --]

On Mon, Jan 18, 2016 at 10:40:59AM +0100, Marcus Weseloh wrote:
> Hi,
> 
> 2016-01-17 19:51 GMT+01:00 Maxime Ripard <maxime.ripard@free-electrons.com>:
> > On Sun, Jan 10, 2016 at 10:11:11PM +0100, Marcus Weseloh wrote:
> >> >> >> -     /* Ensure that we have a parent clock fast enough */
> >> >> >> +     /*
> >> >> >> +      * Ensure that the parent clock is set to twice the max speed
> >> >> >> +      * of the spi device (possibly rounded up by the clk driver)
> >> >> >> +      */
> >> >> >>       mclk_rate = clk_get_rate(sspi->mclk);
> >> >> >> -     if (mclk_rate < (2 * tfr->speed_hz)) {
> >> >> >> -             clk_set_rate(sspi->mclk, 2 * tfr->speed_hz);
> >> >> >> +     if (spi->max_speed_hz != sspi->cur_max_speed ||
> >> >> >> +         mclk_rate != sspi->cur_mclk) {
> >> >> >
> >> >> > Do you need to cache the values? As far as I'm aware, you end up doing
> >> >> > the computation all the time anyway.
> >> >>
> >> >> By caching the values we optimize the case when a single SPI slave
> >> >> device (or even multiple slave devices with the same max_speed) are
> >> >> used multiple times in a row. In that case, we're saving two calls:
> >> >> clk_set_rate and clk_get_rate. I was unsure about how expensive the
> >> >> clk_* calls were, so I thought it would be safer use caching. But
> >> >> maybe it's not worth the extra code?
> >> >
> >> > Unless you can point that there's a significant performance
> >> > difference, I'm not sure it's worth it.
> >>
> >> I did actually notice a significant transfer latency when a new mod0
> >> clock frequency is set, probably due to the __delay in
> >> drivers/clk/sunxi/clk-factors.c (line 147). So my feeling is that the
> >> caching is worth it... at least for the case when there are two slave
> >> devices with different transfer speeds accessing the same SPI module.
> >
> > I'm not sure we even need that delay in the first place, at least not
> > for all the clocks using factor.
> >
> > AFAIK, the only case where it's useful is for PLL, and all of them
> > have a bit you can busy-wait on that will let you know when the clock
> > has stabilized.
> 
> OK, I didn't know that.

Look for the lock bit in the PLLs section in the datasheet

> Does that mean you would like me to drop the caching from this patch
> and prepare a patch to remove the __delay from clk-factors?

Yeah, you could add a lock bit field to the clk_factors structure,
busy-wait on it. Otherwise, just go on (hence removing the delay for
those).

Be aware that it might conflict with the clk-factors reworking Chen-Yu
posted yesterday, so you might want to synchronise with him.

> >> >> [...]
> >> >> >> -     div = mclk_rate / (2 * tfr->speed_hz);
> >> >> >> -     if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
> >> >> >> -             if (div > 0)
> >> >> >> -                     div--;
> >> >> >> -
> >> >> >> +     div = DIV_ROUND_UP(mclk_rate, 2 * tfr->speed_hz) - 1;
> >> >> >
> >> >> > Isn't it exactly the same thing as mclk_rate / (2 * tfr->speed_hz) ?
> >> >>
> >> >> It is quite often, but not in all cases. The plain division rounds to
> >> >> the nearest integer, so it rounds down sometimes. Consider the
> >> >> following case: we have a slow SPI device with a spi-max-frequency of
> >> >> 50kHz. Our clock driver can't find a clock as slow as 100kHz, so it
> >> >> sets mclk to 214,285Hz.
> >> >>
> >> >> Using the old calculation we get: 214,285 / (2 * 50,000) = 2, so div =
> >> >> 1 as the old code subtracts 1 two lines further down
> >> >> The new calculation results in: DIV_ROUND_UP(214,285, 2 * 50,000)  =
> >> >> 3, so div = 2 if we add the -1
> >> >
> >> > Except that you have that extra - 1 after your DIV_ROUND_UP
> >> > calculation in the line you add.  so you have to remove 1 from that
> >> > line above, and then 1 again when we set the register, which ends up
> >> > being the exact same thing, or am I missing something?
> >>
> >> The -1 after the DIV_ROUND_UP is already the -1 that is needed to set
> >> the register. There's no need for another -1 after that (and there
> >> isn't one in the code).
> >
> > I was probably hallucinating :) My bad.
> >
> > That being said, I still have a hard time figuring out what would not
> > be solved simply by removing the div--, which seems to match what your
> > commit log says.
> 
> I'm probably not doing a good job explaining the change. Let me try it
> with a few examples. Consider the following three approaches:
> 
> A (original, unpatched version):
> ========================
> div = mclk_rate / (2 * tfr->speed_hz);
> if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
>         if (div > 0)
>                div--;
> } else {
> ...
> }
> 
> B (original version, but with div-- removed):
> =================================
> div = mclk_rate / (2 * tfr->speed_hz);
> if (div <= (SUN4I_CLK_CTL_CDR2_MASK + 1)) {
>    ...
> } else {
> ...
> }
> 
> C (version after this patch):
> =====================
> div = DIV_ROUND_UP(mclk_rate, 2 * tfr->speed_hz) - 1;
> if (div <= SUN4I_CLK_CTL_CDR2_MASK) {

Ah yes, you're removing the +1 from here too. Sorry for missing
that obvious change....

>    ...
> } else {
>    ...
> }
> 
> And now the following values for mclk, tfr->speed and the resulting
> div and SPI_CLK
> 
> tfr->speed_hz = 50,000
> mclk = 214,285
> A: div = 1, SPI_CLK = 53,571(!)
> B: div = 2, SPI_CLK = 35,714
> C: div = 2, SPI_CLK = 35,714
> 
> tfr->speed_hz = 50,000
> mclk = 200,000
> A: div = 1, SPI_CLK = 50,000
> B: div = 2, SPI_CLK = 33,333(!)
> C: div = 1, SPI_CLK = 50,000
> 
> tfr->speed_hz = 650,000
> mclk = 1,6000,000
> A: div = 11, SPI_CLK = 666,667(!)
> B: div = 12, SPI_CLK = 615,385
> C: div = 12, SPI_CLK = 615,385
> 
> tfr->speed_hz = 1,000,000
> mclk = 2,000,000
> A: div = 0, SPI_CLK = 1,000,000
> B: div = 1, SPI_CLK = 500,000(!)
> C: div = 0, SPI_CLK = 1,000,000
> 
> I hope that makes it a little bit easier to understand the change. Of
> course, the change only makes sense if you agree that the acutal SPI
> transfer speed should never exceed the requested speed. Which I think
> is the right approach... but maybe you think otherwise?

No, my bad for being so picky about it, while missing the
point... Thanks for your awesome explanation :)

Maxime

-- 
Maxime Ripard, Free Electrons
Embedded Linux, Kernel and Android engineering
http://free-electrons.com

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 819 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2016-01-26 21:25 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-26 15:53 [PATCH v6 0/3] spi: dts: sun4i: Add support for wait time between word transmissions Marcus Weseloh
2015-12-26 15:53 ` [PATCH v6 1/3] spi: dts: Add new device property to specifcy a " Marcus Weseloh
2015-12-26 15:53 ` [PATCH v6 2/3] spi: sun4i: Fix clock calculations to be predictable and never exceed the requested rate Marcus Weseloh
2015-12-27 21:09   ` Maxime Ripard
2015-12-27 23:29     ` Marcus Weseloh
2015-12-28 16:22       ` Marcus Weseloh
2016-01-10 19:44         ` Maxime Ripard
2016-01-10 21:37           ` Marcus Weseloh
2016-01-10 18:14       ` Maxime Ripard
2016-01-10 21:11         ` Marcus Weseloh
2016-01-17 18:51           ` Maxime Ripard
2016-01-18  9:40             ` Marcus Weseloh
2016-01-26 21:25               ` Maxime Ripard
2015-12-26 15:53 ` [PATCH v6 3/3] spi: sun4i: Add support for wait time between word transmissions Marcus Weseloh
2015-12-27 21:12   ` Maxime Ripard

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).