Netdev Archive on lore.kernel.org
 help / color / Atom feed
* [RFC PATCH net-next 00/11] Deterministic SPI latency on NXP
@ 2019-08-16  0:44 Vladimir Oltean
  2019-08-16  0:44 ` [RFC PATCH net-next 01/11] net: dsa: sja1105: Add a debugging GPIO for monitoring SPI latency Vladimir Oltean
                   ` (10 more replies)
  0 siblings, 11 replies; 24+ messages in thread
From: Vladimir Oltean @ 2019-08-16  0:44 UTC (permalink / raw)
  To: h.feurstein, mlichvar, richardcochran, andrew, f.fainelli, broonie
  Cc: linux-spi, netdev, Vladimir Oltean

Continuing the discussion created by Hubert Feurstein around the
mv88e6xxx driver for MDIO-controlled switches
(https://lkml.org/lkml/2019/8/2/1364), this patchset takes a similar
approach for the NXP LS1021A-TSN board, which has a SPI-controlled DSA
switch (SJA1105).

The patchset is motivated by some experiments done with a logic
analyzer, trying to understand the source of latency (and especially of
the jitter). SJA1105 SPI messages for reading the PTP clock are 12 bytes
in length: 4 for the SPI header and 8 for the timestamp. When looking at
the messages with a scope, there's jitter basically everywhere: between
bits of a frame and between frames in a transfer. The inter-bit jitter
is hardware and impacts us to a lesser extend (is smaller and caused by
the PVT stability of the oscillators, PLLs, etc). We will focus on the
latency between consecutive SPI frames within a 12-byte transfer.

As a preface, revisions of the DSPI controller IP are integrated in many
Freescale/NXP devices. As a result, the driver has 3 modes of operation:
- TCFQ (Transfer Complete Flag mode): The controller signals software
  that data has been sent/received after each individual word.
- EOQ (End of Queue mode): The driver can implement batching by making
  use of the controller's 4-word deep FIFO.
- DMA (Direct Memory Access mode): The SPI controller's FIFO is no
  longer in direct interaction with the driver, but is used to trigger
  the RX and TX channels of the eDMA module on the SoC.

In LS1021A, the driver works in the least efficient mode of the 3
(TCFQ). There is a well-known errata that the DSPI controller is broken
in conjunction with the eDMA module. As for the EOQ mode, I have tried
unsuccessfully for a few days to make use of the 4 entry FIFO, and the
hardware simply fails to reliably acknowledge the transmission when the
FIFO gets full. So it looks like we're stuck with the TCFQ mode.

The problem with phc2sys on the LS1021A-TSN board is that in order for
the gettime64() call to complete on the sja1105, the system has to
service 12 IRQs. Intuitively that is excessive and is the main source of
jitter, but let's not get ahead of ourselves.

An outline of the experiments that were done (unless otherwise
mentioned, all of these ran for 120 seconds):

A. First I have measured the (poor) performance of phc2sys under current
   conditions. (DSPI driver in IRQ mode, no PTP system timestamping)

   offset: min -53310 max 16107 mean -1737.18 std dev 11444.3
   delay: min 163680 max 237360 mean 201149 std dev 22446.6
   lost servo lock 1 times

B. I switched the .gettime64 callback to .gettimex64, snapshotting the
   PTP system timestamp within the sja1105 driver.

   offset: min -48923 max 64217 mean -904.137 std dev 17358.1
   delay: min 149600 max 203840 mean 169045 std dev 17993.3
   lost servo lock 8 times

C. I patched "struct spi_transfer" to contain the PTP system timestamp,
   and from the sja1105 driver, I passed this structure to be
   snapshotted by the SPI controller's driver (spi-fsl-dspi). This is
   the "transfer-level" snapshot.

   offset: min -64979 max 38979 mean -416.197 std dev 15367.9
   delay: min 125120 max 168320 mean 150286 std dev 17675.3
   lost servo lock 10 times

D. I changed the placement of the transfer snapshotting within the DSPI
   driver, from "transfer-level" to "byte-level".

   offset: min -9021 max 7149 mean -0.418803 std dev 3529.81
   delay: min 7840 max 23920 mean 14493.7 std dev 5982.17
   lost servo lock 0 times

E. I moved the DSPI driver to poll mode. I went back to collecting the
   PTP system timestamps from the sja1105 driver (same as B).

   offset: min -4199 max 46643 mean 418.214 std dev 4554.01
   delay: min 84000 max 194000 mean 99463.2 std dev 12936.5
   lost servo lock 1 times

F. Transfer-level snapshotting in the DSPI driver (same as C), but in
   poll mode.

   offset: min -24244 max 1115 mean -230.478 std dev 2297.28
   delay: min 69440 max 119040 mean 70312.9 std dev 8065.34
   lost servo lock 1 times

G. Byte-level snapshotting (same as D) but in poll mode.

   offset: min -314 max 288 mean -2.48718 std dev 118.045
   delay: min 4880 max 6000 mean 5118.63 std dev 507.258
   lost servo lock 0 times

   This seemed suspiciously good to me, so I let it run for longer
   (58 minutes):

   offset: min -26251 max 16416 mean -21.8672 std dev 863.416
   delay: min 4720 max 57280 mean 5182.49 std dev 1607.19
   lost servo lock 3 times

H. Transfer-level snapshotting (same as F), but with IRQs disabled.
   This ran for 86 minutes.

   offset: min -1927 max 1843 mean -0.209203 std dev 529.398
   delay: min 85440 max 93680 mean 88245 std dev 1454.71
   lost servo lock 0 times

I. Byte-level snapshotting (same as G), but with IRQs disabled.
   This ran for 102 minutes.

   offset: min -378 max 381 mean -0.0083089 std dev 101.495
   delay: min 4720 max 5920 mean 5129.38 std dev 154.899
   lost servo lock 0 times

As a result, this patchset proposes the implementation of scenario I.
The others were done through temporary patches which are not presented
here due to the difficulty of presenting a coherent git history without
resorting to reverts etc. The gist of each experiment should be clear
though.

The raw data is available for dissection at
https://drive.google.com/open?id=1r9raU9ZeqOqkqts6Lb-ISf5ubLDLP3wk.
The logic analyzer captures can be opened with a free-as-in-beer program
provided by Saleae: https://www.saleae.com/downloads/.

In the capture data one can find the MOSI, SCK SPI signals, as well as a
debug GPIO which was toggled at the same time as the PTP system
timestamp was taken, to give the viewer an impression of what the
software is capturing compared to the actual timing of the SPI transfer.

Attached are also some close-up screenshots of transfers where there is
a clear and huge delay in-between frames of the same 12-byte SPI
transfer. As it turns out, these were all caused by the CPU getting
interrupted by some other IRQ. Approaches H and I are the only ones that
get rid of these glitches. In theory, the byte-level snapshotting should
be less vulnerable to an IRQ interrupting the SPI transfer (because the
time window is much smaller) but as the 58 minutes experiment shows, it
is not immune.

Vladimir Oltean (11):
  net: dsa: sja1105: Add a debugging GPIO for monitoring SPI latency
  net: dsa: sja1105: Implement the .gettimex64 system call for PTP
  spi: Add a PTP system timestamp to the transfer structure
  spi: spi-fsl-dspi: Cosmetic cleanup
  spi: spi-fsl-dspi: Use poll mode in case the platform IRQ is missing
  spi: spi-fsl-dspi: Implement the PTP system timestamping
  spi: spi-fsl-dspi: Add a debugging GPIO for monitoring latency
  spi: spi-fsl-dspi: Disable interrupts and preemption during poll mode
    transfer
  ARM: dts: ls1021a-tsn: Add debugging GPIOs for the SJA1105 and DSPI
    drivers
  ARM: dts: ls1021a-tsn: Use the DSPI controller in poll mode
  ARM: dts: ls1021a-tsn: Reduce the SJA1105 SPI frequency for debug

 arch/arm/boot/dts/ls1021a-tsn.dts      |   8 +-
 drivers/net/dsa/sja1105/sja1105.h      |   8 +-
 drivers/net/dsa/sja1105/sja1105_main.c |  15 +-
 drivers/net/dsa/sja1105/sja1105_ptp.c  |  23 +-
 drivers/net/dsa/sja1105/sja1105_spi.c  |  34 +-
 drivers/spi/spi-fsl-dspi.c             | 518 ++++++++++++++-----------
 include/linux/spi/spi.h                |   4 +
 7 files changed, 365 insertions(+), 245 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 01/11] net: dsa: sja1105: Add a debugging GPIO for monitoring SPI latency
  2019-08-16  0:44 [RFC PATCH net-next 00/11] Deterministic SPI latency on NXP Vladimir Oltean
@ 2019-08-16  0:44 ` Vladimir Oltean
  2019-08-16  0:44 ` [RFC PATCH net-next 02/11] net: dsa: sja1105: Implement the .gettimex64 system call for PTP Vladimir Oltean
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2019-08-16  0:44 UTC (permalink / raw)
  To: h.feurstein, mlichvar, richardcochran, andrew, f.fainelli, broonie
  Cc: linux-spi, netdev, Vladimir Oltean

I am using this to monitor the gettimex64 callback jitter, i.e. the
time it takes for the SPI controller to retrieve the PTP time from the
switch, and therefore the uncertainty in the time that has just been
read.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
 drivers/net/dsa/sja1105/sja1105.h      |  4 ++++
 drivers/net/dsa/sja1105/sja1105_main.c | 11 +++++++++++
 drivers/net/dsa/sja1105/sja1105_ptp.c  |  2 ++
 3 files changed, 17 insertions(+)

diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h
index 3dbfe41b370e..e6371b2c2df1 100644
--- a/drivers/net/dsa/sja1105/sja1105.h
+++ b/drivers/net/dsa/sja1105/sja1105.h
@@ -89,6 +89,7 @@ struct sja1105_private {
 	bool rgmii_tx_delay[SJA1105_NUM_PORTS];
 	const struct sja1105_info *info;
 	struct gpio_desc *reset_gpio;
+	struct gpio_desc *debug_gpio;
 	struct spi_device *spidev;
 	struct dsa_switch *ds;
 	struct sja1105_port ports[SJA1105_NUM_PORTS];
@@ -126,6 +127,9 @@ typedef enum {
 	SPI_WRITE = 1,
 } sja1105_spi_rw_mode_t;
 
+/* From sja1105_main.c */
+void sja1105_debug_gpio(struct sja1105_private *priv, bool enabled);
+
 /* From sja1105_spi.c */
 int sja1105_spi_send_packed_buf(const struct sja1105_private *priv,
 				sja1105_spi_rw_mode_t rw, u64 reg_addr,
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index 17b917d1e6be..82bdc2da8f8f 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -23,6 +23,15 @@
 #include <linux/dsa/8021q.h>
 #include "sja1105.h"
 
+void sja1105_debug_gpio(struct sja1105_private *priv, bool enabled)
+{
+	if (IS_ERR(priv->debug_gpio)) {
+		dev_err(priv->ds->dev, "Bad debug GPIO!\n");
+		return;
+	}
+	gpiod_set_value_cansleep(priv->debug_gpio, enabled);
+}
+
 static void sja1105_hw_reset(struct gpio_desc *gpio, unsigned int pulse_len,
 			     unsigned int startup_delay)
 {
@@ -2142,6 +2151,8 @@ static int sja1105_probe(struct spi_device *spi)
 	else
 		sja1105_hw_reset(priv->reset_gpio, 1, 1);
 
+	priv->debug_gpio = devm_gpiod_get(dev, "debug", GPIOD_OUT_HIGH);
+
 	/* Populate our driver private structure (priv) based on
 	 * the device tree node that was probed (spi)
 	 */
diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.c b/drivers/net/dsa/sja1105/sja1105_ptp.c
index 90a595cc596d..51a0014369fc 100644
--- a/drivers/net/dsa/sja1105/sja1105_ptp.c
+++ b/drivers/net/dsa/sja1105/sja1105_ptp.c
@@ -353,8 +353,10 @@ static u64 sja1105_ptptsclk_read(const struct cyclecounter *cc)
 	u64 ptptsclk = 0;
 	int rc;
 
+	sja1105_debug_gpio(priv, 1);
 	rc = sja1105_spi_send_int(priv, SPI_READ, regs->ptptsclk,
 				  &ptptsclk, 8);
+	sja1105_debug_gpio(priv, 0);
 	if (rc < 0)
 		dev_err_ratelimited(priv->ds->dev,
 				    "failed to read ptp cycle counter: %d\n",
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 02/11] net: dsa: sja1105: Implement the .gettimex64 system call for PTP
  2019-08-16  0:44 [RFC PATCH net-next 00/11] Deterministic SPI latency on NXP Vladimir Oltean
  2019-08-16  0:44 ` [RFC PATCH net-next 01/11] net: dsa: sja1105: Add a debugging GPIO for monitoring SPI latency Vladimir Oltean
@ 2019-08-16  0:44 ` Vladimir Oltean
  2019-08-16  0:44 ` [RFC PATCH net-next 03/11] spi: Add a PTP system timestamp to the transfer structure Vladimir Oltean
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2019-08-16  0:44 UTC (permalink / raw)
  To: h.feurstein, mlichvar, richardcochran, andrew, f.fainelli, broonie
  Cc: linux-spi, netdev, Vladimir Oltean

Through the PTP_SYS_OFFSET_EXTENDED ioctl, it is possible for userspace
applications (i.e. phc2sys) to compensate for the delays incurred while
reading the PHC's time.

Implement this ioctl by passing the PTP system timestamp to the switch's
SPI controller driver.

The 'read PTP time' message is a 12 byte structure, first 4 bytes of
which represent the SPI header, and the last 8 bytes represent the
64-bit PTP time. The switch itself starts processing the command
immediately after receiving the last bit of the address, i.e. at the
middle of byte 3 (last byte of header). The PTP time is shadowed to a
buffer register in the switch, and retrieved atomically during the
subsequent SPI frames.

As such, specify to the SPI controller that we want byte 3 to be the one
that gets software-timestamped.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
 drivers/net/dsa/sja1105/sja1105.h      |  4 ++-
 drivers/net/dsa/sja1105/sja1105_main.c |  4 +--
 drivers/net/dsa/sja1105/sja1105_ptp.c  | 21 ++++++++++------
 drivers/net/dsa/sja1105/sja1105_spi.c  | 34 ++++++++++++++++++--------
 4 files changed, 42 insertions(+), 21 deletions(-)

diff --git a/drivers/net/dsa/sja1105/sja1105.h b/drivers/net/dsa/sja1105/sja1105.h
index e6371b2c2df1..18c4f2f808e4 100644
--- a/drivers/net/dsa/sja1105/sja1105.h
+++ b/drivers/net/dsa/sja1105/sja1105.h
@@ -93,6 +93,7 @@ struct sja1105_private {
 	struct spi_device *spidev;
 	struct dsa_switch *ds;
 	struct sja1105_port ports[SJA1105_NUM_PORTS];
+	struct ptp_system_timestamp *ptp_sts;
 	struct ptp_clock_info ptp_caps;
 	struct ptp_clock *clock;
 	/* The cycle counter translates the PTP timestamps (based on
@@ -136,7 +137,8 @@ int sja1105_spi_send_packed_buf(const struct sja1105_private *priv,
 				void *packed_buf, size_t size_bytes);
 int sja1105_spi_send_int(const struct sja1105_private *priv,
 			 sja1105_spi_rw_mode_t rw, u64 reg_addr,
-			 u64 *value, u64 size_bytes);
+			 u64 *value, u64 size_bytes,
+			 struct ptp_system_timestamp *ptp_sts);
 int sja1105_spi_send_long_packed_buf(const struct sja1105_private *priv,
 				     sja1105_spi_rw_mode_t rw, u64 base_addr,
 				     void *packed_buf, u64 buf_len);
diff --git a/drivers/net/dsa/sja1105/sja1105_main.c b/drivers/net/dsa/sja1105/sja1105_main.c
index 82bdc2da8f8f..c4df2cef89cd 100644
--- a/drivers/net/dsa/sja1105/sja1105_main.c
+++ b/drivers/net/dsa/sja1105/sja1105_main.c
@@ -2100,8 +2100,8 @@ static int sja1105_check_device_id(struct sja1105_private *priv)
 	u64 part_no;
 	int rc;
 
-	rc = sja1105_spi_send_int(priv, SPI_READ, regs->device_id,
-				  &device_id, SJA1105_SIZE_DEVICE_ID);
+	rc = sja1105_spi_send_int(priv, SPI_READ, regs->device_id, &device_id,
+				  SJA1105_SIZE_DEVICE_ID, NULL);
 	if (rc < 0)
 		return rc;
 
diff --git a/drivers/net/dsa/sja1105/sja1105_ptp.c b/drivers/net/dsa/sja1105/sja1105_ptp.c
index 51a0014369fc..ee7d1065d745 100644
--- a/drivers/net/dsa/sja1105/sja1105_ptp.c
+++ b/drivers/net/dsa/sja1105/sja1105_ptp.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /* Copyright (c) 2019, Vladimir Oltean <olteanv@gmail.com>
  */
+#include <linux/spi/spi.h>
 #include "sja1105.h"
 
 /* The adjfine API clamps ppb between [-32,768,000, 32,768,000], and
@@ -262,14 +263,17 @@ int sja1105_ptp_reset(struct sja1105_private *priv)
 	return rc;
 }
 
-static int sja1105_ptp_gettime(struct ptp_clock_info *ptp,
-			       struct timespec64 *ts)
+static int sja1105_ptp_gettimex(struct ptp_clock_info *ptp,
+				struct timespec64 *ts,
+				struct ptp_system_timestamp *sts)
 {
 	struct sja1105_private *priv = ptp_to_sja1105(ptp);
 	u64 ns;
 
 	mutex_lock(&priv->ptp_lock);
+	priv->ptp_sts = sts;
 	ns = timecounter_read(&priv->tstamp_tc);
+	priv->ptp_sts = NULL;
 	mutex_unlock(&priv->ptp_lock);
 
 	*ts = ns_to_timespec64(ns);
@@ -355,7 +359,7 @@ static u64 sja1105_ptptsclk_read(const struct cyclecounter *cc)
 
 	sja1105_debug_gpio(priv, 1);
 	rc = sja1105_spi_send_int(priv, SPI_READ, regs->ptptsclk,
-				  &ptptsclk, 8);
+				  &ptptsclk, 8, priv->ptp_sts);
 	sja1105_debug_gpio(priv, 0);
 	if (rc < 0)
 		dev_err_ratelimited(priv->ds->dev,
@@ -368,9 +372,10 @@ static void sja1105_ptp_overflow_check(struct work_struct *work)
 {
 	struct delayed_work *dw = to_delayed_work(work);
 	struct sja1105_private *priv = rw_to_sja1105(dw);
+	struct ptp_system_timestamp dummy;
 	struct timespec64 ts;
 
-	sja1105_ptp_gettime(&priv->ptp_caps, &ts);
+	sja1105_ptp_gettimex(&priv->ptp_caps, &ts, &dummy);
 
 	schedule_delayed_work(&priv->refresh_work, SJA1105_REFRESH_INTERVAL);
 }
@@ -387,7 +392,7 @@ static void sja1105_ptp_extts_work(struct work_struct *work)
 	mutex_lock(&priv->ptp_lock);
 
 	rc = sja1105_spi_send_int(priv, SPI_READ, regs->ptpsyncts,
-				  &ptpsyncts, 8);
+				  &ptpsyncts, 8, NULL);
 	if (rc < 0)
 		dev_err(priv->ds->dev, "Failed to read PTPSYNCTS: %d\n", rc);
 
@@ -433,12 +438,12 @@ static int sja1105_pps_enable(struct sja1105_private *priv, bool on)
 		ptp_pin_duration = SJA1105_HZ_TO_PIN_DURATION(1);
 
 		rc = sja1105_spi_send_int(priv, SPI_WRITE, regs->ptppinst,
-					  &ptp_pin_start, 8);
+					  &ptp_pin_start, 8, NULL);
 		if (rc < 0)
 			return rc;
 
 		rc = sja1105_spi_send_int(priv, SPI_WRITE, regs->ptppindur,
-					  &ptp_pin_duration, 4);
+					  &ptp_pin_duration, 4, NULL);
 		if (rc < 0)
 			return rc;
 	}
@@ -500,7 +505,7 @@ int sja1105_ptp_clock_register(struct sja1105_private *priv)
 		.name		= "SJA1105 PHC",
 		.adjfine	= sja1105_ptp_adjfine,
 		.adjtime	= sja1105_ptp_adjtime,
-		.gettime64	= sja1105_ptp_gettime,
+		.gettimex64	= sja1105_ptp_gettimex,
 		.settime64	= sja1105_ptp_settime,
 		.enable		= sja1105_ptp_enable,
 		.max_adj	= SJA1105_MAX_ADJ_PPB,
diff --git a/drivers/net/dsa/sja1105/sja1105_spi.c b/drivers/net/dsa/sja1105/sja1105_spi.c
index e33c85569882..a3486a40e0fb 100644
--- a/drivers/net/dsa/sja1105/sja1105_spi.c
+++ b/drivers/net/dsa/sja1105/sja1105_spi.c
@@ -15,10 +15,13 @@
 	(SJA1105_SIZE_SPI_MSG_HEADER + SJA1105_SIZE_SPI_MSG_MAXLEN)
 
 static int sja1105_spi_transfer(const struct sja1105_private *priv,
-				const void *tx, void *rx, int size)
+				const void *tx, void *rx, int size,
+				struct ptp_system_timestamp *ptp_sts)
 {
 	struct spi_device *spi = priv->spidev;
 	struct spi_transfer transfer = {
+		.ptp_sts_word_offset = 3,
+		.ptp_sts = ptp_sts,
 		.tx_buf = tx,
 		.rx_buf = rx,
 		.len = size,
@@ -66,9 +69,11 @@ sja1105_spi_message_pack(void *buf, const struct sja1105_spi_message *msg)
  * @size_bytes is smaller than SIZE_SPI_MSG_MAXLEN. Larger packed buffers
  * are chunked in smaller pieces by sja1105_spi_send_long_packed_buf below.
  */
-int sja1105_spi_send_packed_buf(const struct sja1105_private *priv,
-				sja1105_spi_rw_mode_t rw, u64 reg_addr,
-				void *packed_buf, size_t size_bytes)
+static int
+__sja1105_spi_send_packed_buf(const struct sja1105_private *priv,
+			      sja1105_spi_rw_mode_t rw, u64 reg_addr,
+			      void *packed_buf, size_t size_bytes,
+			      struct ptp_system_timestamp *ptp_sts)
 {
 	u8 tx_buf[SJA1105_SIZE_SPI_TRANSFER_MAX] = {0};
 	u8 rx_buf[SJA1105_SIZE_SPI_TRANSFER_MAX] = {0};
@@ -90,7 +95,7 @@ int sja1105_spi_send_packed_buf(const struct sja1105_private *priv,
 		memcpy(tx_buf + SJA1105_SIZE_SPI_MSG_HEADER,
 		       packed_buf, size_bytes);
 
-	rc = sja1105_spi_transfer(priv, tx_buf, rx_buf, msg_len);
+	rc = sja1105_spi_transfer(priv, tx_buf, rx_buf, msg_len, ptp_sts);
 	if (rc < 0)
 		return rc;
 
@@ -101,6 +106,14 @@ int sja1105_spi_send_packed_buf(const struct sja1105_private *priv,
 	return 0;
 }
 
+int sja1105_spi_send_packed_buf(const struct sja1105_private *priv,
+				sja1105_spi_rw_mode_t rw, u64 reg_addr,
+				void *packed_buf, size_t size_bytes)
+{
+	return __sja1105_spi_send_packed_buf(priv, rw, reg_addr, packed_buf,
+					     size_bytes, NULL);
+}
+
 /* If @rw is:
  * - SPI_WRITE: creates and sends an SPI write message at absolute
  *		address reg_addr, taking size_bytes from *packed_buf
@@ -114,7 +127,8 @@ int sja1105_spi_send_packed_buf(const struct sja1105_private *priv,
  */
 int sja1105_spi_send_int(const struct sja1105_private *priv,
 			 sja1105_spi_rw_mode_t rw, u64 reg_addr,
-			 u64 *value, u64 size_bytes)
+			 u64 *value, u64 size_bytes,
+			 struct ptp_system_timestamp *ptp_sts)
 {
 	u8 packed_buf[SJA1105_SIZE_SPI_MSG_MAXLEN];
 	int rc;
@@ -126,8 +140,8 @@ int sja1105_spi_send_int(const struct sja1105_private *priv,
 		sja1105_pack(packed_buf, value, 8 * size_bytes - 1, 0,
 			     size_bytes);
 
-	rc = sja1105_spi_send_packed_buf(priv, rw, reg_addr, packed_buf,
-					 size_bytes);
+	rc = __sja1105_spi_send_packed_buf(priv, rw, reg_addr, packed_buf,
+					   size_bytes, ptp_sts);
 
 	if (rw == SPI_READ)
 		sja1105_unpack(packed_buf, value, 8 * size_bytes - 1, 0,
@@ -291,7 +305,7 @@ int sja1105_inhibit_tx(const struct sja1105_private *priv,
 	int rc;
 
 	rc = sja1105_spi_send_int(priv, SPI_READ, regs->port_control,
-				  &inhibit_cmd, SJA1105_SIZE_PORT_CTRL);
+				  &inhibit_cmd, SJA1105_SIZE_PORT_CTRL, NULL);
 	if (rc < 0)
 		return rc;
 
@@ -301,7 +315,7 @@ int sja1105_inhibit_tx(const struct sja1105_private *priv,
 		inhibit_cmd &= ~port_bitmap;
 
 	return sja1105_spi_send_int(priv, SPI_WRITE, regs->port_control,
-				    &inhibit_cmd, SJA1105_SIZE_PORT_CTRL);
+				    &inhibit_cmd, SJA1105_SIZE_PORT_CTRL, NULL);
 }
 
 struct sja1105_status {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 03/11] spi: Add a PTP system timestamp to the transfer structure
  2019-08-16  0:44 [RFC PATCH net-next 00/11] Deterministic SPI latency on NXP Vladimir Oltean
  2019-08-16  0:44 ` [RFC PATCH net-next 01/11] net: dsa: sja1105: Add a debugging GPIO for monitoring SPI latency Vladimir Oltean
  2019-08-16  0:44 ` [RFC PATCH net-next 02/11] net: dsa: sja1105: Implement the .gettimex64 system call for PTP Vladimir Oltean
@ 2019-08-16  0:44 ` Vladimir Oltean
  2019-08-16 12:18   ` Mark Brown
  2019-08-16  0:44 ` [RFC PATCH net-next 04/11] spi: spi-fsl-dspi: Cosmetic cleanup Vladimir Oltean
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Vladimir Oltean @ 2019-08-16  0:44 UTC (permalink / raw)
  To: h.feurstein, mlichvar, richardcochran, andrew, f.fainelli, broonie
  Cc: linux-spi, netdev, Vladimir Oltean

SPI is one of the interfaces used to access devices which have a POSIX
clock driver (real time clocks, 1588 timers etc).

Since there are lots of sources of timing jitter when retrieving the
time from such a device (controller delays, locking contention etc),
introduce a PTP system timestamp structure in struct spi_transfer. This
is to be used by SPI device drivers when they need to know the exact
time at which the underlying device's time was snapshotted.

Because SPI controllers may have jitter even between frames, also
introduce a field which specifies to the controller driver specifically
which byte needs to be snapshotted.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
 include/linux/spi/spi.h | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
index af4f265d0f67..5a1e4b24c617 100644
--- a/include/linux/spi/spi.h
+++ b/include/linux/spi/spi.h
@@ -13,6 +13,7 @@
 #include <linux/completion.h>
 #include <linux/scatterlist.h>
 #include <linux/gpio/consumer.h>
+#include <linux/ptp_clock_kernel.h>
 
 struct dma_chan;
 struct property_entry;
@@ -842,6 +843,9 @@ struct spi_transfer {
 
 	u32		effective_speed_hz;
 
+	struct ptp_system_timestamp *ptp_sts;
+	unsigned int	ptp_sts_word_offset;
+
 	struct list_head transfer_list;
 };
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 04/11] spi: spi-fsl-dspi: Cosmetic cleanup
  2019-08-16  0:44 [RFC PATCH net-next 00/11] Deterministic SPI latency on NXP Vladimir Oltean
                   ` (2 preceding siblings ...)
  2019-08-16  0:44 ` [RFC PATCH net-next 03/11] spi: Add a PTP system timestamp to the transfer structure Vladimir Oltean
@ 2019-08-16  0:44 ` Vladimir Oltean
  2019-08-16 12:21   ` Mark Brown
  2019-08-16  0:44 ` [RFC PATCH net-next 05/11] spi: spi-fsl-dspi: Use poll mode in case the platform IRQ is missing Vladimir Oltean
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 24+ messages in thread
From: Vladimir Oltean @ 2019-08-16  0:44 UTC (permalink / raw)
  To: h.feurstein, mlichvar, richardcochran, andrew, f.fainelli, broonie
  Cc: linux-spi, netdev, Vladimir Oltean

This patch addresses some cosmetic issues:
- Alignment
- Typos
- (Non-)use of BIT() and GENMASK() macros
- Unused definitions
- Unused includes
- Abuse of ternary operator in detriment of readability
- Reduce indentation level

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
 drivers/spi/spi-fsl-dspi.c | 312 ++++++++++++++++++-------------------
 1 file changed, 154 insertions(+), 158 deletions(-)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index 53335ccc98f6..99708b36ee4f 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -9,26 +9,16 @@
 #include <linux/delay.h>
 #include <linux/dmaengine.h>
 #include <linux/dma-mapping.h>
-#include <linux/err.h>
-#include <linux/errno.h>
 #include <linux/interrupt.h>
-#include <linux/io.h>
 #include <linux/kernel.h>
-#include <linux/math64.h>
 #include <linux/module.h>
-#include <linux/of.h>
 #include <linux/of_device.h>
 #include <linux/pinctrl/consumer.h>
-#include <linux/platform_device.h>
-#include <linux/pm_runtime.h>
 #include <linux/regmap.h>
-#include <linux/sched.h>
 #include <linux/spi/spi.h>
 #include <linux/spi/spi-fsl-dspi.h>
-#include <linux/spi/spi_bitbang.h>
-#include <linux/time.h>
 
-#define DRIVER_NAME "fsl-dspi"
+#define DRIVER_NAME			"fsl-dspi"
 
 #ifdef CONFIG_M5441x
 #define DSPI_FIFO_SIZE			16
@@ -37,97 +27,98 @@
 #endif
 #define DSPI_DMA_BUFSIZE		(DSPI_FIFO_SIZE * 1024)
 
-#define SPI_MCR		0x00
-#define SPI_MCR_MASTER		(1 << 31)
-#define SPI_MCR_PCSIS		(0x3F << 16)
-#define SPI_MCR_CLR_TXF	(1 << 11)
-#define SPI_MCR_CLR_RXF	(1 << 10)
-#define SPI_MCR_XSPI		(1 << 3)
-
-#define SPI_TCR			0x08
-#define SPI_TCR_GET_TCNT(x)	(((x) & 0xffff0000) >> 16)
-
-#define SPI_CTAR(x)		(0x0c + (((x) & 0x3) * 4))
-#define SPI_CTAR_FMSZ(x)	(((x) & 0x0000000f) << 27)
-#define SPI_CTAR_CPOL(x)	((x) << 26)
-#define SPI_CTAR_CPHA(x)	((x) << 25)
-#define SPI_CTAR_LSBFE(x)	((x) << 24)
-#define SPI_CTAR_PCSSCK(x)	(((x) & 0x00000003) << 22)
-#define SPI_CTAR_PASC(x)	(((x) & 0x00000003) << 20)
-#define SPI_CTAR_PDT(x)	(((x) & 0x00000003) << 18)
-#define SPI_CTAR_PBR(x)	(((x) & 0x00000003) << 16)
-#define SPI_CTAR_CSSCK(x)	(((x) & 0x0000000f) << 12)
-#define SPI_CTAR_ASC(x)	(((x) & 0x0000000f) << 8)
-#define SPI_CTAR_DT(x)		(((x) & 0x0000000f) << 4)
-#define SPI_CTAR_BR(x)		((x) & 0x0000000f)
-#define SPI_CTAR_SCALE_BITS	0xf
-
-#define SPI_CTAR0_SLAVE	0x0c
-
-#define SPI_SR			0x2c
-#define SPI_SR_EOQF		0x10000000
-#define SPI_SR_TCFQF		0x80000000
-#define SPI_SR_CLEAR		0x9aaf0000
-
-#define SPI_RSER_TFFFE		BIT(25)
-#define SPI_RSER_TFFFD		BIT(24)
-#define SPI_RSER_RFDFE		BIT(17)
-#define SPI_RSER_RFDFD		BIT(16)
-
-#define SPI_RSER		0x30
-#define SPI_RSER_EOQFE		0x10000000
-#define SPI_RSER_TCFQE		0x80000000
-
-#define SPI_PUSHR		0x34
-#define SPI_PUSHR_CMD_CONT	(1 << 15)
-#define SPI_PUSHR_CONT		(SPI_PUSHR_CMD_CONT << 16)
-#define SPI_PUSHR_CMD_CTAS(x)	(((x) & 0x0003) << 12)
-#define SPI_PUSHR_CTAS(x)	(SPI_PUSHR_CMD_CTAS(x) << 16)
-#define SPI_PUSHR_CMD_EOQ	(1 << 11)
-#define SPI_PUSHR_EOQ		(SPI_PUSHR_CMD_EOQ << 16)
-#define SPI_PUSHR_CMD_CTCNT	(1 << 10)
-#define SPI_PUSHR_CTCNT		(SPI_PUSHR_CMD_CTCNT << 16)
-#define SPI_PUSHR_CMD_PCS(x)	((1 << x) & 0x003f)
-#define SPI_PUSHR_PCS(x)	(SPI_PUSHR_CMD_PCS(x) << 16)
-#define SPI_PUSHR_TXDATA(x)	((x) & 0x0000ffff)
-
-#define SPI_PUSHR_SLAVE	0x34
-
-#define SPI_POPR		0x38
-#define SPI_POPR_RXDATA(x)	((x) & 0x0000ffff)
-
-#define SPI_TXFR0		0x3c
-#define SPI_TXFR1		0x40
-#define SPI_TXFR2		0x44
-#define SPI_TXFR3		0x48
-#define SPI_RXFR0		0x7c
-#define SPI_RXFR1		0x80
-#define SPI_RXFR2		0x84
-#define SPI_RXFR3		0x88
-
-#define SPI_CTARE(x)		(0x11c + (((x) & 0x3) * 4))
-#define SPI_CTARE_FMSZE(x)	(((x) & 0x1) << 16)
-#define SPI_CTARE_DTCP(x)	((x) & 0x7ff)
-
-#define SPI_SREX		0x13c
-
-#define SPI_FRAME_BITS(bits)	SPI_CTAR_FMSZ((bits) - 1)
-#define SPI_FRAME_BITS_MASK	SPI_CTAR_FMSZ(0xf)
-#define SPI_FRAME_BITS_16	SPI_CTAR_FMSZ(0xf)
-#define SPI_FRAME_BITS_8	SPI_CTAR_FMSZ(0x7)
-
-#define SPI_FRAME_EBITS(bits)	SPI_CTARE_FMSZE(((bits) - 1) >> 4)
-#define SPI_FRAME_EBITS_MASK	SPI_CTARE_FMSZE(1)
+#define SPI_MCR				0x00
+#define SPI_MCR_MASTER			BIT(31)
+#define SPI_MCR_PCSIS			(0x3F << 16)
+#define SPI_MCR_CLR_TXF			BIT(11)
+#define SPI_MCR_CLR_RXF			BIT(10)
+#define SPI_MCR_XSPI			BIT(3)
+
+#define SPI_TCR				0x08
+#define SPI_TCR_GET_TCNT(x)		(((x) & GENMASK(31, 16)) >> 16)
+
+#define SPI_CTAR(x)			(0x0c + (((x) & GENMASK(1, 0)) * 4))
+#define SPI_CTAR_FMSZ(x)		(((x) << 27) & GENMASK(30, 27))
+#define SPI_CTAR_CPOL			BIT(26)
+#define SPI_CTAR_CPHA			BIT(25)
+#define SPI_CTAR_LSBFE			BIT(24)
+#define SPI_CTAR_PCSSCK(x)		(((x) << 22) & GENMASK(23, 22))
+#define SPI_CTAR_PASC(x)		(((x) << 20) & GENMASK(21, 20))
+#define SPI_CTAR_PDT(x)			(((x) << 18) & GENMASK(19, 18))
+#define SPI_CTAR_PBR(x)			(((x) << 16) & GENMASK(17, 16))
+#define SPI_CTAR_CSSCK(x)		(((x) << 12) & GENMASK(15, 12))
+#define SPI_CTAR_ASC(x)			(((x) << 8) & GENMASK(11, 8))
+#define SPI_CTAR_DT(x)			(((x) << 4) & GENMASK(7, 4))
+#define SPI_CTAR_BR(x)			((x) & GENMASK(3, 0))
+#define SPI_CTAR_SCALE_BITS		0xf
+
+#define SPI_CTAR0_SLAVE			0x0c
+
+#define SPI_SR				0x2c
+#define SPI_SR_TCFQF			BIT(31)
+#define SPI_SR_EOQF			BIT(28)
+#define SPI_SR_TFUF			BIT(27)
+#define SPI_SR_TFFF			BIT(25)
+#define SPI_SR_CMDTCF			BIT(23)
+#define SPI_SR_SPEF			BIT(21)
+#define SPI_SR_RFOF			BIT(19)
+#define SPI_SR_TFIWF			BIT(18)
+#define SPI_SR_RFDF			BIT(17)
+#define SPI_SR_CMDFFF			BIT(16)
+#define SPI_SR_CLEAR			(SPI_SR_TCFQF | SPI_SR_EOQF | \
+					SPI_SR_TFUF | SPI_SR_TFFF | \
+					SPI_SR_CMDTCF | SPI_SR_SPEF | \
+					SPI_SR_RFOF | SPI_SR_TFIWF | \
+					SPI_SR_RFDF | SPI_SR_CMDFFF)
+
+#define SPI_RSER_TFFFE			BIT(25)
+#define SPI_RSER_TFFFD			BIT(24)
+#define SPI_RSER_RFDFE			BIT(17)
+#define SPI_RSER_RFDFD			BIT(16)
+
+#define SPI_RSER			0x30
+#define SPI_RSER_TCFQE			BIT(31)
+#define SPI_RSER_EOQFE			BIT(28)
+
+#define SPI_PUSHR			0x34
+#define SPI_PUSHR_CMD_CONT		BIT(15)
+#define SPI_PUSHR_CMD_CTAS(x)		(((x) << 12 & GENMASK(14, 12)))
+#define SPI_PUSHR_CMD_EOQ		BIT(11)
+#define SPI_PUSHR_CMD_CTCNT		BIT(10)
+#define SPI_PUSHR_CMD_PCS(x)		(BIT(x) & GENMASK(5, 0))
+
+#define SPI_PUSHR_SLAVE			0x34
+
+#define SPI_POPR			0x38
+#define SPI_POPR_RXDATA(x)		((x) & GENMASK(15, 0))
+
+#define SPI_TXFR0			0x3c
+#define SPI_TXFR1			0x40
+#define SPI_TXFR2			0x44
+#define SPI_TXFR3			0x48
+#define SPI_RXFR0			0x7c
+#define SPI_RXFR1			0x80
+#define SPI_RXFR2			0x84
+#define SPI_RXFR3			0x88
+
+#define SPI_CTARE(x)			(0x11c + (((x) & GENMASK(1, 0)) * 4))
+#define SPI_CTARE_FMSZE(x)		(((x) & 0x1) << 16)
+#define SPI_CTARE_DTCP(x)		((x) & 0x7ff)
+
+#define SPI_SREX			0x13c
+
+#define SPI_FRAME_BITS(bits)		SPI_CTAR_FMSZ((bits) - 1)
+#define SPI_FRAME_EBITS(bits)		SPI_CTARE_FMSZE(((bits) - 1) >> 4)
 
 /* Register offsets for regmap_pushr */
-#define PUSHR_CMD		0x0
-#define PUSHR_TX		0x2
+#define PUSHR_CMD			0x0
+#define PUSHR_TX			0x2
 
-#define SPI_CS_INIT		0x01
-#define SPI_CS_ASSERT		0x02
-#define SPI_CS_DROP		0x04
+#define SPI_CS_INIT			0x01
+#define SPI_CS_ASSERT			0x02
+#define SPI_CS_DROP			0x04
 
-#define DMA_COMPLETION_TIMEOUT	msecs_to_jiffies(3000)
+#define DMA_COMPLETION_TIMEOUT		msecs_to_jiffies(3000)
 
 struct chip_data {
 	u32 ctar_val;
@@ -246,7 +237,7 @@ static void dspi_push_rx(struct fsl_dspi *dspi, u32 rxdata)
 	if (!dspi->rx)
 		return;
 
-	/* Mask of undefined bits */
+	/* Mask off undefined bits */
 	rxdata &= (1 << dspi->bits_per_word) - 1;
 
 	if (dspi->bytes_per_word == 1)
@@ -282,8 +273,8 @@ static void dspi_rx_dma_callback(void *arg)
 
 static int dspi_next_xfer_dma_submit(struct fsl_dspi *dspi)
 {
-	struct fsl_dspi_dma *dma = dspi->dma;
 	struct device *dev = &dspi->pdev->dev;
+	struct fsl_dspi_dma *dma = dspi->dma;
 	int time_left;
 	int i;
 
@@ -360,9 +351,9 @@ static int dspi_next_xfer_dma_submit(struct fsl_dspi *dspi)
 
 static int dspi_dma_xfer(struct fsl_dspi *dspi)
 {
-	struct fsl_dspi_dma *dma = dspi->dma;
-	struct device *dev = &dspi->pdev->dev;
 	struct spi_message *message = dspi->cur_msg;
+	struct device *dev = &dspi->pdev->dev;
+	struct fsl_dspi_dma *dma = dspi->dma;
 	int curr_remaining_bytes;
 	int bytes_per_buffer;
 	int ret = 0;
@@ -397,9 +388,9 @@ static int dspi_dma_xfer(struct fsl_dspi *dspi)
 
 static int dspi_request_dma(struct fsl_dspi *dspi, phys_addr_t phy_addr)
 {
-	struct fsl_dspi_dma *dma;
-	struct dma_slave_config cfg;
 	struct device *dev = &dspi->pdev->dev;
+	struct dma_slave_config cfg;
+	struct fsl_dspi_dma *dma;
 	int ret;
 
 	dma = devm_kzalloc(dev, sizeof(*dma), GFP_KERNEL);
@@ -421,14 +412,14 @@ static int dspi_request_dma(struct fsl_dspi *dspi, phys_addr_t phy_addr)
 	}
 
 	dma->tx_dma_buf = dma_alloc_coherent(dev, DSPI_DMA_BUFSIZE,
-					&dma->tx_dma_phys, GFP_KERNEL);
+					     &dma->tx_dma_phys, GFP_KERNEL);
 	if (!dma->tx_dma_buf) {
 		ret = -ENOMEM;
 		goto err_tx_dma_buf;
 	}
 
 	dma->rx_dma_buf = dma_alloc_coherent(dev, DSPI_DMA_BUFSIZE,
-					&dma->rx_dma_phys, GFP_KERNEL);
+					     &dma->rx_dma_phys, GFP_KERNEL);
 	if (!dma->rx_dma_buf) {
 		ret = -ENOMEM;
 		goto err_rx_dma_buf;
@@ -485,30 +476,31 @@ static void dspi_release_dma(struct fsl_dspi *dspi)
 	struct fsl_dspi_dma *dma = dspi->dma;
 	struct device *dev = &dspi->pdev->dev;
 
-	if (dma) {
-		if (dma->chan_tx) {
-			dma_unmap_single(dev, dma->tx_dma_phys,
-					DSPI_DMA_BUFSIZE, DMA_TO_DEVICE);
-			dma_release_channel(dma->chan_tx);
-		}
+	if (!dma)
+		return;
 
-		if (dma->chan_rx) {
-			dma_unmap_single(dev, dma->rx_dma_phys,
-					DSPI_DMA_BUFSIZE, DMA_FROM_DEVICE);
-			dma_release_channel(dma->chan_rx);
-		}
+	if (dma->chan_tx) {
+		dma_unmap_single(dev, dma->tx_dma_phys,
+				 DSPI_DMA_BUFSIZE, DMA_TO_DEVICE);
+		dma_release_channel(dma->chan_tx);
+	}
+
+	if (dma->chan_rx) {
+		dma_unmap_single(dev, dma->rx_dma_phys,
+				 DSPI_DMA_BUFSIZE, DMA_FROM_DEVICE);
+		dma_release_channel(dma->chan_rx);
 	}
 }
 
 static void hz_to_spi_baud(char *pbr, char *br, int speed_hz,
-		unsigned long clkrate)
+			   unsigned long clkrate)
 {
 	/* Valid baud rate pre-scaler values */
 	int pbr_tbl[4] = {2, 3, 5, 7};
 	int brs[16] = {	2,	4,	6,	8,
-		16,	32,	64,	128,
-		256,	512,	1024,	2048,
-		4096,	8192,	16384,	32768 };
+			16,	32,	64,	128,
+			256,	512,	1024,	2048,
+			4096,	8192,	16384,	32768 };
 	int scale_needed, scale, minscale = INT_MAX;
 	int i, j;
 
@@ -538,15 +530,15 @@ static void hz_to_spi_baud(char *pbr, char *br, int speed_hz,
 }
 
 static void ns_delay_scale(char *psc, char *sc, int delay_ns,
-		unsigned long clkrate)
+			   unsigned long clkrate)
 {
-	int pscale_tbl[4] = {1, 3, 5, 7};
 	int scale_needed, scale, minscale = INT_MAX;
-	int i, j;
+	int pscale_tbl[4] = {1, 3, 5, 7};
 	u32 remainder;
+	int i, j;
 
 	scale_needed = div_u64_rem((u64)delay_ns * clkrate, NSEC_PER_SEC,
-			&remainder);
+				   &remainder);
 	if (remainder)
 		scale_needed++;
 
@@ -601,7 +593,7 @@ static void dspi_tcfq_write(struct fsl_dspi *dspi)
 		 */
 		u32 data = dspi_pop_tx(dspi);
 
-		if (dspi->cur_chip->ctar_val & SPI_CTAR_LSBFE(1)) {
+		if (dspi->cur_chip->ctar_val & SPI_CTAR_LSBFE) {
 			/* LSB */
 			tx_fifo_write(dspi, data & 0xFFFF);
 			tx_fifo_write(dspi, data >> 16);
@@ -655,19 +647,19 @@ static void dspi_eoq_read(struct fsl_dspi *dspi)
 {
 	int fifo_size = DSPI_FIFO_SIZE;
 
-	/* Read one FIFO entry at and push to rx buffer */
+	/* Read one FIFO entry and push to rx buffer */
 	while ((dspi->rx < dspi->rx_end) && fifo_size--)
 		dspi_push_rx(dspi, fifo_read(dspi));
 }
 
 static int dspi_transfer_one_message(struct spi_master *master,
-		struct spi_message *message)
+				     struct spi_message *message)
 {
 	struct fsl_dspi *dspi = spi_master_get_devdata(master);
 	struct spi_device *spi = message->spi;
+	enum dspi_trans_mode trans_mode;
 	struct spi_transfer *transfer;
 	int status = 0;
-	enum dspi_trans_mode trans_mode;
 
 	message->actual_length = 0;
 
@@ -677,7 +669,7 @@ static int dspi_transfer_one_message(struct spi_master *master,
 		dspi->cur_chip = spi_get_ctldata(spi);
 		/* Prepare command word for CMD FIFO */
 		dspi->tx_cmd = SPI_PUSHR_CMD_CTAS(0) |
-			SPI_PUSHR_CMD_PCS(spi->chip_select);
+			       SPI_PUSHR_CMD_PCS(spi->chip_select);
 		if (list_is_last(&dspi->cur_transfer->transfer_list,
 				 &dspi->cur_msg->transfers)) {
 			/* Leave PCS activated after last transfer when
@@ -718,8 +710,8 @@ static int dspi_transfer_one_message(struct spi_master *master,
 			     SPI_FRAME_BITS(transfer->bits_per_word));
 		if (dspi->devtype_data->xspi_mode)
 			regmap_write(dspi->regmap, SPI_CTARE(0),
-				     SPI_FRAME_EBITS(transfer->bits_per_word)
-				     | SPI_CTARE_DTCP(1));
+				     SPI_FRAME_EBITS(transfer->bits_per_word) |
+				     SPI_CTARE_DTCP(1));
 
 		trans_mode = dspi->devtype_data->trans_mode;
 		switch (trans_mode) {
@@ -733,8 +725,8 @@ static int dspi_transfer_one_message(struct spi_master *master,
 			break;
 		case DSPI_DMA_MODE:
 			regmap_write(dspi->regmap, SPI_RSER,
-				SPI_RSER_TFFFE | SPI_RSER_TFFFD |
-				SPI_RSER_RFDFE | SPI_RSER_RFDFD);
+				     SPI_RSER_TFFFE | SPI_RSER_TFFFD |
+				     SPI_RSER_RFDFE | SPI_RSER_RFDFD);
 			status = dspi_dma_xfer(dspi);
 			break;
 		default:
@@ -746,7 +738,7 @@ static int dspi_transfer_one_message(struct spi_master *master,
 
 		if (trans_mode != DSPI_DMA_MODE) {
 			if (wait_event_interruptible(dspi->waitq,
-						dspi->waitflags))
+						     dspi->waitflags))
 				dev_err(&dspi->pdev->dev,
 					"wait transfer complete fail!\n");
 			dspi->waitflags = 0;
@@ -765,12 +757,12 @@ static int dspi_transfer_one_message(struct spi_master *master,
 
 static int dspi_setup(struct spi_device *spi)
 {
-	struct chip_data *chip;
 	struct fsl_dspi *dspi = spi_master_get_devdata(spi->master);
-	struct fsl_dspi_platform_data *pdata;
-	u32 cs_sck_delay = 0, sck_cs_delay = 0;
 	unsigned char br = 0, pbr = 0, pcssck = 0, cssck = 0;
+	u32 cs_sck_delay = 0, sck_cs_delay = 0;
+	struct fsl_dspi_platform_data *pdata;
 	unsigned char pasc = 0, asc = 0;
+	struct chip_data *chip;
 	unsigned long clkrate;
 
 	/* Only alloc on first setup */
@@ -805,18 +797,22 @@ static int dspi_setup(struct spi_device *spi)
 	/* Set After SCK delay scale values */
 	ns_delay_scale(&pasc, &asc, sck_cs_delay, clkrate);
 
-	chip->ctar_val = SPI_CTAR_CPOL(spi->mode & SPI_CPOL ? 1 : 0)
-		| SPI_CTAR_CPHA(spi->mode & SPI_CPHA ? 1 : 0);
+	chip->ctar_val = 0;
+	if (spi->mode & SPI_CPOL)
+		chip->ctar_val |= SPI_CTAR_CPOL;
+	if (spi->mode & SPI_CPHA)
+		chip->ctar_val |= SPI_CTAR_CPHA;
 
 	if (!spi_controller_is_slave(dspi->master)) {
-		chip->ctar_val |= SPI_CTAR_LSBFE(spi->mode &
-						 SPI_LSB_FIRST ? 1 : 0)
-			| SPI_CTAR_PCSSCK(pcssck)
-			| SPI_CTAR_CSSCK(cssck)
-			| SPI_CTAR_PASC(pasc)
-			| SPI_CTAR_ASC(asc)
-			| SPI_CTAR_PBR(pbr)
-			| SPI_CTAR_BR(br);
+		chip->ctar_val |= SPI_CTAR_PCSSCK(pcssck) |
+				  SPI_CTAR_CSSCK(cssck) |
+				  SPI_CTAR_PASC(pasc) |
+				  SPI_CTAR_ASC(asc) |
+				  SPI_CTAR_PBR(pbr) |
+				  SPI_CTAR_BR(br);
+
+		if (spi->mode & SPI_LSB_FIRST)
+			chip->ctar_val |= SPI_CTAR_LSBFE;
 	}
 
 	spi_set_ctldata(spi, chip);
@@ -829,7 +825,7 @@ static void dspi_cleanup(struct spi_device *spi)
 	struct chip_data *chip = spi_get_ctldata((struct spi_device *)spi);
 
 	dev_dbg(&spi->dev, "spi_device %u.%u cleanup\n",
-			spi->master->bus_num, spi->chip_select);
+		spi->master->bus_num, spi->chip_select);
 
 	kfree(chip);
 }
@@ -845,7 +841,6 @@ static irqreturn_t dspi_interrupt(int irq, void *dev_id)
 	regmap_read(dspi->regmap, SPI_SR, &spi_sr);
 	regmap_write(dspi->regmap, SPI_SR, spi_sr);
 
-
 	if (spi_sr & (SPI_SR_EOQF | SPI_SR_TCFQF)) {
 		/* Get transfer counter (in number of SPI transfers). It was
 		 * reset to 0 when transfer(s) were started.
@@ -982,9 +977,10 @@ static const struct regmap_config dspi_xspi_regmap_config[] = {
 
 static void dspi_init(struct fsl_dspi *dspi)
 {
-	unsigned int mcr = SPI_MCR_PCSIS |
-		(dspi->devtype_data->xspi_mode ? SPI_MCR_XSPI : 0);
+	unsigned int mcr = SPI_MCR_PCSIS;
 
+	if (dspi->devtype_data->xspi_mode)
+		mcr |= SPI_MCR_XSPI;
 	if (!spi_controller_is_slave(dspi->master))
 		mcr |= SPI_MCR_MASTER;
 
@@ -1161,12 +1157,12 @@ static int dspi_remove(struct platform_device *pdev)
 }
 
 static struct platform_driver fsl_dspi_driver = {
-	.driver.name    = DRIVER_NAME,
-	.driver.of_match_table = fsl_dspi_dt_ids,
-	.driver.owner   = THIS_MODULE,
-	.driver.pm = &dspi_pm,
-	.probe          = dspi_probe,
-	.remove		= dspi_remove,
+	.driver.name		= DRIVER_NAME,
+	.driver.of_match_table	= fsl_dspi_dt_ids,
+	.driver.owner		= THIS_MODULE,
+	.driver.pm		= &dspi_pm,
+	.probe			= dspi_probe,
+	.remove			= dspi_remove,
 };
 module_platform_driver(fsl_dspi_driver);
 
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 05/11] spi: spi-fsl-dspi: Use poll mode in case the platform IRQ is missing
  2019-08-16  0:44 [RFC PATCH net-next 00/11] Deterministic SPI latency on NXP Vladimir Oltean
                   ` (3 preceding siblings ...)
  2019-08-16  0:44 ` [RFC PATCH net-next 04/11] spi: spi-fsl-dspi: Cosmetic cleanup Vladimir Oltean
@ 2019-08-16  0:44 ` Vladimir Oltean
  2019-08-16  0:44 ` [RFC PATCH net-next 06/11] spi: spi-fsl-dspi: Implement the PTP system timestamping Vladimir Oltean
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2019-08-16  0:44 UTC (permalink / raw)
  To: h.feurstein, mlichvar, richardcochran, andrew, f.fainelli, broonie
  Cc: linux-spi, netdev, Vladimir Oltean

On platforms like LS1021A which use TCFQ mode, an interrupt needs to be
processed after each byte is TXed/RXed. I tried to make the DSPI
implementation on this SoC operate in other, more efficient modes (EOQ,
DMA) but it looks like it simply isn't possible.

Therefore allow the driver to operate in poll mode, to ease a bit of
this absurd amount of IRQ load generated in TCFQ mode. Doing so reduces
both the net time it takes to transmit a SPI message, as well as the
inter-frame jitter that occurs while doing so.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
 drivers/spi/spi-fsl-dspi.c | 156 +++++++++++++++++++++----------------
 1 file changed, 90 insertions(+), 66 deletions(-)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index 99708b36ee4f..41c45ee2bb2d 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -652,6 +652,77 @@ static void dspi_eoq_read(struct fsl_dspi *dspi)
 		dspi_push_rx(dspi, fifo_read(dspi));
 }
 
+static int dspi_rxtx(struct fsl_dspi *dspi)
+{
+	struct spi_message *msg = dspi->cur_msg;
+	u16 spi_tcnt;
+	u32 spi_tcr;
+
+	/* Get transfer counter (in number of SPI transfers). It was
+	 * reset to 0 when transfer(s) were started.
+	 */
+	regmap_read(dspi->regmap, SPI_TCR, &spi_tcr);
+	spi_tcnt = SPI_TCR_GET_TCNT(spi_tcr);
+	/* Update total number of bytes that were transferred */
+	msg->actual_length += spi_tcnt * dspi->bytes_per_word;
+
+	if (dspi->devtype_data->trans_mode == DSPI_EOQ_MODE)
+		dspi_eoq_read(dspi);
+	else
+		dspi_tcfq_read(dspi);
+
+	if (!dspi->len)
+		/* Success! */
+		return 0;
+
+	if (dspi->devtype_data->trans_mode == DSPI_EOQ_MODE)
+		dspi_eoq_write(dspi);
+	else
+		dspi_tcfq_write(dspi);
+
+	return -EAGAIN;
+}
+
+static int dspi_poll(struct fsl_dspi *dspi)
+{
+	int tries = 1000;
+	u32 spi_sr;
+
+	do {
+		regmap_read(dspi->regmap, SPI_SR, &spi_sr);
+		regmap_write(dspi->regmap, SPI_SR, spi_sr);
+
+		if (spi_sr & (SPI_SR_EOQF | SPI_SR_TCFQF))
+			break;
+	} while (--tries);
+
+	if (!tries)
+		return -ETIMEDOUT;
+
+	return dspi_rxtx(dspi);
+}
+
+static irqreturn_t dspi_interrupt(int irq, void *dev_id)
+{
+	struct fsl_dspi *dspi = (struct fsl_dspi *)dev_id;
+	u32 spi_sr;
+
+	regmap_read(dspi->regmap, SPI_SR, &spi_sr);
+	regmap_write(dspi->regmap, SPI_SR, spi_sr);
+
+	if (!(spi_sr & (SPI_SR_EOQF | SPI_SR_TCFQF)))
+		return IRQ_HANDLED;
+
+	dspi_rxtx(dspi);
+
+	if (!dspi->len) {
+		dspi->waitflags = 1;
+		wake_up_interruptible(&dspi->waitq);
+	}
+
+	return IRQ_HANDLED;
+}
+
 static int dspi_transfer_one_message(struct spi_master *master,
 				     struct spi_message *message)
 {
@@ -736,13 +807,18 @@ static int dspi_transfer_one_message(struct spi_master *master,
 			goto out;
 		}
 
-		if (trans_mode != DSPI_DMA_MODE) {
-			if (wait_event_interruptible(dspi->waitq,
-						     dspi->waitflags))
-				dev_err(&dspi->pdev->dev,
-					"wait transfer complete fail!\n");
+		if (!dspi->irq) {
+			do {
+				status = dspi_poll(dspi);
+			} while (status == -EAGAIN);
+		} else if (trans_mode != DSPI_DMA_MODE) {
+			status = wait_event_interruptible(dspi->waitq,
+							  dspi->waitflags);
 			dspi->waitflags = 0;
 		}
+		if (status)
+			dev_err(&dspi->pdev->dev,
+				"Waiting for transfer to complete failed!\n");
 
 		if (transfer->delay_usecs)
 			udelay(transfer->delay_usecs);
@@ -830,62 +906,6 @@ static void dspi_cleanup(struct spi_device *spi)
 	kfree(chip);
 }
 
-static irqreturn_t dspi_interrupt(int irq, void *dev_id)
-{
-	struct fsl_dspi *dspi = (struct fsl_dspi *)dev_id;
-	struct spi_message *msg = dspi->cur_msg;
-	enum dspi_trans_mode trans_mode;
-	u32 spi_sr, spi_tcr;
-	u16 spi_tcnt;
-
-	regmap_read(dspi->regmap, SPI_SR, &spi_sr);
-	regmap_write(dspi->regmap, SPI_SR, spi_sr);
-
-	if (spi_sr & (SPI_SR_EOQF | SPI_SR_TCFQF)) {
-		/* Get transfer counter (in number of SPI transfers). It was
-		 * reset to 0 when transfer(s) were started.
-		 */
-		regmap_read(dspi->regmap, SPI_TCR, &spi_tcr);
-		spi_tcnt = SPI_TCR_GET_TCNT(spi_tcr);
-		/* Update total number of bytes that were transferred */
-		msg->actual_length += spi_tcnt * dspi->bytes_per_word;
-
-		trans_mode = dspi->devtype_data->trans_mode;
-		switch (trans_mode) {
-		case DSPI_EOQ_MODE:
-			dspi_eoq_read(dspi);
-			break;
-		case DSPI_TCFQ_MODE:
-			dspi_tcfq_read(dspi);
-			break;
-		default:
-			dev_err(&dspi->pdev->dev, "unsupported trans_mode %u\n",
-				trans_mode);
-				return IRQ_HANDLED;
-		}
-
-		if (!dspi->len) {
-			dspi->waitflags = 1;
-			wake_up_interruptible(&dspi->waitq);
-		} else {
-			switch (trans_mode) {
-			case DSPI_EOQ_MODE:
-				dspi_eoq_write(dspi);
-				break;
-			case DSPI_TCFQ_MODE:
-				dspi_tcfq_write(dspi);
-				break;
-			default:
-				dev_err(&dspi->pdev->dev,
-					"unsupported trans_mode %u\n",
-					trans_mode);
-			}
-		}
-	}
-
-	return IRQ_HANDLED;
-}
-
 static const struct of_device_id fsl_dspi_dt_ids[] = {
 	{ .compatible = "fsl,vf610-dspi", .data = &vf610_data, },
 	{ .compatible = "fsl,ls1021a-v1.0-dspi", .data = &ls1021a_v1_data, },
@@ -1099,11 +1119,13 @@ static int dspi_probe(struct platform_device *pdev)
 		goto out_master_put;
 
 	dspi_init(dspi);
+
 	dspi->irq = platform_get_irq(pdev, 0);
-	if (dspi->irq < 0) {
-		dev_err(&pdev->dev, "can't get platform irq\n");
-		ret = dspi->irq;
-		goto out_clk_put;
+	if (dspi->irq <= 0) {
+		dev_info(&pdev->dev,
+			 "can't get platform irq, using poll mode\n");
+		dspi->irq = 0;
+		goto poll_mode;
 	}
 
 	ret = devm_request_irq(&pdev->dev, dspi->irq, dspi_interrupt,
@@ -1113,6 +1135,9 @@ static int dspi_probe(struct platform_device *pdev)
 		goto out_clk_put;
 	}
 
+	init_waitqueue_head(&dspi->waitq);
+
+poll_mode:
 	if (dspi->devtype_data->trans_mode == DSPI_DMA_MODE) {
 		ret = dspi_request_dma(dspi, res->start);
 		if (ret < 0) {
@@ -1124,7 +1149,6 @@ static int dspi_probe(struct platform_device *pdev)
 	master->max_speed_hz =
 		clk_get_rate(dspi->clk) / dspi->devtype_data->max_clock_factor;
 
-	init_waitqueue_head(&dspi->waitq);
 	platform_set_drvdata(pdev, master);
 
 	ret = spi_register_master(master);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 06/11] spi: spi-fsl-dspi: Implement the PTP system timestamping
  2019-08-16  0:44 [RFC PATCH net-next 00/11] Deterministic SPI latency on NXP Vladimir Oltean
                   ` (4 preceding siblings ...)
  2019-08-16  0:44 ` [RFC PATCH net-next 05/11] spi: spi-fsl-dspi: Use poll mode in case the platform IRQ is missing Vladimir Oltean
@ 2019-08-16  0:44 ` Vladimir Oltean
  2019-08-16  0:44 ` [RFC PATCH net-next 07/11] spi: spi-fsl-dspi: Add a debugging GPIO for monitoring latency Vladimir Oltean
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2019-08-16  0:44 UTC (permalink / raw)
  To: h.feurstein, mlichvar, richardcochran, andrew, f.fainelli, broonie
  Cc: linux-spi, netdev, Vladimir Oltean

This adds a snapshotting software feature for TCFQ and EOQ modes of
operation. Due to my lack of proper understanding of the DMA mode,
the latter mode is left as an exercise for future developers.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
 drivers/spi/spi-fsl-dspi.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index 41c45ee2bb2d..3fc266d8263a 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -184,6 +184,9 @@ struct fsl_dspi {
 	int			irq;
 	struct clk		*clk;
 
+	struct ptp_system_timestamp *ptp_sts;
+	const void		*ptp_sts_word;
+	bool			take_snapshot;
 	struct spi_transfer	*cur_transfer;
 	struct spi_message	*cur_msg;
 	struct chip_data	*cur_chip;
@@ -658,6 +661,9 @@ static int dspi_rxtx(struct fsl_dspi *dspi)
 	u16 spi_tcnt;
 	u32 spi_tcr;
 
+	if (dspi->take_snapshot)
+		ptp_read_system_postts(dspi->ptp_sts);
+
 	/* Get transfer counter (in number of SPI transfers). It was
 	 * reset to 0 when transfer(s) were started.
 	 */
@@ -675,6 +681,11 @@ static int dspi_rxtx(struct fsl_dspi *dspi)
 		/* Success! */
 		return 0;
 
+	dspi->take_snapshot = (dspi->tx == dspi->ptp_sts_word);
+
+	if (dspi->take_snapshot)
+		ptp_read_system_prets(dspi->ptp_sts);
+
 	if (dspi->devtype_data->trans_mode == DSPI_EOQ_MODE)
 		dspi_eoq_write(dspi);
 	else
@@ -764,6 +775,8 @@ static int dspi_transfer_one_message(struct spi_master *master,
 		dspi->rx = transfer->rx_buf;
 		dspi->rx_end = dspi->rx + transfer->len;
 		dspi->len = transfer->len;
+		dspi->ptp_sts = transfer->ptp_sts;
+		dspi->ptp_sts_word = dspi->tx + transfer->ptp_sts_word_offset;
 		/* Validated transfer specific frame size (defaults applied) */
 		dspi->bits_per_word = transfer->bits_per_word;
 		if (transfer->bits_per_word <= 8)
@@ -784,6 +797,11 @@ static int dspi_transfer_one_message(struct spi_master *master,
 				     SPI_FRAME_EBITS(transfer->bits_per_word) |
 				     SPI_CTARE_DTCP(1));
 
+		dspi->take_snapshot = (dspi->tx == dspi->ptp_sts_word);
+
+		if (dspi->take_snapshot)
+			ptp_read_system_prets(dspi->ptp_sts);
+
 		trans_mode = dspi->devtype_data->trans_mode;
 		switch (trans_mode) {
 		case DSPI_EOQ_MODE:
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 07/11] spi: spi-fsl-dspi: Add a debugging GPIO for monitoring latency
  2019-08-16  0:44 [RFC PATCH net-next 00/11] Deterministic SPI latency on NXP Vladimir Oltean
                   ` (5 preceding siblings ...)
  2019-08-16  0:44 ` [RFC PATCH net-next 06/11] spi: spi-fsl-dspi: Implement the PTP system timestamping Vladimir Oltean
@ 2019-08-16  0:44 ` Vladimir Oltean
  2019-08-16  0:44 ` [RFC PATCH net-next 08/11] spi: spi-fsl-dspi: Disable interrupts and preemption during poll mode transfer Vladimir Oltean
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2019-08-16  0:44 UTC (permalink / raw)
  To: h.feurstein, mlichvar, richardcochran, andrew, f.fainelli, broonie
  Cc: linux-spi, netdev, Vladimir Oltean

This is being used to monitor the time it takes to transmit individual
bytes over SPI to the slave device. It is used in conjunction with the
PTP system timestamp feature - only the byte that was requested to be
timestamped triggers a toggle of the GPIO pin.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
 drivers/spi/spi-fsl-dspi.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index 3fc266d8263a..f0838853392d 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -203,6 +203,7 @@ struct fsl_dspi {
 	wait_queue_head_t	waitq;
 	u32			waitflags;
 
+	struct gpio_desc	*debug_gpio;
 	struct fsl_dspi_dma	*dma;
 };
 
@@ -223,6 +224,15 @@ static u32 dspi_pop_tx(struct fsl_dspi *dspi)
 	return txdata;
 }
 
+void dspi_debug_gpio(struct fsl_dspi *dspi, bool enabled)
+{
+	if (IS_ERR(dspi->debug_gpio)) {
+		dev_err(&dspi->pdev->dev, "Bad debug GPIO!\n");
+		return;
+	}
+	gpiod_set_value_cansleep(dspi->debug_gpio, enabled);
+}
+
 static u32 dspi_pop_tx_pushr(struct fsl_dspi *dspi)
 {
 	u16 cmd = dspi->tx_cmd, data = dspi_pop_tx(dspi);
@@ -661,8 +671,11 @@ static int dspi_rxtx(struct fsl_dspi *dspi)
 	u16 spi_tcnt;
 	u32 spi_tcr;
 
-	if (dspi->take_snapshot)
+	if (dspi->take_snapshot) {
 		ptp_read_system_postts(dspi->ptp_sts);
+		if (dspi->ptp_sts)
+			dspi_debug_gpio(dspi, 0);
+	}
 
 	/* Get transfer counter (in number of SPI transfers). It was
 	 * reset to 0 when transfer(s) were started.
@@ -683,8 +696,11 @@ static int dspi_rxtx(struct fsl_dspi *dspi)
 
 	dspi->take_snapshot = (dspi->tx == dspi->ptp_sts_word);
 
-	if (dspi->take_snapshot)
+	if (dspi->take_snapshot) {
+		if (dspi->ptp_sts)
+			dspi_debug_gpio(dspi, 1);
 		ptp_read_system_prets(dspi->ptp_sts);
+	}
 
 	if (dspi->devtype_data->trans_mode == DSPI_EOQ_MODE)
 		dspi_eoq_write(dspi);
@@ -799,8 +815,11 @@ static int dspi_transfer_one_message(struct spi_master *master,
 
 		dspi->take_snapshot = (dspi->tx == dspi->ptp_sts_word);
 
-		if (dspi->take_snapshot)
+		if (dspi->take_snapshot) {
+			if (dspi->ptp_sts)
+				dspi_debug_gpio(dspi, 1);
 			ptp_read_system_prets(dspi->ptp_sts);
+		}
 
 		trans_mode = dspi->devtype_data->trans_mode;
 		switch (trans_mode) {
@@ -1126,6 +1145,11 @@ static int dspi_probe(struct platform_device *pdev)
 		}
 	}
 
+	dspi->debug_gpio = devm_gpiod_get(&pdev->dev, "debug",
+					  GPIOD_OUT_HIGH);
+
+	dspi_debug_gpio(dspi, 0);
+
 	dspi->clk = devm_clk_get(&pdev->dev, "dspi");
 	if (IS_ERR(dspi->clk)) {
 		ret = PTR_ERR(dspi->clk);
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 08/11] spi: spi-fsl-dspi: Disable interrupts and preemption during poll mode transfer
  2019-08-16  0:44 [RFC PATCH net-next 00/11] Deterministic SPI latency on NXP Vladimir Oltean
                   ` (6 preceding siblings ...)
  2019-08-16  0:44 ` [RFC PATCH net-next 07/11] spi: spi-fsl-dspi: Add a debugging GPIO for monitoring latency Vladimir Oltean
@ 2019-08-16  0:44 ` Vladimir Oltean
  2019-08-16  0:44 ` [RFC PATCH net-next 09/11] ARM: dts: ls1021a-tsn: Add debugging GPIOs for the SJA1105 and DSPI drivers Vladimir Oltean
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2019-08-16  0:44 UTC (permalink / raw)
  To: h.feurstein, mlichvar, richardcochran, andrew, f.fainelli, broonie
  Cc: linux-spi, netdev, Vladimir Oltean

While the poll mode helps reduce the overall latency in transmitting a
SPI message in the EOQ and TCFQ modes, the transmission can still have
jitter due to the CPU needing to service interrupts.

The transmission latency does not matter except in situations where the
SPI transfer represents the readout of a POSIX clock. In that case,
even with the byte-level PTP system timestamping in place, a pending IRQ
might find its way to be processed on the local CPU exactly during the
window when the transfer is snapshotted.

Disabling interrupts ensures the above situation never happens. When it
does, it manifests itself as random delay spikes, which throw off the
servo loop of phc2sys and make it lose lock.

Short phc2sys summary after 58 minutes of running without this patch:

  offset: min -26251 max 16416 mean -21.8672 std dev 863.416
  delay: min 4720 max 57280 mean 5182.49 std dev 1607.19
  lost servo lock 3 times

Summary of the same phc2sys service running for 120 minutes with the
patch:

  offset: min -378 max 381 mean -0.0083089 std dev 101.495
  delay: min 4720 max 5920 mean 5129.38 std dev 154.899
  lost servo lock 0 times

Disable interrupts unconditionally if running in poll mode.
Two aspects:
- If the DSPI driver is in IRQ mode, then disabling interrupts becomes a
  contradiction in terms. Poll mode is recommendable for predictable
  latency.
- In theory it should be possible to disable interrupts only for SPI
  transfers that represent an interaction with a POSIX clock. The driver
  can sense this by looking at transfer->ptp_sts. However enabling this
  unconditionally makes issues much more visible (and not just in fringe
  cases), were they to ever appear.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
 drivers/spi/spi-fsl-dspi.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/drivers/spi/spi-fsl-dspi.c b/drivers/spi/spi-fsl-dspi.c
index f0838853392d..5404f1b45ad0 100644
--- a/drivers/spi/spi-fsl-dspi.c
+++ b/drivers/spi/spi-fsl-dspi.c
@@ -184,6 +184,8 @@ struct fsl_dspi {
 	int			irq;
 	struct clk		*clk;
 
+	/* Used to disable IRQs and preemption */
+	spinlock_t		lock;
 	struct ptp_system_timestamp *ptp_sts;
 	const void		*ptp_sts_word;
 	bool			take_snapshot;
@@ -757,6 +759,7 @@ static int dspi_transfer_one_message(struct spi_master *master,
 	struct spi_device *spi = message->spi;
 	enum dspi_trans_mode trans_mode;
 	struct spi_transfer *transfer;
+	unsigned long flags = 0;
 	int status = 0;
 
 	message->actual_length = 0;
@@ -813,6 +816,9 @@ static int dspi_transfer_one_message(struct spi_master *master,
 				     SPI_FRAME_EBITS(transfer->bits_per_word) |
 				     SPI_CTARE_DTCP(1));
 
+		if (!dspi->irq)
+			spin_lock_irqsave(&dspi->lock, flags);
+
 		dspi->take_snapshot = (dspi->tx == dspi->ptp_sts_word);
 
 		if (dspi->take_snapshot) {
@@ -848,6 +854,9 @@ static int dspi_transfer_one_message(struct spi_master *master,
 			do {
 				status = dspi_poll(dspi);
 			} while (status == -EAGAIN);
+
+			spin_unlock_irqrestore(&dspi->lock, flags);
+
 		} else if (trans_mode != DSPI_DMA_MODE) {
 			status = wait_event_interruptible(dspi->waitq,
 							  dspi->waitflags);
@@ -1178,6 +1187,7 @@ static int dspi_probe(struct platform_device *pdev)
 	}
 
 	init_waitqueue_head(&dspi->waitq);
+	spin_lock_init(&dspi->lock);
 
 poll_mode:
 	if (dspi->devtype_data->trans_mode == DSPI_DMA_MODE) {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 09/11] ARM: dts: ls1021a-tsn: Add debugging GPIOs for the SJA1105 and DSPI drivers
  2019-08-16  0:44 [RFC PATCH net-next 00/11] Deterministic SPI latency on NXP Vladimir Oltean
                   ` (7 preceding siblings ...)
  2019-08-16  0:44 ` [RFC PATCH net-next 08/11] spi: spi-fsl-dspi: Disable interrupts and preemption during poll mode transfer Vladimir Oltean
@ 2019-08-16  0:44 ` Vladimir Oltean
  2019-08-16  0:44 ` [RFC PATCH net-next 10/11] ARM: dts: ls1021a-tsn: Use the DSPI controller in poll mode Vladimir Oltean
  2019-08-16  0:44 ` [RFC PATCH net-next 11/11] ARM: dts: ls1021a-tsn: Reduce the SJA1105 SPI frequency for debug Vladimir Oltean
  10 siblings, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2019-08-16  0:44 UTC (permalink / raw)
  To: h.feurstein, mlichvar, richardcochran, andrew, f.fainelli, broonie
  Cc: linux-spi, netdev, Vladimir Oltean

These GPIOs are exported to the expansion pin header at the rear of the
board:

EXP1_GPIO7: row 1, pin 9 from left
EXP1_GPIO6: row 1, pin 8 from left

Experimentally I could only see EXP1_GPIO6 (the pin currently assigned
to the DSPI driver) actually toggle on an analyzer - I don't know why,
but on my board, EXP1_GPIO7 isn't.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
 arch/arm/boot/dts/ls1021a-tsn.dts | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/arch/arm/boot/dts/ls1021a-tsn.dts b/arch/arm/boot/dts/ls1021a-tsn.dts
index 9d4eee986f53..6cec454c484c 100644
--- a/arch/arm/boot/dts/ls1021a-tsn.dts
+++ b/arch/arm/boot/dts/ls1021a-tsn.dts
@@ -5,6 +5,7 @@
 
 /dts-v1/;
 #include "ls1021a.dtsi"
+#include <dt-bindings/gpio/gpio.h>
 
 / {
 	model = "NXP LS1021A-TSN Board";
@@ -34,6 +35,8 @@
 
 &dspi0 {
 	bus-num = <0>;
+	/* EXP1_GPIO6 is GPIO4_18 */
+	debug-gpios = <&gpio3 18 GPIO_ACTIVE_HIGH>;
 	status = "okay";
 
 	/* ADG704BRMZ 1:4 SPI mux/demux */
@@ -57,6 +60,8 @@
 		/* SPI controller settings for SJA1105 timing requirements */
 		fsl,spi-cs-sck-delay = <1000>;
 		fsl,spi-sck-cs-delay = <1000>;
+		/* EXP1_GPIO7 is GPIO4_19 */
+		debug-gpios = <&gpio3 19 GPIO_ACTIVE_HIGH>;
 
 		ports {
 			#address-cells = <1>;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 10/11] ARM: dts: ls1021a-tsn: Use the DSPI controller in poll mode
  2019-08-16  0:44 [RFC PATCH net-next 00/11] Deterministic SPI latency on NXP Vladimir Oltean
                   ` (8 preceding siblings ...)
  2019-08-16  0:44 ` [RFC PATCH net-next 09/11] ARM: dts: ls1021a-tsn: Add debugging GPIOs for the SJA1105 and DSPI drivers Vladimir Oltean
@ 2019-08-16  0:44 ` Vladimir Oltean
  2019-08-16  0:44 ` [RFC PATCH net-next 11/11] ARM: dts: ls1021a-tsn: Reduce the SJA1105 SPI frequency for debug Vladimir Oltean
  10 siblings, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2019-08-16  0:44 UTC (permalink / raw)
  To: h.feurstein, mlichvar, richardcochran, andrew, f.fainelli, broonie
  Cc: linux-spi, netdev, Vladimir Oltean

Connected to the LS1021A DSPI is the SJA1105 DSA switch. This
constitutes 4 of the 6 Ethernet ports on this board. When using the
board as a PTP switch and bridging all 6 ports under one single L2
entity, it is good to also have the PTP clocks of the switch and of the
standalone Ethernet ports in sync.

This cannot be done with hardware timestamping, and is where phc2sys
comes into play. Using poll mode for SPI access helps ensure that all
transfers take a deterministic time to complete, which is an important
requirement for a TSN switch.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
 arch/arm/boot/dts/ls1021a-tsn.dts | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/arm/boot/dts/ls1021a-tsn.dts b/arch/arm/boot/dts/ls1021a-tsn.dts
index 6cec454c484c..3b35e6b5977f 100644
--- a/arch/arm/boot/dts/ls1021a-tsn.dts
+++ b/arch/arm/boot/dts/ls1021a-tsn.dts
@@ -37,6 +37,7 @@
 	bus-num = <0>;
 	/* EXP1_GPIO6 is GPIO4_18 */
 	debug-gpios = <&gpio3 18 GPIO_ACTIVE_HIGH>;
+	/delete-property/ interrupts;
 	status = "okay";
 
 	/* ADG704BRMZ 1:4 SPI mux/demux */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [RFC PATCH net-next 11/11] ARM: dts: ls1021a-tsn: Reduce the SJA1105 SPI frequency for debug
  2019-08-16  0:44 [RFC PATCH net-next 00/11] Deterministic SPI latency on NXP Vladimir Oltean
                   ` (9 preceding siblings ...)
  2019-08-16  0:44 ` [RFC PATCH net-next 10/11] ARM: dts: ls1021a-tsn: Use the DSPI controller in poll mode Vladimir Oltean
@ 2019-08-16  0:44 ` Vladimir Oltean
  10 siblings, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2019-08-16  0:44 UTC (permalink / raw)
  To: h.feurstein, mlichvar, richardcochran, andrew, f.fainelli, broonie
  Cc: linux-spi, netdev, Vladimir Oltean

I have a logic analyzer that cannot sample signals at a higher frequency
than this, and it's nice to actually see the captured data and not just
an amorphous mess.

Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
---
 arch/arm/boot/dts/ls1021a-tsn.dts | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/ls1021a-tsn.dts b/arch/arm/boot/dts/ls1021a-tsn.dts
index 3b35e6b5977f..8fdf4c3b24c7 100644
--- a/arch/arm/boot/dts/ls1021a-tsn.dts
+++ b/arch/arm/boot/dts/ls1021a-tsn.dts
@@ -55,7 +55,7 @@
 		#size-cells = <0>;
 		compatible = "nxp,sja1105t";
 		/* 12 MHz */
-		spi-max-frequency = <12000000>;
+		spi-max-frequency = <6000000>;
 		/* Sample data on trailing clock edge */
 		spi-cpha;
 		/* SPI controller settings for SJA1105 timing requirements */
-- 
2.17.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 03/11] spi: Add a PTP system timestamp to the transfer structure
  2019-08-16  0:44 ` [RFC PATCH net-next 03/11] spi: Add a PTP system timestamp to the transfer structure Vladimir Oltean
@ 2019-08-16 12:18   ` Mark Brown
  2019-08-16 12:35     ` Vladimir Oltean
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Brown @ 2019-08-16 12:18 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: h.feurstein, mlichvar, richardcochran, andrew, f.fainelli,
	linux-spi, netdev

[-- Attachment #1: Type: text/plain, Size: 332 bytes --]

On Fri, Aug 16, 2019 at 03:44:41AM +0300, Vladimir Oltean wrote:

> @@ -842,6 +843,9 @@ struct spi_transfer {
>  
>  	u32		effective_speed_hz;
>  
> +	struct ptp_system_timestamp *ptp_sts;
> +	unsigned int	ptp_sts_word_offset;
> +

You've not documented these fields at all so it's not clear what the
intended usage is.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 04/11] spi: spi-fsl-dspi: Cosmetic cleanup
  2019-08-16  0:44 ` [RFC PATCH net-next 04/11] spi: spi-fsl-dspi: Cosmetic cleanup Vladimir Oltean
@ 2019-08-16 12:21   ` Mark Brown
  2019-08-16 12:37     ` Vladimir Oltean
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Brown @ 2019-08-16 12:21 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: h.feurstein, mlichvar, richardcochran, andrew, f.fainelli,
	linux-spi, netdev

[-- Attachment #1: Type: text/plain, Size: 700 bytes --]

On Fri, Aug 16, 2019 at 03:44:42AM +0300, Vladimir Oltean wrote:
> This patch addresses some cosmetic issues:
> - Alignment
> - Typos
> - (Non-)use of BIT() and GENMASK() macros
> - Unused definitions
> - Unused includes
> - Abuse of ternary operator in detriment of readability
> - Reduce indentation level

This is difficult to review since there's a bunch of largely unrelated
changes all munged into one patch.  It'd be better to split this up so
each change makes one kind of fix, and better to do this separately to
the rest of the series.  In particular having alignment changes along
with other changes hurts reviewability as it's less immediately clear
what's a like for liken substitution.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 03/11] spi: Add a PTP system timestamp to the transfer structure
  2019-08-16 12:18   ` Mark Brown
@ 2019-08-16 12:35     ` Vladimir Oltean
  2019-08-16 12:58       ` Mark Brown
  0 siblings, 1 reply; 24+ messages in thread
From: Vladimir Oltean @ 2019-08-16 12:35 UTC (permalink / raw)
  To: Mark Brown
  Cc: Hubert Feurstein, mlichvar, Richard Cochran, Andrew Lunn,
	Florian Fainelli, linux-spi, netdev

Hi Mark,

On Fri, 16 Aug 2019 at 15:18, Mark Brown <broonie@kernel.org> wrote:
>
> On Fri, Aug 16, 2019 at 03:44:41AM +0300, Vladimir Oltean wrote:
>
> > @@ -842,6 +843,9 @@ struct spi_transfer {
> >
> >       u32             effective_speed_hz;
> >
> > +     struct ptp_system_timestamp *ptp_sts;
> > +     unsigned int    ptp_sts_word_offset;
> > +
>
> You've not documented these fields at all so it's not clear what the
> intended usage is.

Thanks for looking into this.
Indeed I didn't document them as the patch is part of a RFC and I
thought the purpose was more clear from the context (cover letter
etc).
If I do ever send a patchset for submission I will document the newly
introduced fields properly.
So let me clarify:
The SPI slave device driver is populating these fields to indicate to
the controller driver that it wants word number @ptp_sts_word_offset
from the tx buffer snapshotted. The controller driver is supposed to
put the snapshot into the @ptp_sts field, which is a pointer to a
memory location under the control of the SPI slave device driver.
It is ok if the ptp_sts pointer is NULL (no need to check), because
the API for taking snapshots already checks for that.
At the moment there is yet no proposed mechanism for the SPI slave
driver to ensure that the controller will really act upon this
request. That would be really nice to have, since some SPI slave
devices are time-sensitive and warning early is a good way to prevent
unnecessary troubleshooting.

Regards,
-Vladimir

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 04/11] spi: spi-fsl-dspi: Cosmetic cleanup
  2019-08-16 12:21   ` Mark Brown
@ 2019-08-16 12:37     ` Vladimir Oltean
  2019-08-16 12:59       ` Mark Brown
  0 siblings, 1 reply; 24+ messages in thread
From: Vladimir Oltean @ 2019-08-16 12:37 UTC (permalink / raw)
  To: Mark Brown
  Cc: Hubert Feurstein, mlichvar, Richard Cochran, Andrew Lunn,
	Florian Fainelli, linux-spi, netdev

Hi Mark,

On Fri, 16 Aug 2019 at 15:21, Mark Brown <broonie@kernel.org> wrote:
>
> On Fri, Aug 16, 2019 at 03:44:42AM +0300, Vladimir Oltean wrote:
> > This patch addresses some cosmetic issues:
> > - Alignment
> > - Typos
> > - (Non-)use of BIT() and GENMASK() macros
> > - Unused definitions
> > - Unused includes
> > - Abuse of ternary operator in detriment of readability
> > - Reduce indentation level
>
> This is difficult to review since there's a bunch of largely unrelated
> changes all munged into one patch.  It'd be better to split this up so
> each change makes one kind of fix, and better to do this separately to
> the rest of the series.  In particular having alignment changes along
> with other changes hurts reviewability as it's less immediately clear
> what's a like for liken substitution.

Yes, the diff of this patch looks relatively bad. But I don't know if
splitting it in more patches isn't in fact going to pollute the git
history, so I can just as well drop it.

Regards,
-Vladimir

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 03/11] spi: Add a PTP system timestamp to the transfer structure
  2019-08-16 12:35     ` Vladimir Oltean
@ 2019-08-16 12:58       ` Mark Brown
  2019-08-16 14:05         ` Vladimir Oltean
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Brown @ 2019-08-16 12:58 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Hubert Feurstein, mlichvar, Richard Cochran, Andrew Lunn,
	Florian Fainelli, linux-spi, netdev

[-- Attachment #1: Type: text/plain, Size: 2533 bytes --]

On Fri, Aug 16, 2019 at 03:35:30PM +0300, Vladimir Oltean wrote:
> On Fri, 16 Aug 2019 at 15:18, Mark Brown <broonie@kernel.org> wrote:
> > On Fri, Aug 16, 2019 at 03:44:41AM +0300, Vladimir Oltean wrote:

> > > @@ -842,6 +843,9 @@ struct spi_transfer {
> > >
> > >       u32             effective_speed_hz;
> > >
> > > +     struct ptp_system_timestamp *ptp_sts;
> > > +     unsigned int    ptp_sts_word_offset;
> > > +

> > You've not documented these fields at all so it's not clear what the
> > intended usage is.

> Thanks for looking into this.
> Indeed I didn't document them as the patch is part of a RFC and I
> thought the purpose was more clear from the context (cover letter
> etc).
> If I do ever send a patchset for submission I will document the newly
> introduced fields properly.

The issue I'm having is that I have zero idea about the PTP API so I've
got nothing to go on when thinking about if this approach makes any
sense unless I go do some research.

> So let me clarify:
> The SPI slave device driver is populating these fields to indicate to
> the controller driver that it wants word number @ptp_sts_word_offset
> from the tx buffer snapshotted. The controller driver is supposed to
> put the snapshot into the @ptp_sts field, which is a pointer to a
> memory location under the control of the SPI slave device driver.

Snapshot here basically meaning recording a timestamp?  This interface
does seem like it basically precludes DMA based controllers from using
it unless someone happened to implement some very specific stuff in
hardware which seems implausible.  I'd be inclined to just require that
users can only snapshot the first (and possibly also the last, though
DMA completions make that fun) word of a transfer, we could then pull
this out into the core a bit by providing a wrapper function drivers
should call at the appropriate moment.

> It is ok if the ptp_sts pointer is NULL (no need to check), because
> the API for taking snapshots already checks for that.
> At the moment there is yet no proposed mechanism for the SPI slave
> driver to ensure that the controller will really act upon this
> request. That would be really nice to have, since some SPI slave
> devices are time-sensitive and warning early is a good way to prevent
> unnecessary troubleshooting.

Yes, that's one of the things I was thinking about looking at the series
- we should at least be able to warn if we can't timestamp so nobody
gets confused, possibly error out if the calling code particularly
depends on it.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 04/11] spi: spi-fsl-dspi: Cosmetic cleanup
  2019-08-16 12:37     ` Vladimir Oltean
@ 2019-08-16 12:59       ` Mark Brown
  2019-08-17 10:44         ` Vladimir Oltean
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Brown @ 2019-08-16 12:59 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Hubert Feurstein, mlichvar, Richard Cochran, Andrew Lunn,
	Florian Fainelli, linux-spi, netdev

[-- Attachment #1: Type: text/plain, Size: 993 bytes --]

On Fri, Aug 16, 2019 at 03:37:46PM +0300, Vladimir Oltean wrote:
> On Fri, 16 Aug 2019 at 15:21, Mark Brown <broonie@kernel.org> wrote:

> > This is difficult to review since there's a bunch of largely unrelated
> > changes all munged into one patch.  It'd be better to split this up so
> > each change makes one kind of fix, and better to do this separately to
> > the rest of the series.  In particular having alignment changes along
> > with other changes hurts reviewability as it's less immediately clear
> > what's a like for liken substitution.

> Yes, the diff of this patch looks relatively bad. But I don't know if
> splitting it in more patches isn't in fact going to pollute the git
> history, so I can just as well drop it.

No problem with lots of patches in git history if you want to split it
up (and probably split it out of the series).  Like I say it's mainly
the alignment changes that it'd be better to pull out, the others really
should be but it's easier to cope there.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 03/11] spi: Add a PTP system timestamp to the transfer structure
  2019-08-16 12:58       ` Mark Brown
@ 2019-08-16 14:05         ` Vladimir Oltean
  2019-08-19  0:43           ` Andrew Lunn
  2019-08-20 12:55           ` Mark Brown
  0 siblings, 2 replies; 24+ messages in thread
From: Vladimir Oltean @ 2019-08-16 14:05 UTC (permalink / raw)
  To: Mark Brown
  Cc: Hubert Feurstein, mlichvar, Richard Cochran, Andrew Lunn,
	Florian Fainelli, linux-spi, netdev

On Fri, 16 Aug 2019 at 15:58, Mark Brown <broonie@kernel.org> wrote:
>
> On Fri, Aug 16, 2019 at 03:35:30PM +0300, Vladimir Oltean wrote:
> > On Fri, 16 Aug 2019 at 15:18, Mark Brown <broonie@kernel.org> wrote:
> > > On Fri, Aug 16, 2019 at 03:44:41AM +0300, Vladimir Oltean wrote:
>
> > > > @@ -842,6 +843,9 @@ struct spi_transfer {
> > > >
> > > >       u32             effective_speed_hz;
> > > >
> > > > +     struct ptp_system_timestamp *ptp_sts;
> > > > +     unsigned int    ptp_sts_word_offset;
> > > > +
>
> > > You've not documented these fields at all so it's not clear what the
> > > intended usage is.
>
> > Thanks for looking into this.
> > Indeed I didn't document them as the patch is part of a RFC and I
> > thought the purpose was more clear from the context (cover letter
> > etc).
> > If I do ever send a patchset for submission I will document the newly
> > introduced fields properly.
>
> The issue I'm having is that I have zero idea about the PTP API so I've
> got nothing to go on when thinking about if this approach makes any
> sense unless I go do some research.
>
> > So let me clarify:
> > The SPI slave device driver is populating these fields to indicate to
> > the controller driver that it wants word number @ptp_sts_word_offset
> > from the tx buffer snapshotted. The controller driver is supposed to
> > put the snapshot into the @ptp_sts field, which is a pointer to a
> > memory location under the control of the SPI slave device driver.
>
> Snapshot here basically meaning recording a timestamp?  This interface
> does seem like it basically precludes DMA based controllers from using
> it unless someone happened to implement some very specific stuff in
> hardware which seems implausible.  I'd be inclined to just require that
> users can only snapshot the first (and possibly also the last, though
> DMA completions make that fun) word of a transfer, we could then pull
> this out into the core a bit by providing a wrapper function drivers
> should call at the appropriate moment.
>

I'm not sure how to respond to this, because I don't know anything
about the timing of DMA transfers.
Maybe snapshotting DMA transfers the same way is not possible (if at
all). Maybe they are not exactly adequate for this sort of application
anyway. Maybe it depends.
But the switch I'm working on is issuing an internal read transaction
of the PTP timer exactly at the 4th-to-last bit of the 3rd byte. This
is so that it has time (4 SPI clock cycles, to be precise) for the
result of the read transaction to become available again to the SPI
block, for output. It is impossible to know exactly when the switch
will snapshot the time internally (because there are several clock
domain crossings from the SPI interface towards its core) but for
certain it takes place during the latter part of the 3rd SPI byte. I
believe other devices are similar in this regard.
In other words, from a purely performance perspective, I am against
limiting the API to just snapshotting the first and last byte. At this
level of "zoom", if I change the offset of the byte to anything other
than 3, the synchronization offset refuses to converge towards zero,
because the snapshot is incurring a constant offset that the servo
loop from userspace (phc2sys) can't compensate for.
Maybe the SPI master driver should just report what sort of
snapshotting capability it can offer, ranging from none (default
unless otherwise specified), to transfer-level (DMA style) or
byte-level.
I'm afraid more actual experimentation is needed with DMA-based
controllers to understand what can be expected from them, and as a
result, how the API should map around them.
MDIO bus controllers are in a similar situation (with Hubert's patch)
but at least there the frame size is fixed and I haven't heard of an
MDIO controller to use DMA.
I'm not really sure what the next step would be. In the other thread,
Richard Cochran mentioned something about a two-part write API,
although I didn't quite understand the idea behind it.


> > It is ok if the ptp_sts pointer is NULL (no need to check), because
> > the API for taking snapshots already checks for that.
> > At the moment there is yet no proposed mechanism for the SPI slave
> > driver to ensure that the controller will really act upon this
> > request. That would be really nice to have, since some SPI slave
> > devices are time-sensitive and warning early is a good way to prevent
> > unnecessary troubleshooting.
>
> Yes, that's one of the things I was thinking about looking at the series
> - we should at least be able to warn if we can't timestamp so nobody
> gets confused, possibly error out if the calling code particularly
> depends on it.

Regards,
-Vladimir

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 04/11] spi: spi-fsl-dspi: Cosmetic cleanup
  2019-08-16 12:59       ` Mark Brown
@ 2019-08-17 10:44         ` Vladimir Oltean
  0 siblings, 0 replies; 24+ messages in thread
From: Vladimir Oltean @ 2019-08-17 10:44 UTC (permalink / raw)
  To: Mark Brown
  Cc: Hubert Feurstein, mlichvar, Richard Cochran, Andrew Lunn,
	Florian Fainelli, linux-spi, netdev, David S. Miller

On Fri, 16 Aug 2019 at 15:59, Mark Brown <broonie@kernel.org> wrote:
>
> On Fri, Aug 16, 2019 at 03:37:46PM +0300, Vladimir Oltean wrote:
> > On Fri, 16 Aug 2019 at 15:21, Mark Brown <broonie@kernel.org> wrote:
>
> > > This is difficult to review since there's a bunch of largely unrelated
> > > changes all munged into one patch.  It'd be better to split this up so
> > > each change makes one kind of fix, and better to do this separately to
> > > the rest of the series.  In particular having alignment changes along
> > > with other changes hurts reviewability as it's less immediately clear
> > > what's a like for liken substitution.
>
> > Yes, the diff of this patch looks relatively bad. But I don't know if
> > splitting it in more patches isn't in fact going to pollute the git
> > history, so I can just as well drop it.
>
> No problem with lots of patches in git history if you want to split it
> up (and probably split it out of the series).  Like I say it's mainly
> the alignment changes that it'd be better to pull out, the others really
> should be but it's easier to cope there.

Yes, normally it would make sense to pull these out of the patchset.
But basically all the future patches I plan to send to net-next for
this release somehow depend on this dspi driver rework.
My plan was that once the patchset reaches a stage where you accept
it, to ask Dave M. to temporarily pull the series into net-next as
well, so that the tree compiles and I can continue to work on other
sja1105 stuff. He can then drop it during the merge window. From that
perspective, even if the entire series takes more time to get accepted
rather than individual bits, at least there would be 1 single patchset
for Dave to pull.
Let me know if there's a better way to handle this.

Thanks,
-Vladimir

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 03/11] spi: Add a PTP system timestamp to the transfer structure
  2019-08-16 14:05         ` Vladimir Oltean
@ 2019-08-19  0:43           ` Andrew Lunn
  2019-08-20 12:55           ` Mark Brown
  1 sibling, 0 replies; 24+ messages in thread
From: Andrew Lunn @ 2019-08-19  0:43 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Mark Brown, Hubert Feurstein, mlichvar, Richard Cochran,
	Florian Fainelli, linux-spi, netdev

> MDIO bus controllers are in a similar situation (with Hubert's patch)
> but at least there the frame size is fixed and I haven't heard of an
> MDIO controller to use DMA.

Linux does not have any DMA driver MDIO busses, as far as i know. It
does not make sense, since you are only transferring 16bits of
data. The vast majority are polled completion, but there is one which
generates an interrupt on completion.

	Andrew

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 03/11] spi: Add a PTP system timestamp to the transfer structure
  2019-08-16 14:05         ` Vladimir Oltean
  2019-08-19  0:43           ` Andrew Lunn
@ 2019-08-20 12:55           ` Mark Brown
  2019-08-20 13:48             ` Vladimir Oltean
  1 sibling, 1 reply; 24+ messages in thread
From: Mark Brown @ 2019-08-20 12:55 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Hubert Feurstein, mlichvar, Richard Cochran, Andrew Lunn,
	Florian Fainelli, linux-spi, netdev

[-- Attachment #1: Type: text/plain, Size: 1860 bytes --]

On Fri, Aug 16, 2019 at 05:05:53PM +0300, Vladimir Oltean wrote:

> I'm not sure how to respond to this, because I don't know anything
> about the timing of DMA transfers.
> Maybe snapshotting DMA transfers the same way is not possible (if at
> all). Maybe they are not exactly adequate for this sort of application
> anyway. Maybe it depends.

DMA transfers generally proceed without any involvement from the CPU,
this is broadly the point of DMA.  You *may* be able to split into
multiple transactions but it's not reliable that you'd be able to do so
on byte boundaries and there will be latency getting notified of
completions.

> In other words, from a purely performance perspective, I am against
> limiting the API to just snapshotting the first and last byte. At this
> level of "zoom", if I change the offset of the byte to anything other
> than 3, the synchronization offset refuses to converge towards zero,
> because the snapshot is incurring a constant offset that the servo
> loop from userspace (phc2sys) can't compensate for.

> Maybe the SPI master driver should just report what sort of
> snapshotting capability it can offer, ranging from none (default
> unless otherwise specified), to transfer-level (DMA style) or
> byte-level.

That does then have the consequence that the majority of controllers
aren't going to be usable with the API which isn't great.

> I'm afraid more actual experimentation is needed with DMA-based
> controllers to understand what can be expected from them, and as a
> result, how the API should map around them.
> MDIO bus controllers are in a similar situation (with Hubert's patch)
> but at least there the frame size is fixed and I haven't heard of an
> MDIO controller to use DMA.

I'm not 100% clear what the problem you're trying to solve is, or if
it's a sensible problem to try to solve for that matter.

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 03/11] spi: Add a PTP system timestamp to the transfer structure
  2019-08-20 12:55           ` Mark Brown
@ 2019-08-20 13:48             ` Vladimir Oltean
  2019-08-20 16:49               ` Mark Brown
  0 siblings, 1 reply; 24+ messages in thread
From: Vladimir Oltean @ 2019-08-20 13:48 UTC (permalink / raw)
  To: Mark Brown
  Cc: Hubert Feurstein, mlichvar, Richard Cochran, Andrew Lunn,
	Florian Fainelli, linux-spi, netdev

Hi Mark,

On Tue, 20 Aug 2019 at 15:55, Mark Brown <broonie@kernel.org> wrote:
>
> On Fri, Aug 16, 2019 at 05:05:53PM +0300, Vladimir Oltean wrote:
>
> > I'm not sure how to respond to this, because I don't know anything
> > about the timing of DMA transfers.
> > Maybe snapshotting DMA transfers the same way is not possible (if at
> > all). Maybe they are not exactly adequate for this sort of application
> > anyway. Maybe it depends.
>
> DMA transfers generally proceed without any involvement from the CPU,
> this is broadly the point of DMA.  You *may* be able to split into
> multiple transactions but it's not reliable that you'd be able to do so
> on byte boundaries and there will be latency getting notified of
> completions.
>
> > In other words, from a purely performance perspective, I am against
> > limiting the API to just snapshotting the first and last byte. At this
> > level of "zoom", if I change the offset of the byte to anything other
> > than 3, the synchronization offset refuses to converge towards zero,
> > because the snapshot is incurring a constant offset that the servo
> > loop from userspace (phc2sys) can't compensate for.
>
> > Maybe the SPI master driver should just report what sort of
> > snapshotting capability it can offer, ranging from none (default
> > unless otherwise specified), to transfer-level (DMA style) or
> > byte-level.
>
> That does then have the consequence that the majority of controllers
> aren't going to be usable with the API which isn't great.
>

Can we continue this discussion on this thread:
https://www.spinics.net/lists/netdev/msg593772.html
The whole point there is that if there's nothing that the driver can
do, the SPI core will take the timestamps and record their (bad)
precision.

> > I'm afraid more actual experimentation is needed with DMA-based
> > controllers to understand what can be expected from them, and as a
> > result, how the API should map around them.
> > MDIO bus controllers are in a similar situation (with Hubert's patch)
> > but at least there the frame size is fixed and I haven't heard of an
> > MDIO controller to use DMA.
>
> I'm not 100% clear what the problem you're trying to solve is, or if
> it's a sensible problem to try to solve for that matter.

The problem can simply be summarized as: you're trying to read a clock
over SPI, but there's so much timing jitter in you doing that, that
you have a high degree of uncertainty in the actual precision of the
readout you took.
The solution has two parts:
- Make the SPI access itself more predictable in terms of latency.
This is always going to have to be dealt with on a driver-by-driver,
hardware-by-hardware basis.
- Provide a way of taking a software timestamp in the time interval
when the latency is predictable, and as close as possible to the
moment when the SPI slave will receive the request. Disabling
interrupts and preemption always helps to snapshot that critical
section. Again, the SPI core can't do that. And finding the correct
"pre" and "post" hooks that surround the hardware transfer in a
deterministic fashion is crucial. If you read the cover letter, I used
a GPIO pin to make sure the timestamps are where they should be, and
that they don't vary in width (post - pre) - there are also some
screenshots on Gdrive. Maybe something similar is not impossible for a
DMA transfer, although the problem formulation so far is too vague to
emit a more clear statement.
If you know when the SPI slave's clock was actually read, you have a
better idea of what time it was.

Regards,
-Vladimir

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC PATCH net-next 03/11] spi: Add a PTP system timestamp to the transfer structure
  2019-08-20 13:48             ` Vladimir Oltean
@ 2019-08-20 16:49               ` Mark Brown
  0 siblings, 0 replies; 24+ messages in thread
From: Mark Brown @ 2019-08-20 16:49 UTC (permalink / raw)
  To: Vladimir Oltean
  Cc: Hubert Feurstein, mlichvar, Richard Cochran, Andrew Lunn,
	Florian Fainelli, linux-spi, netdev

[-- Attachment #1: Type: text/plain, Size: 1256 bytes --]

On Tue, Aug 20, 2019 at 04:48:39PM +0300, Vladimir Oltean wrote:
> On Tue, 20 Aug 2019 at 15:55, Mark Brown <broonie@kernel.org> wrote:
> > On Fri, Aug 16, 2019 at 05:05:53PM +0300, Vladimir Oltean wrote:

> > > Maybe the SPI master driver should just report what sort of
> > > snapshotting capability it can offer, ranging from none (default
> > > unless otherwise specified), to transfer-level (DMA style) or
> > > byte-level.

> > That does then have the consequence that the majority of controllers
> > aren't going to be usable with the API which isn't great.

> Can we continue this discussion on this thread:
> https://www.spinics.net/lists/netdev/msg593772.html
> The whole point there is that if there's nothing that the driver can
> do, the SPI core will take the timestamps and record their (bad)
> precision.

I'm not on that thread.

> > I'm not 100% clear what the problem you're trying to solve is, or if
> > it's a sensible problem to try to solve for that matter.

> The problem can simply be summarized as: you're trying to read a clock
> over SPI, but there's so much timing jitter in you doing that, that
> you have a high degree of uncertainty in the actual precision of the
> readout you took.

That doesn't seem like a great idea...

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, back to index

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-08-16  0:44 [RFC PATCH net-next 00/11] Deterministic SPI latency on NXP Vladimir Oltean
2019-08-16  0:44 ` [RFC PATCH net-next 01/11] net: dsa: sja1105: Add a debugging GPIO for monitoring SPI latency Vladimir Oltean
2019-08-16  0:44 ` [RFC PATCH net-next 02/11] net: dsa: sja1105: Implement the .gettimex64 system call for PTP Vladimir Oltean
2019-08-16  0:44 ` [RFC PATCH net-next 03/11] spi: Add a PTP system timestamp to the transfer structure Vladimir Oltean
2019-08-16 12:18   ` Mark Brown
2019-08-16 12:35     ` Vladimir Oltean
2019-08-16 12:58       ` Mark Brown
2019-08-16 14:05         ` Vladimir Oltean
2019-08-19  0:43           ` Andrew Lunn
2019-08-20 12:55           ` Mark Brown
2019-08-20 13:48             ` Vladimir Oltean
2019-08-20 16:49               ` Mark Brown
2019-08-16  0:44 ` [RFC PATCH net-next 04/11] spi: spi-fsl-dspi: Cosmetic cleanup Vladimir Oltean
2019-08-16 12:21   ` Mark Brown
2019-08-16 12:37     ` Vladimir Oltean
2019-08-16 12:59       ` Mark Brown
2019-08-17 10:44         ` Vladimir Oltean
2019-08-16  0:44 ` [RFC PATCH net-next 05/11] spi: spi-fsl-dspi: Use poll mode in case the platform IRQ is missing Vladimir Oltean
2019-08-16  0:44 ` [RFC PATCH net-next 06/11] spi: spi-fsl-dspi: Implement the PTP system timestamping Vladimir Oltean
2019-08-16  0:44 ` [RFC PATCH net-next 07/11] spi: spi-fsl-dspi: Add a debugging GPIO for monitoring latency Vladimir Oltean
2019-08-16  0:44 ` [RFC PATCH net-next 08/11] spi: spi-fsl-dspi: Disable interrupts and preemption during poll mode transfer Vladimir Oltean
2019-08-16  0:44 ` [RFC PATCH net-next 09/11] ARM: dts: ls1021a-tsn: Add debugging GPIOs for the SJA1105 and DSPI drivers Vladimir Oltean
2019-08-16  0:44 ` [RFC PATCH net-next 10/11] ARM: dts: ls1021a-tsn: Use the DSPI controller in poll mode Vladimir Oltean
2019-08-16  0:44 ` [RFC PATCH net-next 11/11] ARM: dts: ls1021a-tsn: Reduce the SJA1105 SPI frequency for debug Vladimir Oltean

Netdev Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/netdev/0 netdev/git/0.git
	git clone --mirror https://lore.kernel.org/netdev/1 netdev/git/1.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 netdev netdev/ https://lore.kernel.org/netdev \
		netdev@vger.kernel.org netdev@archiver.kernel.org
	public-inbox-index netdev

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.netdev


AGPL code for this site: git clone https://public-inbox.org/ public-inbox