Linux-SPI Archive on lore.kernel.org
 help / color / Atom feed
* [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi
@ 2021-03-11 19:12 Pratyush Yadav
  2021-03-11 19:12 ` [RFC PATCH 1/6] spi: spi-mem: Tell controller when device is ready for calibration Pratyush Yadav
                   ` (7 more replies)
  0 siblings, 8 replies; 33+ messages in thread
From: Pratyush Yadav @ 2021-03-11 19:12 UTC (permalink / raw)
  To: Nishanth Menon, Tero Kristo, Rob Herring, Tudor Ambarus,
	Michael Walle, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mark Brown, linux-arm-kernel, devicetree,
	linux-kernel, linux-mtd, linux-spi
  Cc: Pratyush Yadav, Lokesh Vutla

Hi,

This series adds support for OSPI PHY calibration on the Cadence OSPI
controller. This calibration procedure is needed to allow high clock
speeds in 8D-8D-8D mode. The procedure reads some pre-determined pattern
data from the flash and runs a sequence of test reads to find out the
optimal delays for high speed transfer. More details on the calibration
procedure in patch 5/6.

The main problem here is telling the controller where to find the
pattern and how to read it. This RFC uses nvmem cells which point to a
fixed partition containing the data to do the reads. It depends on [0]
and [1].

The obvious problem with this is it won't work when the partitions are
defined via command line. I don't see any good way to add nvmem cells to
command line partitions. I would like some help or ideas here. We don't
necessarily have to use nvmem either. Any way that can cleanly and
consistently let the controller find out where the pattern is stored is
good.

The dts patch depends on [2].

Tested on TI's J721E EVM.

[0] https://patchwork.ozlabs.org/project/linux-mtd/patch/20210302190012.1255-1-zajec5@gmail.com/
[1] https://patchwork.ozlabs.org/project/linux-mtd/patch/20210308011853.19360-1-ansuelsmth@gmail.com/
[2] https://patchwork.kernel.org/project/linux-arm-kernel/patch/20210305153926.3479-2-p.yadav@ti.com/

Pratyush Yadav (6):
  spi: spi-mem: Tell controller when device is ready for calibration
  mtd: spi-nor: core: consolidate read op creation
  mtd: spi-nor: core: run calibration when initialization is done
  spi: cadence-qspi: Use PHY for DAC reads if possible
  spi: cadence-qspi: Tune PHY to allow running at higher frequencies
  arm64: dts: ti: k3-j721e-som-p0: Enable PHY calibration

 arch/arm64/boot/dts/ti/k3-j721e-som-p0.dtsi |  55 ++
 drivers/mtd/spi-nor/core.c                  |  74 +-
 drivers/spi/spi-cadence-quadspi.c           | 820 +++++++++++++++++++-
 drivers/spi/spi-mem.c                       |  12 +
 include/linux/spi/spi-mem.h                 |   8 +
 5 files changed, 916 insertions(+), 53 deletions(-)

--
2.30.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH 1/6] spi: spi-mem: Tell controller when device is ready for calibration
  2021-03-11 19:12 [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi Pratyush Yadav
@ 2021-03-11 19:12 ` Pratyush Yadav
  2021-03-23 23:07   ` Michael Walle
  2021-03-11 19:12 ` [RFC PATCH 2/6] mtd: spi-nor: core: consolidate read op creation Pratyush Yadav
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Pratyush Yadav @ 2021-03-11 19:12 UTC (permalink / raw)
  To: Nishanth Menon, Tero Kristo, Rob Herring, Tudor Ambarus,
	Michael Walle, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mark Brown, linux-arm-kernel, devicetree,
	linux-kernel, linux-mtd, linux-spi
  Cc: Pratyush Yadav, Lokesh Vutla

Some controllers like the Cadence OSPI controller need to perform a
calibration sequence to operate at high clock speeds. This calibration
should happen after the flash is fully initialized otherwise the
calibration might happen in a different SPI mode from the one the flash
is finally set to. Add a hook that can be used to tell the controller
when the flash is ready for calibration. Whether calibration is needed
depends on the controller.

Signed-off-by: Pratyush Yadav <p.yadav@ti.com>
---
 drivers/spi/spi-mem.c       | 12 ++++++++++++
 include/linux/spi/spi-mem.h |  8 ++++++++
 2 files changed, 20 insertions(+)

diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c
index dc713b0c3c4d..e2f05ad3f4dc 100644
--- a/drivers/spi/spi-mem.c
+++ b/drivers/spi/spi-mem.c
@@ -464,6 +464,18 @@ int spi_mem_adjust_op_size(struct spi_mem *mem, struct spi_mem_op *op)
 }
 EXPORT_SYMBOL_GPL(spi_mem_adjust_op_size);
 
+int spi_mem_do_calibration(struct spi_mem *mem, struct spi_mem_op *op)
+{
+	struct spi_controller *ctlr = mem->spi->controller;
+
+	if (!ctlr->mem_ops || !ctlr->mem_ops->do_calibration)
+		return -EOPNOTSUPP;
+
+	ctlr->mem_ops->do_calibration(mem, op);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(spi_mem_do_calibration);
+
 static ssize_t spi_mem_no_dirmap_read(struct spi_mem_dirmap_desc *desc,
 				      u64 offs, size_t len, void *buf)
 {
diff --git a/include/linux/spi/spi-mem.h b/include/linux/spi/spi-mem.h
index 2b65c9edc34e..97a2d280f2d0 100644
--- a/include/linux/spi/spi-mem.h
+++ b/include/linux/spi/spi-mem.h
@@ -250,6 +250,12 @@ static inline void *spi_mem_get_drvdata(struct spi_mem *mem)
  *		  the currently mapped area), and the caller of
  *		  spi_mem_dirmap_write() is responsible for calling it again in
  *		  this case.
+ * @do_calibration: perform calibration needed for high SPI clock speed
+ *		    operation. Should be called after the SPI memory device has
+ *		    been completely initialized. The op passed should contain
+ *		    a template for the read operation used for the device so
+ *		    the controller can decide what type of calibration is
+ *		    required for this type of read.
  *
  * This interface should be implemented by SPI controllers providing an
  * high-level interface to execute SPI memory operation, which is usually the
@@ -274,6 +280,7 @@ struct spi_controller_mem_ops {
 			       u64 offs, size_t len, void *buf);
 	ssize_t (*dirmap_write)(struct spi_mem_dirmap_desc *desc,
 				u64 offs, size_t len, const void *buf);
+	void (*do_calibration)(struct spi_mem *mem, struct spi_mem_op *op);
 };
 
 /**
@@ -346,6 +353,7 @@ bool spi_mem_dtr_supports_op(struct spi_mem *mem,
 #endif /* CONFIG_SPI_MEM */
 
 int spi_mem_adjust_op_size(struct spi_mem *mem, struct spi_mem_op *op);
+int spi_mem_do_calibration(struct spi_mem *mem, struct spi_mem_op *op);
 
 bool spi_mem_supports_op(struct spi_mem *mem,
 			 const struct spi_mem_op *op);
-- 
2.30.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH 2/6] mtd: spi-nor: core: consolidate read op creation
  2021-03-11 19:12 [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi Pratyush Yadav
  2021-03-11 19:12 ` [RFC PATCH 1/6] spi: spi-mem: Tell controller when device is ready for calibration Pratyush Yadav
@ 2021-03-11 19:12 ` Pratyush Yadav
  2021-03-23 23:17   ` Michael Walle
  2021-04-08 12:48   ` Michael Walle
  2021-03-11 19:12 ` [RFC PATCH 3/6] mtd: spi-nor: core: run calibration when initialization is done Pratyush Yadav
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 33+ messages in thread
From: Pratyush Yadav @ 2021-03-11 19:12 UTC (permalink / raw)
  To: Nishanth Menon, Tero Kristo, Rob Herring, Tudor Ambarus,
	Michael Walle, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mark Brown, linux-arm-kernel, devicetree,
	linux-kernel, linux-mtd, linux-spi
  Cc: Pratyush Yadav, Lokesh Vutla

Currently the spi_mem_op to read from the flash is used in two places:
spi_nor_create_read_dirmap() and spi_nor_spimem_read_data(). In a later
commit this number will increase to three. Instead of repeating the same
code thrice, add a function that returns a template of the read op. The
callers can then fill in the details like address, data length, data
buffer location.

Signed-off-by: Pratyush Yadav <p.yadav@ti.com>
---
 drivers/mtd/spi-nor/core.c | 62 ++++++++++++++++++++------------------
 1 file changed, 32 insertions(+), 30 deletions(-)

diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
index 4a315cb1c4db..88888df009f0 100644
--- a/drivers/mtd/spi-nor/core.c
+++ b/drivers/mtd/spi-nor/core.c
@@ -183,6 +183,33 @@ static int spi_nor_controller_ops_erase(struct spi_nor *nor, loff_t offs)
 	return nor->controller_ops->erase(nor, offs);
 }
 
+/**
+ * spi_nor_spimem_get_read_op() - return a template for the spi_mem_op used for
+ *                                reading data from the flash via spi-mem.
+ * @nor:        pointer to 'struct spi_nor'
+ *
+ * Return: A template of the 'struct spi_mem_op' for used for reading data from
+ * the flash. The caller is expected to fill in the address, data length, and
+ * the data buffer.
+ */
+static struct spi_mem_op spi_nor_spimem_get_read_op(struct spi_nor *nor)
+{
+	struct spi_mem_op op =
+		SPI_MEM_OP(SPI_MEM_OP_CMD(nor->read_opcode, 0),
+			   SPI_MEM_OP_ADDR(nor->addr_width, 0, 0),
+			   SPI_MEM_OP_DUMMY(nor->read_dummy, 0),
+			   SPI_MEM_OP_DATA_IN(1, NULL, 0));
+
+	spi_nor_spimem_setup_op(nor, &op, nor->read_proto);
+
+	/* convert the dummy cycles to the number of bytes */
+	op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
+	if (spi_nor_protocol_is_dtr(nor->read_proto))
+		op.dummy.nbytes *= 2;
+
+	return op;
+}
+
 /**
  * spi_nor_spimem_read_data() - read data from flash's memory region via
  *                              spi-mem
@@ -196,21 +223,14 @@ static int spi_nor_controller_ops_erase(struct spi_nor *nor, loff_t offs)
 static ssize_t spi_nor_spimem_read_data(struct spi_nor *nor, loff_t from,
 					size_t len, u8 *buf)
 {
-	struct spi_mem_op op =
-		SPI_MEM_OP(SPI_MEM_OP_CMD(nor->read_opcode, 0),
-			   SPI_MEM_OP_ADDR(nor->addr_width, from, 0),
-			   SPI_MEM_OP_DUMMY(nor->read_dummy, 0),
-			   SPI_MEM_OP_DATA_IN(len, buf, 0));
+	struct spi_mem_op op = spi_nor_spimem_get_read_op(nor);
 	bool usebouncebuf;
 	ssize_t nbytes;
 	int error;
 
-	spi_nor_spimem_setup_op(nor, &op, nor->read_proto);
-
-	/* convert the dummy cycles to the number of bytes */
-	op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
-	if (spi_nor_protocol_is_dtr(nor->read_proto))
-		op.dummy.nbytes *= 2;
+	op.addr.val = from;
+	op.data.nbytes = len;
+	op.data.buf.in = buf;
 
 	usebouncebuf = spi_nor_spimem_bounce(nor, &op);
 
@@ -3581,28 +3601,10 @@ EXPORT_SYMBOL_GPL(spi_nor_scan);
 static int spi_nor_create_read_dirmap(struct spi_nor *nor)
 {
 	struct spi_mem_dirmap_info info = {
-		.op_tmpl = SPI_MEM_OP(SPI_MEM_OP_CMD(nor->read_opcode, 0),
-				      SPI_MEM_OP_ADDR(nor->addr_width, 0, 0),
-				      SPI_MEM_OP_DUMMY(nor->read_dummy, 0),
-				      SPI_MEM_OP_DATA_IN(0, NULL, 0)),
+		.op_tmpl = spi_nor_spimem_get_read_op(nor),
 		.offset = 0,
 		.length = nor->mtd.size,
 	};
-	struct spi_mem_op *op = &info.op_tmpl;
-
-	spi_nor_spimem_setup_op(nor, op, nor->read_proto);
-
-	/* convert the dummy cycles to the number of bytes */
-	op->dummy.nbytes = (nor->read_dummy * op->dummy.buswidth) / 8;
-	if (spi_nor_protocol_is_dtr(nor->read_proto))
-		op->dummy.nbytes *= 2;
-
-	/*
-	 * Since spi_nor_spimem_setup_op() only sets buswidth when the number
-	 * of data bytes is non-zero, the data buswidth won't be set here. So,
-	 * do it explicitly.
-	 */
-	op->data.buswidth = spi_nor_get_protocol_data_nbits(nor->read_proto);
 
 	nor->dirmap.rdesc = devm_spi_mem_dirmap_create(nor->dev, nor->spimem,
 						       &info);
-- 
2.30.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH 3/6] mtd: spi-nor: core: run calibration when initialization is done
  2021-03-11 19:12 [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi Pratyush Yadav
  2021-03-11 19:12 ` [RFC PATCH 1/6] spi: spi-mem: Tell controller when device is ready for calibration Pratyush Yadav
  2021-03-11 19:12 ` [RFC PATCH 2/6] mtd: spi-nor: core: consolidate read op creation Pratyush Yadav
@ 2021-03-11 19:12 ` Pratyush Yadav
  2021-03-11 19:12 ` [RFC PATCH 4/6] spi: cadence-qspi: Use PHY for DAC reads if possible Pratyush Yadav
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 33+ messages in thread
From: Pratyush Yadav @ 2021-03-11 19:12 UTC (permalink / raw)
  To: Nishanth Menon, Tero Kristo, Rob Herring, Tudor Ambarus,
	Michael Walle, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mark Brown, linux-arm-kernel, devicetree,
	linux-kernel, linux-mtd, linux-spi
  Cc: Pratyush Yadav, Lokesh Vutla

Once the flash is initialized tell the controller it can run
calibration procedures if needed. This can be useful when calibration is
needed to run at higher clock speeds.

Signed-off-by: Pratyush Yadav <p.yadav@ti.com>
---
 drivers/mtd/spi-nor/core.c | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
index 88888df009f0..e0cbcaf1be89 100644
--- a/drivers/mtd/spi-nor/core.c
+++ b/drivers/mtd/spi-nor/core.c
@@ -3650,6 +3650,7 @@ static int spi_nor_probe(struct spi_mem *spimem)
 	 * checking what's really supported using spi_mem_supports_op().
 	 */
 	const struct spi_nor_hwcaps hwcaps = { .mask = SNOR_HWCAPS_ALL };
+	struct spi_mem_op op;
 	char *flash_name;
 	int ret;
 
@@ -3709,8 +3710,15 @@ static int spi_nor_probe(struct spi_mem *spimem)
 	if (ret)
 		return ret;
 
-	return mtd_device_register(&nor->mtd, data ? data->parts : NULL,
-				   data ? data->nr_parts : 0);
+	ret = mtd_device_register(&nor->mtd, data ? data->parts : NULL,
+				  data ? data->nr_parts : 0);
+	if (ret)
+		return ret;
+
+	op = spi_nor_spimem_get_read_op(nor);
+	spi_mem_do_calibration(nor->spimem, &op);
+
+	return 0;
 }
 
 static int spi_nor_remove(struct spi_mem *spimem)
-- 
2.30.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH 4/6] spi: cadence-qspi: Use PHY for DAC reads if possible
  2021-03-11 19:12 [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi Pratyush Yadav
                   ` (2 preceding siblings ...)
  2021-03-11 19:12 ` [RFC PATCH 3/6] mtd: spi-nor: core: run calibration when initialization is done Pratyush Yadav
@ 2021-03-11 19:12 ` Pratyush Yadav
  2021-03-12  9:13   ` Tudor.Ambarus
  2021-03-11 19:12 ` [RFC PATCH 5/6] spi: cadence-qspi: Tune PHY to allow running at higher frequencies Pratyush Yadav
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Pratyush Yadav @ 2021-03-11 19:12 UTC (permalink / raw)
  To: Nishanth Menon, Tero Kristo, Rob Herring, Tudor Ambarus,
	Michael Walle, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mark Brown, linux-arm-kernel, devicetree,
	linux-kernel, linux-mtd, linux-spi
  Cc: Pratyush Yadav, Lokesh Vutla

Check if a read is eligible for PHY and if it is, enable PHY and DQS.

Since PHY reads only work at an address that is 16-byte aligned and of
size that is a multiple of 16 bytes, read the starting and ending
unaligned portions without PHY, and only enable PHY for the middle part.

Signed-off-by: Pratyush Yadav <p.yadav@ti.com>
---
 drivers/spi/spi-cadence-quadspi.c | 203 ++++++++++++++++++++++++++----
 1 file changed, 182 insertions(+), 21 deletions(-)

diff --git a/drivers/spi/spi-cadence-quadspi.c b/drivers/spi/spi-cadence-quadspi.c
index e2d6ea833423..e64d8e125263 100644
--- a/drivers/spi/spi-cadence-quadspi.c
+++ b/drivers/spi/spi-cadence-quadspi.c
@@ -41,19 +41,27 @@
 
 struct cqspi_st;
 
+struct phy_setting {
+	u8		rx;
+	u8		tx;
+	u8		read_delay;
+};
+
 struct cqspi_flash_pdata {
-	struct cqspi_st	*cqspi;
-	u32		clk_rate;
-	u32		read_delay;
-	u32		tshsl_ns;
-	u32		tsd2d_ns;
-	u32		tchsh_ns;
-	u32		tslch_ns;
-	u8		inst_width;
-	u8		addr_width;
-	u8		data_width;
-	bool		dtr;
-	u8		cs;
+	struct cqspi_st		*cqspi;
+	u32			clk_rate;
+	u32			read_delay;
+	u32			tshsl_ns;
+	u32			tsd2d_ns;
+	u32			tchsh_ns;
+	u32			tslch_ns;
+	u8			inst_width;
+	u8			addr_width;
+	u8			data_width;
+	bool			dtr;
+	u8			cs;
+	bool			use_phy;
+	struct phy_setting	phy_setting;
 };
 
 struct cqspi_st {
@@ -108,12 +116,14 @@ struct cqspi_driver_platdata {
 /* Register map */
 #define CQSPI_REG_CONFIG			0x00
 #define CQSPI_REG_CONFIG_ENABLE_MASK		BIT(0)
+#define CQSPI_REG_CONFIG_PHY_EN			BIT(3)
 #define CQSPI_REG_CONFIG_ENB_DIR_ACC_CTRL	BIT(7)
 #define CQSPI_REG_CONFIG_DECODE_MASK		BIT(9)
 #define CQSPI_REG_CONFIG_CHIPSELECT_LSB		10
 #define CQSPI_REG_CONFIG_DMA_MASK		BIT(15)
 #define CQSPI_REG_CONFIG_BAUD_LSB		19
 #define CQSPI_REG_CONFIG_DTR_PROTO		BIT(24)
+#define CQSPI_REG_CONFIG_PHY_PIPELINE		BIT(25)
 #define CQSPI_REG_CONFIG_DUAL_OPCODE		BIT(30)
 #define CQSPI_REG_CONFIG_IDLE_LSB		31
 #define CQSPI_REG_CONFIG_CHIPSELECT_MASK	0xF
@@ -150,6 +160,7 @@ struct cqspi_driver_platdata {
 #define CQSPI_REG_READCAPTURE_BYPASS_LSB	0
 #define CQSPI_REG_READCAPTURE_DELAY_LSB		1
 #define CQSPI_REG_READCAPTURE_DELAY_MASK	0xF
+#define CQSPI_REG_READCAPTURE_DQS_LSB		8
 
 #define CQSPI_REG_SIZE				0x14
 #define CQSPI_REG_SIZE_ADDRESS_LSB		0
@@ -999,6 +1010,7 @@ static void cqspi_config_baudrate_div(struct cqspi_st *cqspi)
 
 static void cqspi_readdata_capture(struct cqspi_st *cqspi,
 				   const bool bypass,
+				   const bool dqs,
 				   const unsigned int delay)
 {
 	void __iomem *reg_base = cqspi->iobase;
@@ -1017,6 +1029,11 @@ static void cqspi_readdata_capture(struct cqspi_st *cqspi,
 	reg |= (delay & CQSPI_REG_READCAPTURE_DELAY_MASK)
 		<< CQSPI_REG_READCAPTURE_DELAY_LSB;
 
+	if (dqs)
+		reg |= (1 << CQSPI_REG_READCAPTURE_DQS_LSB);
+	else
+		reg &= ~(1 << CQSPI_REG_READCAPTURE_DQS_LSB);
+
 	writel(reg, reg_base + CQSPI_REG_READCAPTURE);
 }
 
@@ -1035,6 +1052,64 @@ static void cqspi_controller_enable(struct cqspi_st *cqspi, bool enable)
 	writel(reg, reg_base + CQSPI_REG_CONFIG);
 }
 
+static void cqspi_phy_enable(struct cqspi_flash_pdata *f_pdata, bool enable)
+{
+	struct cqspi_st *cqspi = f_pdata->cqspi;
+	void __iomem *reg_base = cqspi->iobase;
+	u32 reg;
+	u8 dummy;
+
+	if (enable) {
+		cqspi_readdata_capture(cqspi, 1, true,
+				       f_pdata->phy_setting.read_delay);
+
+		reg = readl(reg_base + CQSPI_REG_CONFIG);
+		reg |= CQSPI_REG_CONFIG_PHY_EN |
+		       CQSPI_REG_CONFIG_PHY_PIPELINE;
+		writel(reg, reg_base + CQSPI_REG_CONFIG);
+
+		/*
+		 * Reduce dummy cycle by 1. This is a requirement of PHY mode
+		 * operation for correctly reading the data.
+		 */
+		reg = readl(reg_base + CQSPI_REG_RD_INSTR);
+		dummy = (reg >> CQSPI_REG_RD_INSTR_DUMMY_LSB) &
+			CQSPI_REG_RD_INSTR_DUMMY_MASK;
+		dummy--;
+		reg &= ~(CQSPI_REG_RD_INSTR_DUMMY_MASK <<
+			 CQSPI_REG_RD_INSTR_DUMMY_LSB);
+
+		reg |= (dummy & CQSPI_REG_RD_INSTR_DUMMY_MASK)
+		       << CQSPI_REG_RD_INSTR_DUMMY_LSB;
+		writel(reg, reg_base + CQSPI_REG_RD_INSTR);
+	} else {
+		cqspi_readdata_capture(cqspi, !cqspi->rclk_en, false,
+				       f_pdata->read_delay);
+
+		reg = readl(reg_base + CQSPI_REG_CONFIG);
+		reg &= ~(CQSPI_REG_CONFIG_PHY_EN |
+			 CQSPI_REG_CONFIG_PHY_PIPELINE);
+		writel(reg, reg_base + CQSPI_REG_CONFIG);
+
+		/*
+		 * Dummy cycles were decremented when enabling PHY. Increment
+		 * dummy cycle by 1 to restore the original value.
+		 */
+		reg = readl(reg_base + CQSPI_REG_RD_INSTR);
+		dummy = (reg >> CQSPI_REG_RD_INSTR_DUMMY_LSB) &
+			CQSPI_REG_RD_INSTR_DUMMY_MASK;
+		dummy++;
+		reg &= ~(CQSPI_REG_RD_INSTR_DUMMY_MASK <<
+			 CQSPI_REG_RD_INSTR_DUMMY_LSB);
+
+		reg |= (dummy & CQSPI_REG_RD_INSTR_DUMMY_MASK)
+		       << CQSPI_REG_RD_INSTR_DUMMY_LSB;
+		writel(reg, reg_base + CQSPI_REG_RD_INSTR);
+	}
+
+	cqspi_wait_idle(cqspi);
+}
+
 static void cqspi_configure(struct cqspi_flash_pdata *f_pdata,
 			    unsigned long sclk)
 {
@@ -1056,7 +1131,7 @@ static void cqspi_configure(struct cqspi_flash_pdata *f_pdata,
 		cqspi->sclk = sclk;
 		cqspi_config_baudrate_div(cqspi);
 		cqspi_delay(f_pdata);
-		cqspi_readdata_capture(cqspi, !cqspi->rclk_en,
+		cqspi_readdata_capture(cqspi, !cqspi->rclk_en, false,
 				       f_pdata->read_delay);
 	}
 
@@ -1098,6 +1173,39 @@ static ssize_t cqspi_write(struct cqspi_flash_pdata *f_pdata,
 	return cqspi_indirect_write_execute(f_pdata, to, buf, len);
 }
 
+/*
+ * Check if PHY mode can be used on the given op. This is assuming it will be a
+ * DAC mode read, since PHY won't work on any other type of operation anyway.
+ */
+static bool cqspi_phy_op_eligible(const struct spi_mem_op *op)
+{
+	/* PHY is only tuned for 8D-8D-8D. */
+	if (!(op->cmd.dtr && op->addr.dtr && op->dummy.dtr && op->data.dtr))
+		return false;
+	if (op->cmd.buswidth != 8)
+		return false;
+	if (op->addr.nbytes && op->addr.buswidth != 8)
+		return false;
+	if (op->dummy.nbytes && op->dummy.buswidth != 8)
+		return false;
+	if (op->data.nbytes && op->data.buswidth != 8)
+		return false;
+
+	return true;
+}
+
+static bool cqspi_use_phy(struct cqspi_flash_pdata *f_pdata,
+			  const struct spi_mem_op *op)
+{
+	if (!f_pdata->use_phy)
+		return false;
+
+	if (op->data.nbytes < 16)
+		return false;
+
+	return cqspi_phy_op_eligible(op);
+}
+
 static void cqspi_rx_dma_callback(void *param)
 {
 	struct cqspi_st *cqspi = param;
@@ -1105,8 +1213,8 @@ static void cqspi_rx_dma_callback(void *param)
 	complete(&cqspi->rx_dma_complete);
 }
 
-static int cqspi_direct_read_execute(struct cqspi_flash_pdata *f_pdata,
-				     u_char *buf, loff_t from, size_t len)
+static int cqspi_direct_read_dma(struct cqspi_flash_pdata *f_pdata,
+				 u_char *buf, loff_t from, size_t len)
 {
 	struct cqspi_st *cqspi = f_pdata->cqspi;
 	struct device *dev = &cqspi->pdev->dev;
@@ -1118,11 +1226,6 @@ static int cqspi_direct_read_execute(struct cqspi_flash_pdata *f_pdata,
 	dma_addr_t dma_dst;
 	struct device *ddev;
 
-	if (!cqspi->rx_chan || !virt_addr_valid(buf)) {
-		memcpy_fromio(buf, cqspi->ahb_base + from, len);
-		return 0;
-	}
-
 	ddev = cqspi->rx_chan->device->dev;
 	dma_dst = dma_map_single(ddev, buf, len, DMA_FROM_DEVICE);
 	if (dma_mapping_error(ddev, dma_dst)) {
@@ -1164,6 +1267,64 @@ static int cqspi_direct_read_execute(struct cqspi_flash_pdata *f_pdata,
 	return ret;
 }
 
+static int cqspi_direct_read_execute(struct cqspi_flash_pdata *f_pdata,
+				     const struct spi_mem_op *op)
+{
+	struct cqspi_st *cqspi = f_pdata->cqspi;
+	loff_t from = op->addr.val;
+	loff_t from_aligned, to_aligned;
+	size_t len = op->data.nbytes;
+	size_t len_aligned;
+	u_char *buf = op->data.buf.in;
+	int ret;
+
+	if (!cqspi->rx_chan || !virt_addr_valid(buf)) {
+		memcpy_fromio(buf, cqspi->ahb_base + from, len);
+		return 0;
+	}
+
+	if (!cqspi_use_phy(f_pdata, op))
+		return cqspi_direct_read_dma(f_pdata, buf, from, len);
+
+	/*
+	 * PHY reads must be 16-byte aligned, and they must be a multiple of 16
+	 * bytes.
+	 */
+	from_aligned = (from + 0xF) & ~0xF;
+	to_aligned = (from + len) & ~0xF;
+	len_aligned = to_aligned - from_aligned;
+
+	/* Read the unaligned part at the start. */
+	if (from != from_aligned) {
+		ret = cqspi_direct_read_dma(f_pdata, buf, from,
+					    from_aligned - from);
+		if (ret)
+			return ret;
+		buf += from_aligned - from;
+	}
+
+	if (len_aligned) {
+		cqspi_phy_enable(f_pdata, true);
+		ret = cqspi_direct_read_dma(f_pdata, buf, from_aligned,
+					    len_aligned);
+		cqspi_phy_enable(f_pdata, false);
+		if (ret)
+			return ret;
+		buf += len_aligned;
+	}
+
+	/* Now read the remaining part, if any. */
+	if (to_aligned != (from + len)) {
+		ret = cqspi_direct_read_dma(f_pdata, buf, to_aligned,
+					    (from + len) - to_aligned);
+		if (ret)
+			return ret;
+		buf += (from + len) - to_aligned;
+	}
+
+	return 0;
+}
+
 static ssize_t cqspi_read(struct cqspi_flash_pdata *f_pdata,
 			  const struct spi_mem_op *op)
 {
@@ -1182,7 +1343,7 @@ static ssize_t cqspi_read(struct cqspi_flash_pdata *f_pdata,
 		return ret;
 
 	if (cqspi->use_direct_mode && ((from + len) <= cqspi->ahb_size))
-		return cqspi_direct_read_execute(f_pdata, buf, from, len);
+		return cqspi_direct_read_execute(f_pdata, op);
 
 	return cqspi_indirect_read_execute(f_pdata, buf, from, len);
 }
-- 
2.30.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH 5/6] spi: cadence-qspi: Tune PHY to allow running at higher frequencies
  2021-03-11 19:12 [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi Pratyush Yadav
                   ` (3 preceding siblings ...)
  2021-03-11 19:12 ` [RFC PATCH 4/6] spi: cadence-qspi: Use PHY for DAC reads if possible Pratyush Yadav
@ 2021-03-11 19:12 ` Pratyush Yadav
  2021-04-29 22:48   ` Michael Walle
  2021-03-11 19:12 ` [RFC PATCH 6/6] arm64: dts: ti: k3-j721e-som-p0: Enable PHY calibration Pratyush Yadav
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 33+ messages in thread
From: Pratyush Yadav @ 2021-03-11 19:12 UTC (permalink / raw)
  To: Nishanth Menon, Tero Kristo, Rob Herring, Tudor Ambarus,
	Michael Walle, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mark Brown, linux-arm-kernel, devicetree,
	linux-kernel, linux-mtd, linux-spi
  Cc: Pratyush Yadav, Lokesh Vutla

The controller can only run at 1/8 ref clock speed without PHY. With
PHY, it can run at the ref clock speed. So, to enable higher speed
operations, perform the PHY tuning algorithm and determine the RX, TX,
and read delay values for optimal performance. The details of the tuning
algorithm can be found at [0].

To allow this tuning to happen, pre-determined data must be programmed
to the flash at some location. This location is then advertised via a
nvmem cell. Without this data being available, the tuning would fail.

The tuning algorithm is a multi-variable search. The RX and TX delays
need to be found, along with the read delay that would work across a
temperature range. To do that, first the upper and lower RX values at
which the tuning pattern is readable are looked for. This is called the
passing region. The search is performed with Tx = 16 incrementing the
read delay with each iteration. If the two RX values have the same read
delay, the same search is performed with TX = 48.

Once the RX boundaries are found, the TX boundaries are searched for in
a similar fashion with RX set to 1/4 of the RX window (the difference
between the highest and lowest values). And similarly, if the TX
boundaries have the same read delay, the same search is performed with
RX set to 3/4 of the RX window.

There is a region around the boundary of the two passing regions. It is
called the failing region. PHY reads will not work in this region so the
PHY should be tuned as far from it as possible to allow for temperature
variations. This region is found using binary search where the window is
progressively narrowed down until it arrives at the final boundary's
lower and upper limits.

Once PHY is successfully tuned, mark it as usable to allow eligible
operations to run at high speeds. PHY can only be used with DAC mode
reads, and only in chunks of 16 bytes. For all other operations, PHY
mode should be turned off.

[0] https://www.ti.com/lit/pdf/spract2/

Signed-off-by: Pratyush Yadav <p.yadav@ti.com>
---
 drivers/spi/spi-cadence-quadspi.c | 617 ++++++++++++++++++++++++++++++
 1 file changed, 617 insertions(+)

diff --git a/drivers/spi/spi-cadence-quadspi.c b/drivers/spi/spi-cadence-quadspi.c
index e64d8e125263..d304148a4722 100644
--- a/drivers/spi/spi-cadence-quadspi.c
+++ b/drivers/spi/spi-cadence-quadspi.c
@@ -28,6 +28,7 @@
 #include <linux/spi/spi.h>
 #include <linux/spi/spi-mem.h>
 #include <linux/timer.h>
+#include <linux/nvmem-consumer.h>
 
 #define CQSPI_NAME			"cadence-qspi"
 #define CQSPI_MAX_CHIPSELECT		16
@@ -62,6 +63,9 @@ struct cqspi_flash_pdata {
 	u8			cs;
 	bool			use_phy;
 	struct phy_setting	phy_setting;
+	struct spi_mem_op	phy_read_op;
+	u32			phy_tx_start;
+	u32			phy_tx_end;
 };
 
 struct cqspi_st {
@@ -237,6 +241,13 @@ struct cqspi_driver_platdata {
 #define CQSPI_REG_POLLING_STATUS		0xB0
 #define CQSPI_REG_POLLING_STATUS_DUMMY_LSB	16
 
+#define CQSPI_REG_PHY_CONFIG			0xB4
+#define CQSPI_REG_PHY_CONFIG_RX_DEL_LSB		0
+#define CQSPI_REG_PHY_CONFIG_RX_DEL_MASK	0x7F
+#define CQSPI_REG_PHY_CONFIG_TX_DEL_LSB		16
+#define CQSPI_REG_PHY_CONFIG_TX_DEL_MASK	0x7F
+#define CQSPI_REG_PHY_CONFIG_RESYNC		BIT(31)
+
 #define CQSPI_REG_OP_EXT_LOWER			0xE0
 #define CQSPI_REG_OP_EXT_READ_LSB		24
 #define CQSPI_REG_OP_EXT_WRITE_LSB		16
@@ -262,6 +273,570 @@ struct cqspi_driver_platdata {
 
 #define CQSPI_IRQ_STATUS_MASK		0x1FFFF
 
+#define CQSPI_PHY_INIT_RD		1
+#define CQSPI_PHY_MAX_RD		4
+#define CQSPI_PHY_MAX_RX		63
+#define CQSPI_PHY_MAX_TX		63
+#define CQSPI_PHY_LOW_RX_BOUND		15
+#define CQSPI_PHY_HIGH_RX_BOUND		25
+#define CQSPI_PHY_LOW_TX_BOUND		32
+#define CQSPI_PHY_HIGH_TX_BOUND		48
+#define CQSPI_PHY_TX_LOOKUP_LOW_BOUND	24
+#define CQSPI_PHY_TX_LOOKUP_HIGH_BOUND	38
+
+#define CQSPI_PHY_DEFAULT_TEMP		45
+#define CQSPI_PHY_MIN_TEMP		-45
+#define CQSPI_PHY_MAX_TEMP		130
+#define CQSPI_PHY_MID_TEMP		(CQSPI_PHY_MIN_TEMP +	\
+					 ((CQSPI_PHY_MAX_TEMP - CQSPI_PHY_MIN_TEMP) / 2))
+
+static const u8 phy_tuning_pattern[] = {
+0xFE, 0xFF, 0x01, 0x01, 0x01, 0x01, 0x01, 0x00, 0x00, 0xFE, 0xFE, 0x01, 0x01,
+0x01, 0x01, 0x00, 0x00, 0xFE, 0xFE, 0x01, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00,
+0x00, 0xFE, 0xFE, 0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0xFE, 0xFE, 0xFF, 0x01,
+0x01, 0x01, 0x01, 0x01, 0xFE, 0x00, 0xFE, 0xFE, 0x01, 0x01, 0x01, 0x01, 0xFE,
+0x00, 0xFE, 0xFE, 0x01, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFE, 0x00, 0xFE, 0xFE,
+0xFF, 0xFF, 0xFF, 0xFF, 0xFE, 0x00, 0xFE, 0xFE, 0xFF, 0x01, 0x01, 0x01, 0x01,
+0x01, 0x00, 0xFE, 0xFE, 0xFE, 0x01, 0x01, 0x01, 0x01, 0x00, 0xFE, 0xFE, 0xFE,
+0x01, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0xFE, 0xFE, 0xFE, 0xFF, 0xFF, 0xFF,
+0xFF, 0x00, 0xFE, 0xFE, 0xFE, 0xFF, 0x01, 0x01, 0x01, 0x01, 0x01, 0xFE, 0xFE,
+0xFE, 0xFE, 0x01, 0x01, 0x01, 0x01, 0xFE, 0xFE, 0xFE, 0xFE, 0x01,
+};
+
+static void cqspi_set_tx_dll(void __iomem *reg_base, u8 dll)
+{
+	unsigned int reg;
+
+	reg = readl(reg_base + CQSPI_REG_PHY_CONFIG);
+	reg &= ~(CQSPI_REG_PHY_CONFIG_TX_DEL_MASK <<
+		CQSPI_REG_PHY_CONFIG_TX_DEL_LSB);
+	reg |= (dll & CQSPI_REG_PHY_CONFIG_TX_DEL_MASK) <<
+		CQSPI_REG_PHY_CONFIG_TX_DEL_LSB;
+	reg |= CQSPI_REG_PHY_CONFIG_RESYNC;
+	writel(reg, reg_base + CQSPI_REG_PHY_CONFIG);
+}
+
+static void cqspi_set_rx_dll(void __iomem *reg_base, u8 dll)
+{
+	unsigned int reg;
+
+	reg = readl(reg_base + CQSPI_REG_PHY_CONFIG);
+	reg &= ~(CQSPI_REG_PHY_CONFIG_RX_DEL_MASK <<
+		CQSPI_REG_PHY_CONFIG_RX_DEL_LSB);
+	reg |= (dll & CQSPI_REG_PHY_CONFIG_RX_DEL_MASK) <<
+		CQSPI_REG_PHY_CONFIG_RX_DEL_LSB;
+	reg |= CQSPI_REG_PHY_CONFIG_RESYNC;
+	writel(reg, reg_base + CQSPI_REG_PHY_CONFIG);
+}
+
+/* TODO: Figure out how to get the temperature here. */
+static int cqspi_get_temp(int *temp)
+{
+	return -EOPNOTSUPP;
+}
+
+static void cqspi_phy_apply_setting(struct cqspi_flash_pdata *f_pdata,
+				    struct phy_setting *phy)
+{
+	struct cqspi_st *cqspi = f_pdata->cqspi;
+
+	cqspi_set_rx_dll(cqspi->iobase, phy->rx);
+	cqspi_set_tx_dll(cqspi->iobase, phy->tx);
+	f_pdata->phy_setting.read_delay = phy->read_delay;
+}
+
+static int cqspi_phy_check_pattern(struct cqspi_flash_pdata *f_pdata,
+				   struct nvmem_cell *cell)
+{
+	u8 *read_data;
+	size_t len;
+	int ret;
+
+	read_data = nvmem_cell_read(cell, &len);
+	if (IS_ERR(read_data))
+		return PTR_ERR(read_data);
+	if (len != ARRAY_SIZE(phy_tuning_pattern))
+		return -EIO;
+
+	if (memcmp(read_data, phy_tuning_pattern,
+		   ARRAY_SIZE(phy_tuning_pattern))) {
+		ret = -EAGAIN;
+		goto out;
+	}
+
+	ret = 0;
+
+out:
+	kfree(read_data);
+	return ret;
+}
+
+static int cqspi_find_rx_low(struct cqspi_flash_pdata *f_pdata,
+			     struct nvmem_cell *cell, struct phy_setting *phy)
+{
+	struct device *dev = &f_pdata->cqspi->pdev->dev;
+	int ret;
+
+	do {
+		phy->rx = 0;
+		do {
+			cqspi_phy_apply_setting(f_pdata, phy);
+			ret = cqspi_phy_check_pattern(f_pdata, cell);
+			if (!ret)
+				return 0;
+
+			phy->rx++;
+		} while (phy->rx <= CQSPI_PHY_LOW_RX_BOUND);
+
+		phy->read_delay++;
+	} while (phy->read_delay <= CQSPI_PHY_MAX_RD);
+
+	dev_dbg(dev, "Unable to find RX low\n");
+	return -ENOENT;
+}
+
+static int cqspi_find_rx_high(struct cqspi_flash_pdata *f_pdata,
+			      struct nvmem_cell *cell, struct phy_setting *phy)
+{
+	struct device *dev = &f_pdata->cqspi->pdev->dev;
+	int ret;
+
+	do {
+		phy->rx = CQSPI_PHY_MAX_RX;
+		do {
+			cqspi_phy_apply_setting(f_pdata, phy);
+			ret = cqspi_phy_check_pattern(f_pdata, cell);
+			if (!ret)
+				return 0;
+
+			phy->rx--;
+		} while (phy->rx >= CQSPI_PHY_HIGH_RX_BOUND);
+
+		phy->read_delay++;
+	} while (phy->read_delay <= CQSPI_PHY_MAX_RD);
+
+	dev_dbg(dev, "Unable to find RX high\n");
+	return -ENOENT;
+}
+
+static int cqspi_find_tx_low(struct cqspi_flash_pdata *f_pdata,
+			     struct nvmem_cell *cell, struct phy_setting *phy)
+{
+	struct device *dev = &f_pdata->cqspi->pdev->dev;
+	int ret;
+
+	do {
+		phy->tx = 0;
+		do {
+			cqspi_phy_apply_setting(f_pdata, phy);
+			ret = cqspi_phy_check_pattern(f_pdata, cell);
+			if (!ret)
+				return 0;
+
+			phy->tx++;
+		} while (phy->tx <= CQSPI_PHY_LOW_TX_BOUND);
+
+		phy->read_delay++;
+	} while (phy->read_delay <= CQSPI_PHY_MAX_RD);
+
+	dev_dbg(dev, "Unable to find TX low\n");
+	return -ENOENT;
+}
+
+static int cqspi_find_tx_high(struct cqspi_flash_pdata *f_pdata,
+			      struct nvmem_cell *cell, struct phy_setting *phy)
+{
+	struct device *dev = &f_pdata->cqspi->pdev->dev;
+	int ret;
+
+	do {
+		phy->tx = CQSPI_PHY_MAX_TX;
+		do {
+			cqspi_phy_apply_setting(f_pdata, phy);
+			ret = cqspi_phy_check_pattern(f_pdata, cell);
+			if (!ret)
+				return 0;
+
+			phy->tx--;
+		} while (phy->tx >= CQSPI_PHY_HIGH_TX_BOUND);
+
+		phy->read_delay++;
+	} while (phy->read_delay <= CQSPI_PHY_MAX_RD);
+
+	dev_dbg(dev, "Unable to find TX high\n");
+	return -ENOENT;
+}
+
+static int cqspi_phy_find_gaplow(struct cqspi_flash_pdata *f_pdata,
+				 struct nvmem_cell *cell,
+				 struct phy_setting *bottomleft,
+				 struct phy_setting *topright,
+				 struct phy_setting *gaplow)
+{
+	struct phy_setting left, right, mid;
+	int ret;
+
+	left = *bottomleft;
+	right = *topright;
+
+	mid.tx = left.tx + ((right.tx - left.tx) / 2);
+	mid.rx = left.rx + ((right.rx - left.rx) / 2);
+	mid.read_delay = left.read_delay;
+
+	do {
+		cqspi_phy_apply_setting(f_pdata, &mid);
+		ret = cqspi_phy_check_pattern(f_pdata, cell);
+		if (ret) {
+			/* The pattern was not found. Go to the lower half. */
+			right.tx = mid.tx;
+			right.rx = mid.rx;
+
+			mid.tx = left.tx + ((mid.tx - left.tx) / 2);
+			mid.rx = left.rx + ((mid.rx - left.rx) / 2);
+		} else {
+			/* The pattern was found. Go to the upper half. */
+			left.tx = mid.tx;
+			left.rx = mid.rx;
+
+			mid.tx = mid.tx + ((right.tx - mid.tx) / 2);
+			mid.rx = mid.rx + ((right.rx - mid.rx) / 2);
+		}
+
+	/* Break the loop if the window has closed. */
+	} while ((right.tx - left.tx >= 2) && (right.rx - left.rx >= 2));
+
+	*gaplow = mid;
+	return 0;
+}
+
+static int cqspi_phy_find_gaphigh(struct cqspi_flash_pdata *f_pdata,
+				  struct nvmem_cell *cell,
+				  struct phy_setting *bottomleft,
+				  struct phy_setting *topright,
+				  struct phy_setting *gaphigh)
+{
+	struct phy_setting left, right, mid;
+	int ret;
+
+	left = *bottomleft;
+	right = *topright;
+
+	mid.tx = left.tx + ((right.tx - left.tx) / 2);
+	mid.rx = left.rx + ((right.rx - left.rx) / 2);
+	mid.read_delay = right.read_delay;
+
+	do {
+		cqspi_phy_apply_setting(f_pdata, &mid);
+		ret = cqspi_phy_check_pattern(f_pdata, cell);
+		if (ret) {
+			/* The pattern was not found. Go to the upper half. */
+			left.tx = mid.tx;
+			left.rx = mid.rx;
+
+			mid.tx = mid.tx + ((right.tx - mid.tx) / 2);
+			mid.rx = mid.rx + ((right.rx - mid.rx) / 2);
+		} else {
+			/* The pattern was found. Go to the lower half. */
+			right.tx = mid.tx;
+			right.rx = mid.rx;
+
+			mid.tx = left.tx + ((mid.tx - left.tx) / 2);
+			mid.rx = left.rx + ((mid.rx - left.rx) / 2);
+		}
+
+	/* Break the loop if the window has closed. */
+	} while ((right.tx - left.tx >= 2) && (right.rx - left.rx >= 2));
+
+	*gaphigh = mid;
+	return 0;
+}
+
+static int cqspi_phy_calibrate(struct cqspi_flash_pdata *f_pdata,
+			       struct nvmem_cell *cell)
+{
+	struct cqspi_st *cqspi = f_pdata->cqspi;
+	struct device *dev = &cqspi->pdev->dev;
+	struct phy_setting rxlow, rxhigh, txlow, txhigh, temp;
+	struct phy_setting bottomleft, topright, searchpoint, gaplow, gaphigh;
+	int ret, tmp;
+
+	f_pdata->use_phy = true;
+
+	/* Look for RX boundaries at lower TX range. */
+	rxlow.tx = f_pdata->phy_tx_start;
+
+	do {
+		dev_dbg(dev, "Searching for rxlow on TX = %d\n", rxlow.tx);
+		rxlow.read_delay = CQSPI_PHY_INIT_RD;
+		ret = cqspi_find_rx_low(f_pdata, cell, &rxlow);
+	} while (ret && ++rxlow.tx <= CQSPI_PHY_TX_LOOKUP_LOW_BOUND);
+
+	if (ret)
+		goto out;
+	dev_dbg(dev, "rxlow: RX: %d TX: %d RD: %d\n", rxlow.rx, rxlow.tx,
+		rxlow.read_delay);
+
+	rxhigh.tx = rxlow.tx;
+	rxhigh.read_delay = rxlow.read_delay;
+	cqspi_find_rx_high(f_pdata, cell, &rxhigh);
+	if (ret)
+		goto out;
+	dev_dbg(dev, "rxhigh: RX: %d TX: %d RD: %d\n", rxhigh.rx, rxhigh.tx,
+		rxhigh.read_delay);
+
+	/*
+	 * Check a different point if rxlow and rxhigh are on the same read
+	 * delay. This avoids mistaking the failing region for an RX boundary.
+	 */
+	if (rxlow.read_delay == rxhigh.read_delay) {
+		dev_dbg(dev,
+			"rxlow and rxhigh at the same read delay.\n");
+
+		/* Look for RX boundaries at upper TX range. */
+		temp.tx = f_pdata->phy_tx_end;
+
+		do {
+			dev_dbg(dev, "Searching for rxlow on TX = %d\n",
+				temp.tx);
+			temp.read_delay = CQSPI_PHY_INIT_RD;
+			ret = cqspi_find_rx_low(f_pdata, cell, &temp);
+		} while (ret && --temp.tx >= CQSPI_PHY_TX_LOOKUP_HIGH_BOUND);
+
+		if (ret)
+			goto out;
+		dev_dbg(dev, "rxlow: RX: %d TX: %d RD: %d\n", temp.rx, temp.tx,
+			temp.read_delay);
+
+		if (temp.rx < rxlow.rx) {
+			rxlow = temp;
+			dev_dbg(dev, "Updating rxlow to the one at TX = 48\n");
+		}
+
+		/* Find RX max. */
+		ret = cqspi_find_rx_high(f_pdata, cell, &temp);
+		if (ret)
+			goto out;
+		dev_dbg(dev, "rxhigh: RX: %d TX: %d RD: %d\n", temp.rx, temp.tx,
+			temp.read_delay);
+
+		if (temp.rx < rxhigh.rx) {
+			rxhigh = temp;
+			dev_dbg(dev, "Updating rxhigh to the one at TX = 48\n");
+		}
+	}
+
+	/* Look for TX boundaries at 1/4 of RX window. */
+	txlow.rx = rxlow.rx + ((rxhigh.rx - rxlow.rx) / 4);
+	txhigh.rx = txlow.rx;
+
+	txlow.read_delay = CQSPI_PHY_INIT_RD;
+	ret = cqspi_find_tx_low(f_pdata, cell, &txlow);
+	if (ret)
+		goto out;
+	dev_dbg(dev, "txlow: RX: %d TX: %d RD: %d\n", txlow.rx, txlow.tx,
+		txlow.read_delay);
+
+	txhigh.read_delay = txlow.read_delay;
+	ret = cqspi_find_tx_high(f_pdata, cell, &txhigh);
+	if (ret)
+		goto out;
+	dev_dbg(dev, "txhigh: RX: %d TX: %d RD: %d\n", txhigh.rx, txhigh.tx,
+		txhigh.read_delay);
+
+	/*
+	 * Check a different point if txlow and txhigh are on the same read
+	 * delay. This avoids mistaking the failing region for an TX boundary.
+	 */
+	if (txlow.read_delay == txhigh.read_delay) {
+		/* Look for TX boundaries at 3/4 of RX window. */
+		temp.rx = rxlow.rx + (3 * (rxhigh.rx - rxlow.rx) / 4);
+		temp.read_delay = CQSPI_PHY_INIT_RD;
+		dev_dbg(dev,
+			"txlow and txhigh at the same read delay. Searching at RX = %d\n",
+			temp.rx);
+
+		ret = cqspi_find_tx_low(f_pdata, cell, &temp);
+		if (ret)
+			goto out;
+		dev_dbg(dev, "txlow: RX: %d TX: %d RD: %d\n", temp.rx, temp.tx,
+			temp.read_delay);
+
+		if (temp.tx < txlow.tx) {
+			txlow = temp;
+			dev_dbg(dev, "Updating txlow with the one at RX = %d\n",
+				txlow.rx);
+		}
+
+		ret = cqspi_find_tx_high(f_pdata, cell, &temp);
+		if (ret)
+			goto out;
+		dev_dbg(dev, "txhigh: RX: %d TX: %d RD: %d\n", temp.rx, temp.tx,
+			temp.read_delay);
+
+		if (temp.tx < txhigh.tx) {
+			txhigh = temp;
+			dev_dbg(dev, "Updating txhigh with the one at RX = %d\n",
+				txhigh.rx);
+		}
+	}
+
+	/*
+	 * Set bottom left and top right corners. These are theoretical
+	 * corners. They may not actually be "good" points. But the longest
+	 * diagonal will be between these corners.
+	 */
+	bottomleft.tx = txlow.tx;
+	bottomleft.rx = rxlow.rx;
+	if (txlow.read_delay <= rxlow.read_delay)
+		bottomleft.read_delay = txlow.read_delay;
+	else
+		bottomleft.read_delay = rxlow.read_delay;
+
+	temp = bottomleft;
+	temp.tx += 4;
+	temp.rx += 4;
+	cqspi_phy_apply_setting(f_pdata, &temp);
+	ret = cqspi_phy_check_pattern(f_pdata, cell);
+	if (ret) {
+		temp.read_delay--;
+		cqspi_phy_apply_setting(f_pdata, &temp);
+		ret = cqspi_phy_check_pattern(f_pdata, cell);
+	}
+
+	if (!ret)
+		bottomleft.read_delay = temp.read_delay;
+
+	topright.tx = txhigh.tx;
+	topright.rx = rxhigh.rx;
+	if (txhigh.read_delay >= rxhigh.read_delay)
+		topright.read_delay = txhigh.read_delay;
+	else
+		topright.read_delay = rxhigh.read_delay;
+
+	temp = topright;
+	temp.tx -= 4;
+	temp.rx -= 4;
+	cqspi_phy_apply_setting(f_pdata, &temp);
+	ret = cqspi_phy_check_pattern(f_pdata, cell);
+	if (ret) {
+		temp.read_delay++;
+		cqspi_phy_apply_setting(f_pdata, &temp);
+		ret = cqspi_phy_check_pattern(f_pdata, cell);
+	}
+
+	if (!ret)
+		topright.read_delay = temp.read_delay;
+
+	dev_dbg(dev, "topright: RX: %d TX: %d RD: %d\n", topright.rx,
+		topright.tx, topright.read_delay);
+	dev_dbg(dev, "bottomleft: RX: %d TX: %d RD: %d\n", bottomleft.rx,
+		bottomleft.tx, bottomleft.read_delay);
+
+	ret = cqspi_phy_find_gaplow(f_pdata, cell, &bottomleft, &topright,
+				    &gaplow);
+	if (ret)
+		goto out;
+	dev_dbg(dev, "gaplow: RX: %d TX: %d RD: %d\n", gaplow.rx, gaplow.tx,
+		gaplow.read_delay);
+
+	if (bottomleft.read_delay == topright.read_delay) {
+		/*
+		 * If there is only one passing region, it means that the "true"
+		 * topright is too small to find, so the start of the failing
+		 * region is a good approximation. Put the tuning point in the
+		 * middle and adjust for temperature.
+		 */
+		topright = gaplow;
+		searchpoint.read_delay = bottomleft.read_delay;
+		searchpoint.tx = bottomleft.tx +
+				 ((topright.tx - bottomleft.tx) / 2);
+		searchpoint.rx = bottomleft.rx +
+				 ((topright.rx - bottomleft.rx) / 2);
+
+		ret = cqspi_get_temp(&tmp);
+		if (ret) {
+			/*
+			 * Assume room temperature if it couldn't be obtained
+			 * from the thermal sensor.
+			 *
+			 * TODO: Change it to dev_warn once support for finding
+			 * out the temperature is added.
+			 */
+			dev_dbg(dev,
+				"Unable to get temperature. Assuming room temperature\n");
+			tmp = CQSPI_PHY_DEFAULT_TEMP;
+		}
+
+		if (tmp < CQSPI_PHY_MIN_TEMP || tmp > CQSPI_PHY_MAX_TEMP) {
+			dev_err(dev,
+				"Temperature outside operating range: %dC\n",
+				tmp);
+			ret = -EINVAL;
+			goto out;
+		}
+
+		/* Avoid a divide-by-zero. */
+		if (tmp == CQSPI_PHY_MID_TEMP)
+			tmp++;
+		dev_dbg(dev, "Temperature: %dC\n", tmp);
+
+		searchpoint.tx += (topright.tx - bottomleft.tx) /
+				  (330 / (tmp - CQSPI_PHY_MID_TEMP));
+		searchpoint.rx += (topright.rx - bottomleft.rx) /
+				  (330 / (tmp - CQSPI_PHY_MID_TEMP));
+	} else {
+		/*
+		 * If there are two passing regions, find the start and end of
+		 * the second one.
+		 */
+		ret = cqspi_phy_find_gaphigh(f_pdata, cell, &bottomleft,
+					     &topright, &gaphigh);
+		if (ret)
+			goto out;
+		dev_dbg(dev, "gaphigh: RX: %d TX: %d RD: %d\n", gaphigh.rx,
+			gaphigh.tx, gaphigh.read_delay);
+
+		/*
+		 * Place the final tuning point in the corner furthest from the
+		 * failing region but leave some margin for temperature changes.
+		 */
+		if ((abs(gaplow.tx - bottomleft.tx) +
+		     abs(gaplow.rx - bottomleft.rx)) <
+		    (abs(gaphigh.tx - topright.tx) +
+		     abs(gaphigh.rx - topright.rx))) {
+			searchpoint = topright;
+			searchpoint.tx -= 16;
+			searchpoint.rx -= (16 * (topright.rx - bottomleft.rx)) /
+					   (topright.tx - bottomleft.tx);
+		} else {
+			searchpoint = bottomleft;
+			searchpoint.tx += 16;
+			searchpoint.rx += (16 * (topright.rx - bottomleft.rx)) /
+					   (topright.tx - bottomleft.tx);
+		}
+	}
+
+	/* Set the final PHY settings and check if they are working. */
+	cqspi_phy_apply_setting(f_pdata, &searchpoint);
+	dev_dbg(dev, "Final tuning point: RX: %d TX: %d RD: %d\n",
+		searchpoint.rx, searchpoint.tx, searchpoint.read_delay);
+
+	ret = cqspi_phy_check_pattern(f_pdata, cell);
+	if (ret) {
+		dev_err(dev,
+			"Failed to find pattern at final calibration point\n");
+		ret = -EINVAL;
+		goto out;
+	}
+
+	ret = 0;
+	f_pdata->phy_setting.read_delay = searchpoint.read_delay;
+out:
+	if (ret)
+		f_pdata->use_phy = false;
+	return ret;
+}
+
 static int cqspi_wait_for_bit(void __iomem *reg, const u32 mask, bool clr)
 {
 	u32 val;
@@ -1400,6 +1975,41 @@ static bool cqspi_supports_mem_op(struct spi_mem *mem,
 		return spi_mem_default_supports_op(mem, op);
 }
 
+static void cqspi_mem_do_calibration(struct spi_mem *mem, struct spi_mem_op *op)
+{
+	struct cqspi_st *cqspi = spi_master_get_devdata(mem->spi->master);
+	struct cqspi_flash_pdata *f_pdata;
+	struct nvmem_cell *cell;
+	struct device *dev = &cqspi->pdev->dev;
+	int ret;
+
+	f_pdata = &cqspi->f_pdata[mem->spi->chip_select];
+
+	/* Check if the op is eligible for PHY mode operation. */
+	if (!cqspi_phy_op_eligible(op))
+		return;
+
+	cell = nvmem_cell_get(dev, "calibration");
+	if (IS_ERR_OR_NULL(cell)) {
+		dev_dbg(dev, "Failed to get calibration data nvmem: %ld\n",
+			PTR_ERR(cell));
+		return;
+	}
+
+	ret = cqspi_phy_check_pattern(f_pdata, cell);
+	if (ret) {
+		dev_dbg(dev, "Pattern not found. Skipping calibration.\n");
+		nvmem_cell_put(cell);
+		return;
+	}
+
+	ret = cqspi_phy_calibrate(f_pdata, cell);
+	if (ret)
+		dev_info(&cqspi->pdev->dev, "PHY calibration failed: %d\n", ret);
+
+	nvmem_cell_put(cell);
+}
+
 static int cqspi_of_get_flash_pdata(struct platform_device *pdev,
 				    struct cqspi_flash_pdata *f_pdata,
 				    struct device_node *np)
@@ -1434,6 +2044,12 @@ static int cqspi_of_get_flash_pdata(struct platform_device *pdev,
 		return -ENXIO;
 	}
 
+	if (of_property_read_u32(np, "cdns,phy-tx-start", &f_pdata->phy_tx_start))
+		f_pdata->phy_tx_start = 16;
+
+	if (of_property_read_u32(np, "cdns,phy-tx-end", &f_pdata->phy_tx_end))
+		f_pdata->phy_tx_end = 48;
+
 	return 0;
 }
 
@@ -1534,6 +2150,7 @@ static const struct spi_controller_mem_ops cqspi_mem_ops = {
 	.exec_op = cqspi_exec_mem_op,
 	.get_name = cqspi_get_name,
 	.supports_op = cqspi_supports_mem_op,
+	.do_calibration = cqspi_mem_do_calibration,
 };
 
 static int cqspi_setup_flash(struct cqspi_st *cqspi)
-- 
2.30.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [RFC PATCH 6/6] arm64: dts: ti: k3-j721e-som-p0: Enable PHY calibration
  2021-03-11 19:12 [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi Pratyush Yadav
                   ` (4 preceding siblings ...)
  2021-03-11 19:12 ` [RFC PATCH 5/6] spi: cadence-qspi: Tune PHY to allow running at higher frequencies Pratyush Yadav
@ 2021-03-11 19:12 ` Pratyush Yadav
  2021-03-12  9:09 ` [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi Tudor.Ambarus
  2021-03-12 13:32 ` Michael Walle
  7 siblings, 0 replies; 33+ messages in thread
From: Pratyush Yadav @ 2021-03-11 19:12 UTC (permalink / raw)
  To: Nishanth Menon, Tero Kristo, Rob Herring, Tudor Ambarus,
	Michael Walle, Miquel Raynal, Richard Weinberger,
	Vignesh Raghavendra, Mark Brown, linux-arm-kernel, devicetree,
	linux-kernel, linux-mtd, linux-spi
  Cc: Pratyush Yadav, Lokesh Vutla

For running the flash in 8D-8D-8D mode at 166 MHz (controller ref clock
speed), PHY calibration procedure needs to run to calculate the proper
line delays.

To perform this calibration, the controller needs to know the location
of the pre-determined calibration pattern on the flash. Add a fixed
partition table that contains the calibration partition, along with the
rest of the partitions for the platform. Also add a nvmem cell that
points to the calibration partition.

Signed-off-by: Pratyush Yadav <p.yadav@ti.com>
---

Based on patch
https://patchwork.kernel.org/project/linux-arm-kernel/patch/20210305153926.3479-2-p.yadav@ti.com/

 arch/arm64/boot/dts/ti/k3-j721e-som-p0.dtsi | 55 +++++++++++++++++++++
 1 file changed, 55 insertions(+)

diff --git a/arch/arm64/boot/dts/ti/k3-j721e-som-p0.dtsi b/arch/arm64/boot/dts/ti/k3-j721e-som-p0.dtsi
index 2fee2906183d..6013ebb45517 100644
--- a/arch/arm64/boot/dts/ti/k3-j721e-som-p0.dtsi
+++ b/arch/arm64/boot/dts/ti/k3-j721e-som-p0.dtsi
@@ -170,6 +170,8 @@ J721E_WKUP_IOPAD(0x002c, PIN_OUTPUT, 0) /* MCU_OSPI0_CSn0 */
 &ospi0 {
 	pinctrl-names = "default";
 	pinctrl-0 = <&mcu_fss0_ospi0_pins_default>;
+	nvmem-cells = <&ospi_calibration_data>;
+	nvmem-cell-names = "calibration";

 	flash@0{
 		compatible = "jedec,spi-nor";
@@ -184,6 +186,59 @@ flash@0{
 		cdns,read-delay = <0>;
 		#address-cells = <1>;
 		#size-cells = <1>;
+
+		partitions {
+			compatible = "fixed-partitions";
+			#address-cells = <1>;
+			#size-cells = <1>;
+
+			partiiton@0 {
+				label = "ospi.tiboot3";
+				reg = <0x0 0x80000>;
+			};
+
+			partition@80000 {
+				label = "ospi.tispl";
+				reg = <0x80000 0x200000>;
+			};
+
+			partition@280000 {
+				label = "ospi.u-boot";
+				reg = <0x280000 0x400000>;
+			};
+
+			partition@680000 {
+				label = "ospi.env";
+				reg = <0x680000 0x20000>;
+			};
+
+			partition@6a0000 {
+				label = "ospi.env.backup";
+				reg = <0x6a0000 0x20000>;
+			};
+
+			partition@0x6c0000 {
+				label = "ospi.sysfw";
+				reg = <0x6c0000 0x100000>;
+			};
+
+			partition@800000 {
+				label = "ospi.rootfs";
+				reg = <0x800000 0x37e0000>;
+			};
+
+			calibration_partition: partition@3fe0000 {
+				compatible = "nvmem-cells";
+				label = "ospi.phypattern";
+				reg = <0x3fe0000 0x20000>;
+				#address-cells = <1>;
+				#size-cells = <1>;
+
+				ospi_calibration_data: ospi-calibration-cell@0 {
+					reg = <0x0 0x80>;
+				};
+			};
+		};
 	};
 };

--
2.30.0


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi
  2021-03-11 19:12 [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi Pratyush Yadav
                   ` (5 preceding siblings ...)
  2021-03-11 19:12 ` [RFC PATCH 6/6] arm64: dts: ti: k3-j721e-som-p0: Enable PHY calibration Pratyush Yadav
@ 2021-03-12  9:09 ` Tudor.Ambarus
  2021-03-12 10:10   ` Pratyush Yadav
  2021-03-12 13:32 ` Michael Walle
  7 siblings, 1 reply; 33+ messages in thread
From: Tudor.Ambarus @ 2021-03-12  9:09 UTC (permalink / raw)
  To: p.yadav, nm, kristo, robh+dt, michael, miquel.raynal, richard,
	vigneshr, broonie, linux-arm-kernel, devicetree, linux-kernel,
	linux-mtd, linux-spi
  Cc: lokeshvutla

On 3/11/21 9:12 PM, Pratyush Yadav wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> Hi,
> 
> This series adds support for OSPI PHY calibration on the Cadence OSPI
> controller. This calibration procedure is needed to allow high clock
> speeds in 8D-8D-8D mode. The procedure reads some pre-determined pattern
> data from the flash and runs a sequence of test reads to find out the
> optimal delays for high speed transfer. More details on the calibration
> procedure in patch 5/6.

Can the calibration sequence be avoided if the controller is informed
about the frequency on which the flash operates?

Can you add more details about the optimal delays? Are we talking about
flash's AC characteristics? Is the calibration still needed if the upper
layer informs the QSPI controller about the needed delays?

Cheers,
ta

> 
> The main problem here is telling the controller where to find the
> pattern and how to read it. This RFC uses nvmem cells which point to a
> fixed partition containing the data to do the reads. It depends on [0]
> and [1].
> 
> The obvious problem with this is it won't work when the partitions are
> defined via command line. I don't see any good way to add nvmem cells to
> command line partitions. I would like some help or ideas here. We don't
> necessarily have to use nvmem either. Any way that can cleanly and
> consistently let the controller find out where the pattern is stored is
> good.
> 
> The dts patch depends on [2].
> 
> Tested on TI's J721E EVM.
> 
> [0] https://patchwork.ozlabs.org/project/linux-mtd/patch/20210302190012.1255-1-zajec5@gmail.com/
> [1] https://patchwork.ozlabs.org/project/linux-mtd/patch/20210308011853.19360-1-ansuelsmth@gmail.com/
> [2] https://patchwork.kernel.org/project/linux-arm-kernel/patch/20210305153926.3479-2-p.yadav@ti.com/
> 
> Pratyush Yadav (6):
>   spi: spi-mem: Tell controller when device is ready for calibration
>   mtd: spi-nor: core: consolidate read op creation
>   mtd: spi-nor: core: run calibration when initialization is done
>   spi: cadence-qspi: Use PHY for DAC reads if possible
>   spi: cadence-qspi: Tune PHY to allow running at higher frequencies
>   arm64: dts: ti: k3-j721e-som-p0: Enable PHY calibration
> 
>  arch/arm64/boot/dts/ti/k3-j721e-som-p0.dtsi |  55 ++
>  drivers/mtd/spi-nor/core.c                  |  74 +-
>  drivers/spi/spi-cadence-quadspi.c           | 820 +++++++++++++++++++-
>  drivers/spi/spi-mem.c                       |  12 +
>  include/linux/spi/spi-mem.h                 |   8 +
>  5 files changed, 916 insertions(+), 53 deletions(-)
> 
> --
> 2.30.0
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 4/6] spi: cadence-qspi: Use PHY for DAC reads if possible
  2021-03-11 19:12 ` [RFC PATCH 4/6] spi: cadence-qspi: Use PHY for DAC reads if possible Pratyush Yadav
@ 2021-03-12  9:13   ` Tudor.Ambarus
  2021-03-12 10:17     ` Pratyush Yadav
  0 siblings, 1 reply; 33+ messages in thread
From: Tudor.Ambarus @ 2021-03-12  9:13 UTC (permalink / raw)
  To: p.yadav, nm, kristo, robh+dt, michael, miquel.raynal, richard,
	vigneshr, broonie, linux-arm-kernel, devicetree, linux-kernel,
	linux-mtd, linux-spi
  Cc: lokeshvutla

On 3/11/21 9:12 PM, Pratyush Yadav wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> Check if a read is eligible for PHY and if it is, enable PHY and DQS.

DQS as in data strobe? Shouldn't the upper layer inform the QSPI controller
whether DS is required or not?

> 
> Since PHY reads only work at an address that is 16-byte aligned and of
> size that is a multiple of 16 bytes, read the starting and ending
> unaligned portions without PHY, and only enable PHY for the middle part.
> 
> Signed-off-by: Pratyush Yadav <p.yadav@ti.com>
> ---
>  drivers/spi/spi-cadence-quadspi.c | 203 ++++++++++++++++++++++++++----
>  1 file changed, 182 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/spi/spi-cadence-quadspi.c b/drivers/spi/spi-cadence-quadspi.c
> index e2d6ea833423..e64d8e125263 100644
> --- a/drivers/spi/spi-cadence-quadspi.c
> +++ b/drivers/spi/spi-cadence-quadspi.c
> @@ -41,19 +41,27 @@
> 
>  struct cqspi_st;
> 
> +struct phy_setting {
> +       u8              rx;
> +       u8              tx;
> +       u8              read_delay;
> +};
> +
>  struct cqspi_flash_pdata {
> -       struct cqspi_st *cqspi;
> -       u32             clk_rate;
> -       u32             read_delay;
> -       u32             tshsl_ns;
> -       u32             tsd2d_ns;
> -       u32             tchsh_ns;
> -       u32             tslch_ns;
> -       u8              inst_width;
> -       u8              addr_width;
> -       u8              data_width;
> -       bool            dtr;
> -       u8              cs;
> +       struct cqspi_st         *cqspi;
> +       u32                     clk_rate;
> +       u32                     read_delay;
> +       u32                     tshsl_ns;
> +       u32                     tsd2d_ns;
> +       u32                     tchsh_ns;
> +       u32                     tslch_ns;
> +       u8                      inst_width;
> +       u8                      addr_width;
> +       u8                      data_width;
> +       bool                    dtr;
> +       u8                      cs;
> +       bool                    use_phy;
> +       struct phy_setting      phy_setting;
>  };
> 
>  struct cqspi_st {
> @@ -108,12 +116,14 @@ struct cqspi_driver_platdata {
>  /* Register map */
>  #define CQSPI_REG_CONFIG                       0x00
>  #define CQSPI_REG_CONFIG_ENABLE_MASK           BIT(0)
> +#define CQSPI_REG_CONFIG_PHY_EN                        BIT(3)
>  #define CQSPI_REG_CONFIG_ENB_DIR_ACC_CTRL      BIT(7)
>  #define CQSPI_REG_CONFIG_DECODE_MASK           BIT(9)
>  #define CQSPI_REG_CONFIG_CHIPSELECT_LSB                10
>  #define CQSPI_REG_CONFIG_DMA_MASK              BIT(15)
>  #define CQSPI_REG_CONFIG_BAUD_LSB              19
>  #define CQSPI_REG_CONFIG_DTR_PROTO             BIT(24)
> +#define CQSPI_REG_CONFIG_PHY_PIPELINE          BIT(25)
>  #define CQSPI_REG_CONFIG_DUAL_OPCODE           BIT(30)
>  #define CQSPI_REG_CONFIG_IDLE_LSB              31
>  #define CQSPI_REG_CONFIG_CHIPSELECT_MASK       0xF
> @@ -150,6 +160,7 @@ struct cqspi_driver_platdata {
>  #define CQSPI_REG_READCAPTURE_BYPASS_LSB       0
>  #define CQSPI_REG_READCAPTURE_DELAY_LSB                1
>  #define CQSPI_REG_READCAPTURE_DELAY_MASK       0xF
> +#define CQSPI_REG_READCAPTURE_DQS_LSB          8
> 
>  #define CQSPI_REG_SIZE                         0x14
>  #define CQSPI_REG_SIZE_ADDRESS_LSB             0
> @@ -999,6 +1010,7 @@ static void cqspi_config_baudrate_div(struct cqspi_st *cqspi)
> 
>  static void cqspi_readdata_capture(struct cqspi_st *cqspi,
>                                    const bool bypass,
> +                                  const bool dqs,
>                                    const unsigned int delay)
>  {
>         void __iomem *reg_base = cqspi->iobase;
> @@ -1017,6 +1029,11 @@ static void cqspi_readdata_capture(struct cqspi_st *cqspi,
>         reg |= (delay & CQSPI_REG_READCAPTURE_DELAY_MASK)
>                 << CQSPI_REG_READCAPTURE_DELAY_LSB;
> 
> +       if (dqs)
> +               reg |= (1 << CQSPI_REG_READCAPTURE_DQS_LSB);
> +       else
> +               reg &= ~(1 << CQSPI_REG_READCAPTURE_DQS_LSB);
> +
>         writel(reg, reg_base + CQSPI_REG_READCAPTURE);
>  }
> 
> @@ -1035,6 +1052,64 @@ static void cqspi_controller_enable(struct cqspi_st *cqspi, bool enable)
>         writel(reg, reg_base + CQSPI_REG_CONFIG);
>  }
> 
> +static void cqspi_phy_enable(struct cqspi_flash_pdata *f_pdata, bool enable)
> +{
> +       struct cqspi_st *cqspi = f_pdata->cqspi;
> +       void __iomem *reg_base = cqspi->iobase;
> +       u32 reg;
> +       u8 dummy;
> +
> +       if (enable) {
> +               cqspi_readdata_capture(cqspi, 1, true,
> +                                      f_pdata->phy_setting.read_delay);
> +
> +               reg = readl(reg_base + CQSPI_REG_CONFIG);
> +               reg |= CQSPI_REG_CONFIG_PHY_EN |
> +                      CQSPI_REG_CONFIG_PHY_PIPELINE;
> +               writel(reg, reg_base + CQSPI_REG_CONFIG);
> +
> +               /*
> +                * Reduce dummy cycle by 1. This is a requirement of PHY mode
> +                * operation for correctly reading the data.
> +                */
> +               reg = readl(reg_base + CQSPI_REG_RD_INSTR);
> +               dummy = (reg >> CQSPI_REG_RD_INSTR_DUMMY_LSB) &
> +                       CQSPI_REG_RD_INSTR_DUMMY_MASK;
> +               dummy--;
> +               reg &= ~(CQSPI_REG_RD_INSTR_DUMMY_MASK <<
> +                        CQSPI_REG_RD_INSTR_DUMMY_LSB);
> +
> +               reg |= (dummy & CQSPI_REG_RD_INSTR_DUMMY_MASK)
> +                      << CQSPI_REG_RD_INSTR_DUMMY_LSB;
> +               writel(reg, reg_base + CQSPI_REG_RD_INSTR);
> +       } else {
> +               cqspi_readdata_capture(cqspi, !cqspi->rclk_en, false,
> +                                      f_pdata->read_delay);
> +
> +               reg = readl(reg_base + CQSPI_REG_CONFIG);
> +               reg &= ~(CQSPI_REG_CONFIG_PHY_EN |
> +                        CQSPI_REG_CONFIG_PHY_PIPELINE);
> +               writel(reg, reg_base + CQSPI_REG_CONFIG);
> +
> +               /*
> +                * Dummy cycles were decremented when enabling PHY. Increment
> +                * dummy cycle by 1 to restore the original value.
> +                */
> +               reg = readl(reg_base + CQSPI_REG_RD_INSTR);
> +               dummy = (reg >> CQSPI_REG_RD_INSTR_DUMMY_LSB) &
> +                       CQSPI_REG_RD_INSTR_DUMMY_MASK;
> +               dummy++;
> +               reg &= ~(CQSPI_REG_RD_INSTR_DUMMY_MASK <<
> +                        CQSPI_REG_RD_INSTR_DUMMY_LSB);
> +
> +               reg |= (dummy & CQSPI_REG_RD_INSTR_DUMMY_MASK)
> +                      << CQSPI_REG_RD_INSTR_DUMMY_LSB;
> +               writel(reg, reg_base + CQSPI_REG_RD_INSTR);
> +       }
> +
> +       cqspi_wait_idle(cqspi);
> +}
> +
>  static void cqspi_configure(struct cqspi_flash_pdata *f_pdata,
>                             unsigned long sclk)
>  {
> @@ -1056,7 +1131,7 @@ static void cqspi_configure(struct cqspi_flash_pdata *f_pdata,
>                 cqspi->sclk = sclk;
>                 cqspi_config_baudrate_div(cqspi);
>                 cqspi_delay(f_pdata);
> -               cqspi_readdata_capture(cqspi, !cqspi->rclk_en,
> +               cqspi_readdata_capture(cqspi, !cqspi->rclk_en, false,
>                                        f_pdata->read_delay);
>         }
> 
> @@ -1098,6 +1173,39 @@ static ssize_t cqspi_write(struct cqspi_flash_pdata *f_pdata,
>         return cqspi_indirect_write_execute(f_pdata, to, buf, len);
>  }
> 
> +/*
> + * Check if PHY mode can be used on the given op. This is assuming it will be a
> + * DAC mode read, since PHY won't work on any other type of operation anyway.
> + */
> +static bool cqspi_phy_op_eligible(const struct spi_mem_op *op)
> +{
> +       /* PHY is only tuned for 8D-8D-8D. */
> +       if (!(op->cmd.dtr && op->addr.dtr && op->dummy.dtr && op->data.dtr))
> +               return false;
> +       if (op->cmd.buswidth != 8)
> +               return false;
> +       if (op->addr.nbytes && op->addr.buswidth != 8)
> +               return false;
> +       if (op->dummy.nbytes && op->dummy.buswidth != 8)
> +               return false;
> +       if (op->data.nbytes && op->data.buswidth != 8)
> +               return false;
> +
> +       return true;
> +}
> +
> +static bool cqspi_use_phy(struct cqspi_flash_pdata *f_pdata,
> +                         const struct spi_mem_op *op)
> +{
> +       if (!f_pdata->use_phy)
> +               return false;
> +
> +       if (op->data.nbytes < 16)
> +               return false;
> +
> +       return cqspi_phy_op_eligible(op);
> +}
> +
>  static void cqspi_rx_dma_callback(void *param)
>  {
>         struct cqspi_st *cqspi = param;
> @@ -1105,8 +1213,8 @@ static void cqspi_rx_dma_callback(void *param)
>         complete(&cqspi->rx_dma_complete);
>  }
> 
> -static int cqspi_direct_read_execute(struct cqspi_flash_pdata *f_pdata,
> -                                    u_char *buf, loff_t from, size_t len)
> +static int cqspi_direct_read_dma(struct cqspi_flash_pdata *f_pdata,
> +                                u_char *buf, loff_t from, size_t len)
>  {
>         struct cqspi_st *cqspi = f_pdata->cqspi;
>         struct device *dev = &cqspi->pdev->dev;
> @@ -1118,11 +1226,6 @@ static int cqspi_direct_read_execute(struct cqspi_flash_pdata *f_pdata,
>         dma_addr_t dma_dst;
>         struct device *ddev;
> 
> -       if (!cqspi->rx_chan || !virt_addr_valid(buf)) {
> -               memcpy_fromio(buf, cqspi->ahb_base + from, len);
> -               return 0;
> -       }
> -
>         ddev = cqspi->rx_chan->device->dev;
>         dma_dst = dma_map_single(ddev, buf, len, DMA_FROM_DEVICE);
>         if (dma_mapping_error(ddev, dma_dst)) {
> @@ -1164,6 +1267,64 @@ static int cqspi_direct_read_execute(struct cqspi_flash_pdata *f_pdata,
>         return ret;
>  }
> 
> +static int cqspi_direct_read_execute(struct cqspi_flash_pdata *f_pdata,
> +                                    const struct spi_mem_op *op)
> +{
> +       struct cqspi_st *cqspi = f_pdata->cqspi;
> +       loff_t from = op->addr.val;
> +       loff_t from_aligned, to_aligned;
> +       size_t len = op->data.nbytes;
> +       size_t len_aligned;
> +       u_char *buf = op->data.buf.in;
> +       int ret;
> +
> +       if (!cqspi->rx_chan || !virt_addr_valid(buf)) {
> +               memcpy_fromio(buf, cqspi->ahb_base + from, len);
> +               return 0;
> +       }
> +
> +       if (!cqspi_use_phy(f_pdata, op))
> +               return cqspi_direct_read_dma(f_pdata, buf, from, len);
> +
> +       /*
> +        * PHY reads must be 16-byte aligned, and they must be a multiple of 16
> +        * bytes.
> +        */
> +       from_aligned = (from + 0xF) & ~0xF;
> +       to_aligned = (from + len) & ~0xF;
> +       len_aligned = to_aligned - from_aligned;
> +
> +       /* Read the unaligned part at the start. */
> +       if (from != from_aligned) {
> +               ret = cqspi_direct_read_dma(f_pdata, buf, from,
> +                                           from_aligned - from);
> +               if (ret)
> +                       return ret;
> +               buf += from_aligned - from;
> +       }
> +
> +       if (len_aligned) {
> +               cqspi_phy_enable(f_pdata, true);
> +               ret = cqspi_direct_read_dma(f_pdata, buf, from_aligned,
> +                                           len_aligned);
> +               cqspi_phy_enable(f_pdata, false);
> +               if (ret)
> +                       return ret;
> +               buf += len_aligned;
> +       }
> +
> +       /* Now read the remaining part, if any. */
> +       if (to_aligned != (from + len)) {
> +               ret = cqspi_direct_read_dma(f_pdata, buf, to_aligned,
> +                                           (from + len) - to_aligned);
> +               if (ret)
> +                       return ret;
> +               buf += (from + len) - to_aligned;
> +       }
> +
> +       return 0;
> +}
> +
>  static ssize_t cqspi_read(struct cqspi_flash_pdata *f_pdata,
>                           const struct spi_mem_op *op)
>  {
> @@ -1182,7 +1343,7 @@ static ssize_t cqspi_read(struct cqspi_flash_pdata *f_pdata,
>                 return ret;
> 
>         if (cqspi->use_direct_mode && ((from + len) <= cqspi->ahb_size))
> -               return cqspi_direct_read_execute(f_pdata, buf, from, len);
> +               return cqspi_direct_read_execute(f_pdata, op);
> 
>         return cqspi_indirect_read_execute(f_pdata, buf, from, len);
>  }
> --
> 2.30.0
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi
  2021-03-12  9:09 ` [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi Tudor.Ambarus
@ 2021-03-12 10:10   ` Pratyush Yadav
  2021-03-12 10:20     ` Michael Walle
  2021-03-12 11:23     ` Tudor.Ambarus
  0 siblings, 2 replies; 33+ messages in thread
From: Pratyush Yadav @ 2021-03-12 10:10 UTC (permalink / raw)
  To: Tudor.Ambarus
  Cc: nm, kristo, robh+dt, michael, miquel.raynal, richard, vigneshr,
	broonie, linux-arm-kernel, devicetree, linux-kernel, linux-mtd,
	linux-spi, lokeshvutla

On 12/03/21 09:09AM, Tudor.Ambarus@microchip.com wrote:
> On 3/11/21 9:12 PM, Pratyush Yadav wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> > 
> > Hi,
> > 
> > This series adds support for OSPI PHY calibration on the Cadence OSPI
> > controller. This calibration procedure is needed to allow high clock
> > speeds in 8D-8D-8D mode. The procedure reads some pre-determined pattern
> > data from the flash and runs a sequence of test reads to find out the
> > optimal delays for high speed transfer. More details on the calibration
> > procedure in patch 5/6.
> 
> Can the calibration sequence be avoided if the controller is informed
> about the frequency on which the flash operates?

Maybe I don't understand this correctly, but there should not be any 
frequency on which the flash operates. The controller drives the SPI 
clock so the frequency is decided by the controller. Sure, there is a 
max supported frequency for the flash but the controller can run it 
slower than that if it wishes. The flash has no say in that.

Anyway, the exact frequency at which the flash is running is not it is 
looking for. More details below.

> 
> Can you add more details about the optimal delays? Are we talking about
> flash's AC characteristics? Is the calibration still needed if the upper
> layer informs the QSPI controller about the needed delays?

There is usually a delay from when the flash drives the data line (IOW, 
puts a data bit on it) and when the signal reaches the controller. This 
delay can vary by the flash, board, silicon characteristics, 
temperature, etc.

At lower speeds (25 MHz for example) this delay is not a problem because 
the clock period is longer so there is much more time to sample the data 
line. It is very likely the controller will sample at a time when the 
data line is valid. At high speeds (166 MHz for example), especially in 
DDR mode, this delay starts to play a larger role because the time to 
sample the data line is much smaller. Now unless the delay is accounted 
for, it is possible that the controller samples the data line too late 
or too early and sees invalid data.

These delays depend on physical characteristics so it is not possible 
for any upper layer to inform the controller about it. How will they 
even know what the required delay is?

In summary, no, there is no way an upper layer can inform the controller 
about this delay.

> 
> Cheers,
> ta
> 
> > 
> > The main problem here is telling the controller where to find the
> > pattern and how to read it. This RFC uses nvmem cells which point to a
> > fixed partition containing the data to do the reads. It depends on [0]
> > and [1].
> > 
> > The obvious problem with this is it won't work when the partitions are
> > defined via command line. I don't see any good way to add nvmem cells to
> > command line partitions. I would like some help or ideas here. We don't
> > necessarily have to use nvmem either. Any way that can cleanly and
> > consistently let the controller find out where the pattern is stored is
> > good.
> > 
> > The dts patch depends on [2].
> > 
> > Tested on TI's J721E EVM.
> > 
> > [0] https://patchwork.ozlabs.org/project/linux-mtd/patch/20210302190012.1255-1-zajec5@gmail.com/
> > [1] https://patchwork.ozlabs.org/project/linux-mtd/patch/20210308011853.19360-1-ansuelsmth@gmail.com/
> > [2] https://patchwork.kernel.org/project/linux-arm-kernel/patch/20210305153926.3479-2-p.yadav@ti.com/
> > 
> > Pratyush Yadav (6):
> >   spi: spi-mem: Tell controller when device is ready for calibration
> >   mtd: spi-nor: core: consolidate read op creation
> >   mtd: spi-nor: core: run calibration when initialization is done
> >   spi: cadence-qspi: Use PHY for DAC reads if possible
> >   spi: cadence-qspi: Tune PHY to allow running at higher frequencies
> >   arm64: dts: ti: k3-j721e-som-p0: Enable PHY calibration
> > 
> >  arch/arm64/boot/dts/ti/k3-j721e-som-p0.dtsi |  55 ++
> >  drivers/mtd/spi-nor/core.c                  |  74 +-
> >  drivers/spi/spi-cadence-quadspi.c           | 820 +++++++++++++++++++-
> >  drivers/spi/spi-mem.c                       |  12 +
> >  include/linux/spi/spi-mem.h                 |   8 +
> >  5 files changed, 916 insertions(+), 53 deletions(-)
> > 
> > --
> > 2.30.0
> > 
> > 
> > _______________________________________________
> > linux-arm-kernel mailing list
> > linux-arm-kernel@lists.infradead.org
> > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> > 
> 

-- 
Regards,
Pratyush Yadav
Texas Instruments Inc.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 4/6] spi: cadence-qspi: Use PHY for DAC reads if possible
  2021-03-12  9:13   ` Tudor.Ambarus
@ 2021-03-12 10:17     ` Pratyush Yadav
  2021-04-29 16:28       ` Michael Walle
  0 siblings, 1 reply; 33+ messages in thread
From: Pratyush Yadav @ 2021-03-12 10:17 UTC (permalink / raw)
  To: Tudor.Ambarus
  Cc: nm, kristo, robh+dt, michael, miquel.raynal, richard, vigneshr,
	broonie, linux-arm-kernel, devicetree, linux-kernel, linux-mtd,
	linux-spi, lokeshvutla

On 12/03/21 09:13AM, Tudor.Ambarus@microchip.com wrote:
> On 3/11/21 9:12 PM, Pratyush Yadav wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> > 
> > Check if a read is eligible for PHY and if it is, enable PHY and DQS.
> 
> DQS as in data strobe? Shouldn't the upper layer inform the QSPI controller
> whether DS is required or not?

Yes, DQS as in data strobe. I need to check this again, but IIRC the 
controller cannot run in PHY mode unless DS is used. Ideally the upper 
layer should indeed inform the controller whether DS is supported/in-use 
or not. That can be used to decide whether PHY mode (and consequently 
the DS line) is to be used or not.

Currently there are only two flashes that use 8D-8D-8D mode (S28HS512T 
and MT35XU512ABA), and both of them drive the DS line.

> 
> > 
> > Since PHY reads only work at an address that is 16-byte aligned and of
> > size that is a multiple of 16 bytes, read the starting and ending
> > unaligned portions without PHY, and only enable PHY for the middle part.
> > 
> > Signed-off-by: Pratyush Yadav <p.yadav@ti.com>
> > ---
> >  drivers/spi/spi-cadence-quadspi.c | 203 ++++++++++++++++++++++++++----
> >  1 file changed, 182 insertions(+), 21 deletions(-)
> > 

-- 
Regards,
Pratyush Yadav
Texas Instruments Inc.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi
  2021-03-12 10:10   ` Pratyush Yadav
@ 2021-03-12 10:20     ` Michael Walle
  2021-03-12 11:07       ` Pratyush Yadav
  2021-03-12 11:23     ` Tudor.Ambarus
  1 sibling, 1 reply; 33+ messages in thread
From: Michael Walle @ 2021-03-12 10:20 UTC (permalink / raw)
  To: Pratyush Yadav
  Cc: Tudor.Ambarus, nm, kristo, robh+dt, miquel.raynal, richard,
	vigneshr, broonie, linux-arm-kernel, devicetree, linux-kernel,
	linux-mtd, linux-spi, lokeshvutla

Am 2021-03-12 11:10, schrieb Pratyush Yadav:
> There is usually a delay from when the flash drives the data line (IOW,
> puts a data bit on it) and when the signal reaches the controller. This
> delay can vary by the flash, board, silicon characteristics,
> temperature, etc.

Temperature might change over time, but the calibration is only done
once. I don't know how much influence the temperature actually has, but
our boards are usually operating from -40°C to +85°C. So there might be
a possible temperature difference of 125K between actual calibration and
when the flash is accessed.

-michael

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi
  2021-03-12 10:20     ` Michael Walle
@ 2021-03-12 11:07       ` Pratyush Yadav
  2021-03-12 13:26         ` Michael Walle
  0 siblings, 1 reply; 33+ messages in thread
From: Pratyush Yadav @ 2021-03-12 11:07 UTC (permalink / raw)
  To: Michael Walle
  Cc: Tudor.Ambarus, nm, kristo, robh+dt, miquel.raynal, richard,
	vigneshr, broonie, linux-arm-kernel, devicetree, linux-kernel,
	linux-mtd, linux-spi, lokeshvutla

On 12/03/21 11:20AM, Michael Walle wrote:
> Am 2021-03-12 11:10, schrieb Pratyush Yadav:
> > There is usually a delay from when the flash drives the data line (IOW,
> > puts a data bit on it) and when the signal reaches the controller. This
> > delay can vary by the flash, board, silicon characteristics,
> > temperature, etc.
> 
> Temperature might change over time, but the calibration is only done
> once. I don't know how much influence the temperature actually has, but
> our boards are usually operating from -40°C to +85°C. So there might be
> a possible temperature difference of 125K between actual calibration and
> when the flash is accessed.

The algorithm supports a temperature range of -45 C to +130 C. The 
temperature is checked at calibration time and adjustments are made to 
make sure the reads work over the entire temperature range [0].

[0] The current implementation does not have a way to query the 
temperature from the sensor (see cqspi_get_temp()), so it always assumes 
the temperature at calibration time is 45 C. But that can be added later 
once the temperature sensor driver is implemented.

-- 
Regards,
Pratyush Yadav
Texas Instruments Inc.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi
  2021-03-12 10:10   ` Pratyush Yadav
  2021-03-12 10:20     ` Michael Walle
@ 2021-03-12 11:23     ` Tudor.Ambarus
  2021-03-12 18:14       ` Pratyush Yadav
  1 sibling, 1 reply; 33+ messages in thread
From: Tudor.Ambarus @ 2021-03-12 11:23 UTC (permalink / raw)
  To: p.yadav
  Cc: nm, kristo, robh+dt, michael, miquel.raynal, richard, vigneshr,
	broonie, linux-arm-kernel, devicetree, linux-kernel, linux-mtd,
	linux-spi, lokeshvutla

On 3/12/21 12:10 PM, Pratyush Yadav wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> On 12/03/21 09:09AM, Tudor.Ambarus@microchip.com wrote:
>> On 3/11/21 9:12 PM, Pratyush Yadav wrote:
>>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>>>
>>> Hi,
>>>
>>> This series adds support for OSPI PHY calibration on the Cadence OSPI
>>> controller. This calibration procedure is needed to allow high clock
>>> speeds in 8D-8D-8D mode. The procedure reads some pre-determined pattern
>>> data from the flash and runs a sequence of test reads to find out the
>>> optimal delays for high speed transfer. More details on the calibration
>>> procedure in patch 5/6.
>>
>> Can the calibration sequence be avoided if the controller is informed
>> about the frequency on which the flash operates?

s/frequency/maximum supported frequency by the flash

> 
> Maybe I don't understand this correctly, but there should not be any
> frequency on which the flash operates. The controller drives the SPI
> clock so the frequency is decided by the controller. Sure, there is a
> max supported frequency for the flash but the controller can run it
> slower than that if it wishes. The flash has no say in that.
> 
> Anyway, the exact frequency at which the flash is running is not it is
> looking for. More details below.

I thought about choosing at the controller side:
min(max_frequency_controller, max_frequency_flash)

And there is also the need of changing the frequency on which an op
runs, like the READ SFDP cmd, for which it is recommended to be run at
50 MHz, but maybe this is another topic, let's see.

> 
>>
>> Can you add more details about the optimal delays? Are we talking about
>> flash's AC characteristics? Is the calibration still needed if the upper
>> layer informs the QSPI controller about the needed delays?
> 
> There is usually a delay from when the flash drives the data line (IOW,
> puts a data bit on it) and when the signal reaches the controller. This
> delay can vary by the flash, board, silicon characteristics,
> temperature, etc.

I wonder whether the delay advertised by the flash matters the most, while
all the other are negligible.
When I talk about delay, I'm thinking for example at the delay required
between two consecutive transfers without removing the chip select, or about
the minimum delay needed between the activation or the deactivation of the
chip select. These are all described by the flash. Does your controller have
such fields in its registers, to set such delays? If yes, is the calibration 
sequence still needed if all the delays are set correctly?

When I hear about "board delays", I think about the impedance of the lines,
which should correspond to the impedance of the Flash's IOs (which depends on
the frequency on which the flash runs). A mechanism to choose the best
frequency and impedance level can be added.

Flashes have an interval of temperature on which they are guaranteed to
work (I would expect in the same conditions). Information about temperature
ranges and associated delays (if measured?) can be passed too.

Cheers,
ta
> 
> At lower speeds (25 MHz for example) this delay is not a problem because
> the clock period is longer so there is much more time to sample the data
> line. It is very likely the controller will sample at a time when the
> data line is valid. At high speeds (166 MHz for example), especially in
> DDR mode, this delay starts to play a larger role because the time to
> sample the data line is much smaller. Now unless the delay is accounted
> for, it is possible that the controller samples the data line too late
> or too early and sees invalid data.
> 
> These delays depend on physical characteristics so it is not possible
> for any upper layer to inform the controller about it. How will they
> even know what the required delay is?
> 
> In summary, no, there is no way an upper layer can inform the controller
> about this delay.
> 
>>
>> Cheers,
>> ta
>>
>>>
>>> The main problem here is telling the controller where to find the
>>> pattern and how to read it. This RFC uses nvmem cells which point to a
>>> fixed partition containing the data to do the reads. It depends on [0]
>>> and [1].
>>>
>>> The obvious problem with this is it won't work when the partitions are
>>> defined via command line. I don't see any good way to add nvmem cells to
>>> command line partitions. I would like some help or ideas here. We don't
>>> necessarily have to use nvmem either. Any way that can cleanly and
>>> consistently let the controller find out where the pattern is stored is
>>> good.
>>>
>>> The dts patch depends on [2].
>>>
>>> Tested on TI's J721E EVM.
>>>
>>> [0] https://patchwork.ozlabs.org/project/linux-mtd/patch/20210302190012.1255-1-zajec5@gmail.com/
>>> [1] https://patchwork.ozlabs.org/project/linux-mtd/patch/20210308011853.19360-1-ansuelsmth@gmail.com/
>>> [2] https://patchwork.kernel.org/project/linux-arm-kernel/patch/20210305153926.3479-2-p.yadav@ti.com/
>>>
>>> Pratyush Yadav (6):
>>>   spi: spi-mem: Tell controller when device is ready for calibration
>>>   mtd: spi-nor: core: consolidate read op creation
>>>   mtd: spi-nor: core: run calibration when initialization is done
>>>   spi: cadence-qspi: Use PHY for DAC reads if possible
>>>   spi: cadence-qspi: Tune PHY to allow running at higher frequencies
>>>   arm64: dts: ti: k3-j721e-som-p0: Enable PHY calibration
>>>
>>>  arch/arm64/boot/dts/ti/k3-j721e-som-p0.dtsi |  55 ++
>>>  drivers/mtd/spi-nor/core.c                  |  74 +-
>>>  drivers/spi/spi-cadence-quadspi.c           | 820 +++++++++++++++++++-
>>>  drivers/spi/spi-mem.c                       |  12 +
>>>  include/linux/spi/spi-mem.h                 |   8 +
>>>  5 files changed, 916 insertions(+), 53 deletions(-)
>>>
>>> --
>>> 2.30.0
>>>
>>>
>>> _______________________________________________
>>> linux-arm-kernel mailing list
>>> linux-arm-kernel@lists.infradead.org
>>> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>>>
>>
> 
> --
> Regards,
> Pratyush Yadav
> Texas Instruments Inc.
> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi
  2021-03-12 11:07       ` Pratyush Yadav
@ 2021-03-12 13:26         ` Michael Walle
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Walle @ 2021-03-12 13:26 UTC (permalink / raw)
  To: Pratyush Yadav
  Cc: Tudor.Ambarus, nm, kristo, robh+dt, miquel.raynal, richard,
	vigneshr, broonie, linux-arm-kernel, devicetree, linux-kernel,
	linux-mtd, linux-spi, lokeshvutla

Am 2021-03-12 12:07, schrieb Pratyush Yadav:
> On 12/03/21 11:20AM, Michael Walle wrote:
>> Am 2021-03-12 11:10, schrieb Pratyush Yadav:
>> > There is usually a delay from when the flash drives the data line (IOW,
>> > puts a data bit on it) and when the signal reaches the controller. This
>> > delay can vary by the flash, board, silicon characteristics,
>> > temperature, etc.
>> 
>> Temperature might change over time, but the calibration is only done
>> once. I don't know how much influence the temperature actually has, 
>> but
>> our boards are usually operating from -40°C to +85°C. So there might 
>> be
>> a possible temperature difference of 125K between actual calibration 
>> and
>> when the flash is accessed.
> 
> The algorithm supports a temperature range of -45 C to +130 C. The
> temperature is checked at calibration time and adjustments are made to
> make sure the reads work over the entire temperature range [0].

Ah, nice. And you need the current temperature to correlate it to the
meassured timings, right?

-michael

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi
  2021-03-11 19:12 [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi Pratyush Yadav
                   ` (6 preceding siblings ...)
  2021-03-12  9:09 ` [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi Tudor.Ambarus
@ 2021-03-12 13:32 ` Michael Walle
  2021-03-12 14:59   ` Tudor.Ambarus
  2021-03-12 17:00   ` Pratyush Yadav
  7 siblings, 2 replies; 33+ messages in thread
From: Michael Walle @ 2021-03-12 13:32 UTC (permalink / raw)
  To: Pratyush Yadav
  Cc: Nishanth Menon, Tero Kristo, Rob Herring, Tudor Ambarus,
	Miquel Raynal, Richard Weinberger, Vignesh Raghavendra,
	Mark Brown, linux-arm-kernel, devicetree, linux-kernel,
	linux-mtd, linux-spi, Lokesh Vutla

Am 2021-03-11 20:12, schrieb Pratyush Yadav:
> The main problem here is telling the controller where to find the
> pattern and how to read it. This RFC uses nvmem cells which point to a
> fixed partition containing the data to do the reads. It depends on [0]
> and [1].
> 
> The obvious problem with this is it won't work when the partitions are
> defined via command line. I don't see any good way to add nvmem cells 
> to
> command line partitions. I would like some help or ideas here. We don't
> necessarily have to use nvmem either. Any way that can cleanly and
> consistently let the controller find out where the pattern is stored is
> good.

The NXP LS1028A SoC has a similar calibration (although there its done
in hardware it seems) and there the datasheet mentions there are flash
devices which supports a preamble before a read function. The preamble
is then some kind of learning pattern. Did you see a flash which 
actually
supports that in the wild? I can't find any publicly available 
datasheets
of 8bit I/O SPI NOR flashes.

-michael

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi
  2021-03-12 13:32 ` Michael Walle
@ 2021-03-12 14:59   ` Tudor.Ambarus
  2021-03-12 17:00   ` Pratyush Yadav
  1 sibling, 0 replies; 33+ messages in thread
From: Tudor.Ambarus @ 2021-03-12 14:59 UTC (permalink / raw)
  To: michael, p.yadav
  Cc: nm, kristo, robh+dt, miquel.raynal, richard, vigneshr, broonie,
	linux-arm-kernel, devicetree, linux-kernel, linux-mtd, linux-spi,
	lokeshvutla

On 3/12/21 3:32 PM, Michael Walle wrote:
> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> 
> Am 2021-03-11 20:12, schrieb Pratyush Yadav:
>> The main problem here is telling the controller where to find the
>> pattern and how to read it. This RFC uses nvmem cells which point to a
>> fixed partition containing the data to do the reads. It depends on [0]
>> and [1].
>>
>> The obvious problem with this is it won't work when the partitions are
>> defined via command line. I don't see any good way to add nvmem cells
>> to
>> command line partitions. I would like some help or ideas here. We don't
>> necessarily have to use nvmem either. Any way that can cleanly and
>> consistently let the controller find out where the pattern is stored is
>> good.
> 
> The NXP LS1028A SoC has a similar calibration (although there its done
> in hardware it seems) and there the datasheet mentions there are flash
> devices which supports a preamble before a read function. The preamble
> is then some kind of learning pattern. Did you see a flash which
> actually
> supports that in the wild? I can't find any publicly available

MX66LM1G45G is an example.

> datasheets> of 8bit I/O SPI NOR flashes

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi
  2021-03-12 13:32 ` Michael Walle
  2021-03-12 14:59   ` Tudor.Ambarus
@ 2021-03-12 17:00   ` Pratyush Yadav
  1 sibling, 0 replies; 33+ messages in thread
From: Pratyush Yadav @ 2021-03-12 17:00 UTC (permalink / raw)
  To: Michael Walle
  Cc: Nishanth Menon, Tero Kristo, Rob Herring, Tudor Ambarus,
	Miquel Raynal, Richard Weinberger, Vignesh Raghavendra,
	Mark Brown, linux-arm-kernel, devicetree, linux-kernel,
	linux-mtd, linux-spi, Lokesh Vutla

On 12/03/21 02:32PM, Michael Walle wrote:
> Am 2021-03-11 20:12, schrieb Pratyush Yadav:
> > The main problem here is telling the controller where to find the
> > pattern and how to read it. This RFC uses nvmem cells which point to a
> > fixed partition containing the data to do the reads. It depends on [0]
> > and [1].
> > 
> > The obvious problem with this is it won't work when the partitions are
> > defined via command line. I don't see any good way to add nvmem cells to
> > command line partitions. I would like some help or ideas here. We don't
> > necessarily have to use nvmem either. Any way that can cleanly and
> > consistently let the controller find out where the pattern is stored is
> > good.
> 
> The NXP LS1028A SoC has a similar calibration (although there its done
> in hardware it seems) and there the datasheet mentions there are flash
> devices which supports a preamble before a read function. The preamble
> is then some kind of learning pattern. Did you see a flash which actually
> supports that in the wild? I can't find any publicly available datasheets
> of 8bit I/O SPI NOR flashes.

I haven't seen any such flash but it looks like Tudor has.

> 
> -michael

-- 
Regards,
Pratyush Yadav
Texas Instruments Inc.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi
  2021-03-12 11:23     ` Tudor.Ambarus
@ 2021-03-12 18:14       ` Pratyush Yadav
  0 siblings, 0 replies; 33+ messages in thread
From: Pratyush Yadav @ 2021-03-12 18:14 UTC (permalink / raw)
  To: Tudor.Ambarus
  Cc: nm, kristo, robh+dt, michael, miquel.raynal, richard, vigneshr,
	broonie, linux-arm-kernel, devicetree, linux-kernel, linux-mtd,
	linux-spi, lokeshvutla

On 12/03/21 11:23AM, Tudor.Ambarus@microchip.com wrote:
> On 3/12/21 12:10 PM, Pratyush Yadav wrote:
> > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> > 
> > On 12/03/21 09:09AM, Tudor.Ambarus@microchip.com wrote:
> >> On 3/11/21 9:12 PM, Pratyush Yadav wrote:
> >>> EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> >>>
> >>> Hi,
> >>>
> >>> This series adds support for OSPI PHY calibration on the Cadence OSPI
> >>> controller. This calibration procedure is needed to allow high clock
> >>> speeds in 8D-8D-8D mode. The procedure reads some pre-determined pattern
> >>> data from the flash and runs a sequence of test reads to find out the
> >>> optimal delays for high speed transfer. More details on the calibration
> >>> procedure in patch 5/6.
> >>
> >> Can the calibration sequence be avoided if the controller is informed
> >> about the frequency on which the flash operates?
> 
> s/frequency/maximum supported frequency by the flash

Again, the max frequency does not matter. If it is fast enough PHY 
calibration is needed to make sure lines are sampled at the correct 
time.

> 
> > 
> > Maybe I don't understand this correctly, but there should not be any
> > frequency on which the flash operates. The controller drives the SPI
> > clock so the frequency is decided by the controller. Sure, there is a
> > max supported frequency for the flash but the controller can run it
> > slower than that if it wishes. The flash has no say in that.
> > 
> > Anyway, the exact frequency at which the flash is running is not it is
> > looking for. More details below.
> 
> I thought about choosing at the controller side:
> min(max_frequency_controller, max_frequency_flash)
> 
> And there is also the need of changing the frequency on which an op
> runs, like the READ SFDP cmd, for which it is recommended to be run at
> 50 MHz, but maybe this is another topic, let's see.

Right. This is not directly related to the need for having the 
calibration.

Right now calibration is run only after the flash is fully initialized, 
so there should not be any SFDP commands from that point on. But if we 
do want to be conservative about it, a field can be added in spi_mem_op 
that mentions the maximum speed for the op so the controller can decide 
accordingly.

> 
> > 
> >>
> >> Can you add more details about the optimal delays? Are we talking about
> >> flash's AC characteristics? Is the calibration still needed if the upper
> >> layer informs the QSPI controller about the needed delays?
> > 
> > There is usually a delay from when the flash drives the data line (IOW,
> > puts a data bit on it) and when the signal reaches the controller. This
> > delay can vary by the flash, board, silicon characteristics,
> > temperature, etc.
> 
> I wonder whether the delay advertised by the flash matters the most, while
> all the other are negligible.

The delay advertised by the flash does matter. Specifically, the clock 
to data output delay. But the IO delay in the host (the delay from the 
pin to the internal FIFO) matters equally as much. Both are accounted 
for by the tuning.

> When I talk about delay, I'm thinking for example at the delay required
> between two consecutive transfers without removing the chip select, or about
> the minimum delay needed between the activation or the deactivation of the
> chip select. These are all described by the flash. Does your controller have
> such fields in its registers, to set such delays? If yes, is the calibration 
> sequence still needed if all the delays are set correctly?

The CS related delays are indeed accounted for by a register in the 
controller. But this is not the delay the calibration is concerned with. 
The delays the calibration is concerned with are the clock edge to data 
transition delay (TX delay), and the DS edge to data transition delay 
(RX delay). The two are totally unrelated.

> 
> When I hear about "board delays", I think about the impedance of the lines,
> which should correspond to the impedance of the Flash's IOs (which depends on
> the frequency on which the flash runs). A mechanism to choose the best
> frequency and impedance level can be added.

Board delays in this case are caused by the length of the wires/lines. 
Even if the lines are perfectly impedance matched to the flash's IOs, 
there will be a small time delay from when the data signal is launched 
by the flash and when it is received by the device. This causes a small 
but noticeable difference in the timing and consequently the final 
calibration values. For example, this is observed when comparing J721E 
EVM (evaluation module) and SVB (silicon validation board) platforms.

> 
> Flashes have an interval of temperature on which they are guaranteed to
> work (I would expect in the same conditions). Information about temperature
> ranges and associated delays (if measured?) can be passed too.

This would not be sufficient for placing TX and RX delays unless we have 
a perfect model for both the flash and the host device IO delays. Such a 
model would have to account for variations in timing caused by 
variations in the manufacturing process, voltage, and temperature. This 
is not practically feasible.

> 
> Cheers,
> ta
> > 
> > At lower speeds (25 MHz for example) this delay is not a problem because
> > the clock period is longer so there is much more time to sample the data
> > line. It is very likely the controller will sample at a time when the
> > data line is valid. At high speeds (166 MHz for example), especially in
> > DDR mode, this delay starts to play a larger role because the time to
> > sample the data line is much smaller. Now unless the delay is accounted
> > for, it is possible that the controller samples the data line too late
> > or too early and sees invalid data.
> > 
> > These delays depend on physical characteristics so it is not possible
> > for any upper layer to inform the controller about it. How will they
> > even know what the required delay is?
> > 
> > In summary, no, there is no way an upper layer can inform the controller
> > about this delay.
> > 
> >>
> >> Cheers,
> >> ta
> >>
> >>>
> >>> The main problem here is telling the controller where to find the
> >>> pattern and how to read it. This RFC uses nvmem cells which point to a
> >>> fixed partition containing the data to do the reads. It depends on [0]
> >>> and [1].
> >>>
> >>> The obvious problem with this is it won't work when the partitions are
> >>> defined via command line. I don't see any good way to add nvmem cells to
> >>> command line partitions. I would like some help or ideas here. We don't
> >>> necessarily have to use nvmem either. Any way that can cleanly and
> >>> consistently let the controller find out where the pattern is stored is
> >>> good.
> >>>
> >>> The dts patch depends on [2].
> >>>
> >>> Tested on TI's J721E EVM.
> >>>
> >>> [0] https://patchwork.ozlabs.org/project/linux-mtd/patch/20210302190012.1255-1-zajec5@gmail.com/
> >>> [1] https://patchwork.ozlabs.org/project/linux-mtd/patch/20210308011853.19360-1-ansuelsmth@gmail.com/
> >>> [2] https://patchwork.kernel.org/project/linux-arm-kernel/patch/20210305153926.3479-2-p.yadav@ti.com/
> >>>
> >>> Pratyush Yadav (6):
> >>>   spi: spi-mem: Tell controller when device is ready for calibration
> >>>   mtd: spi-nor: core: consolidate read op creation
> >>>   mtd: spi-nor: core: run calibration when initialization is done
> >>>   spi: cadence-qspi: Use PHY for DAC reads if possible
> >>>   spi: cadence-qspi: Tune PHY to allow running at higher frequencies
> >>>   arm64: dts: ti: k3-j721e-som-p0: Enable PHY calibration
> >>>
> >>>  arch/arm64/boot/dts/ti/k3-j721e-som-p0.dtsi |  55 ++
> >>>  drivers/mtd/spi-nor/core.c                  |  74 +-
> >>>  drivers/spi/spi-cadence-quadspi.c           | 820 +++++++++++++++++++-
> >>>  drivers/spi/spi-mem.c                       |  12 +
> >>>  include/linux/spi/spi-mem.h                 |   8 +
> >>>  5 files changed, 916 insertions(+), 53 deletions(-)
> >>>

-- 
Regards,
Pratyush Yadav
Texas Instruments Inc.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/6] spi: spi-mem: Tell controller when device is ready for calibration
  2021-03-11 19:12 ` [RFC PATCH 1/6] spi: spi-mem: Tell controller when device is ready for calibration Pratyush Yadav
@ 2021-03-23 23:07   ` Michael Walle
  2021-03-24  8:08     ` Pratyush Yadav
  0 siblings, 1 reply; 33+ messages in thread
From: Michael Walle @ 2021-03-23 23:07 UTC (permalink / raw)
  To: Pratyush Yadav
  Cc: Nishanth Menon, Tero Kristo, Rob Herring, Tudor Ambarus,
	Miquel Raynal, Richard Weinberger, Vignesh Raghavendra,
	Mark Brown, linux-arm-kernel, devicetree, linux-kernel,
	linux-mtd, linux-spi, Lokesh Vutla

Hi Pratyush,

Am 2021-03-11 20:12, schrieb Pratyush Yadav:
> Some controllers like the Cadence OSPI controller need to perform a
> calibration sequence to operate at high clock speeds. This calibration
> should happen after the flash is fully initialized otherwise the
> calibration might happen in a different SPI mode from the one the flash
> is finally set to. Add a hook that can be used to tell the controller
> when the flash is ready for calibration. Whether calibration is needed
> depends on the controller.
> 
> Signed-off-by: Pratyush Yadav <p.yadav@ti.com>
> ---
>  drivers/spi/spi-mem.c       | 12 ++++++++++++
>  include/linux/spi/spi-mem.h |  8 ++++++++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c
> index dc713b0c3c4d..e2f05ad3f4dc 100644
> --- a/drivers/spi/spi-mem.c
> +++ b/drivers/spi/spi-mem.c
> @@ -464,6 +464,18 @@ int spi_mem_adjust_op_size(struct spi_mem *mem,
> struct spi_mem_op *op)
>  }
>  EXPORT_SYMBOL_GPL(spi_mem_adjust_op_size);
> 
> +int spi_mem_do_calibration(struct spi_mem *mem, struct spi_mem_op *op)
> +{
> +	struct spi_controller *ctlr = mem->spi->controller;
> +
> +	if (!ctlr->mem_ops || !ctlr->mem_ops->do_calibration)
> +		return -EOPNOTSUPP;
> +
> +	ctlr->mem_ops->do_calibration(mem, op);

Can't a calibration fail?

> +	return 0;
> +}
> +EXPORT_SYMBOL_GPL(spi_mem_do_calibration);
> +
>  static ssize_t spi_mem_no_dirmap_read(struct spi_mem_dirmap_desc 
> *desc,
>  				      u64 offs, size_t len, void *buf)
>  {
> diff --git a/include/linux/spi/spi-mem.h b/include/linux/spi/spi-mem.h
> index 2b65c9edc34e..97a2d280f2d0 100644
> --- a/include/linux/spi/spi-mem.h
> +++ b/include/linux/spi/spi-mem.h
> @@ -250,6 +250,12 @@ static inline void *spi_mem_get_drvdata(struct
> spi_mem *mem)
>   *		  the currently mapped area), and the caller of
>   *		  spi_mem_dirmap_write() is responsible for calling it again in
>   *		  this case.
> + * @do_calibration: perform calibration needed for high SPI clock 
> speed
> + *		    operation. Should be called after the SPI memory device has
> + *		    been completely initialized. The op passed should contain
> + *		    a template for the read operation used for the device so
> + *		    the controller can decide what type of calibration is
> + *		    required for this type of read.
>   *
>   * This interface should be implemented by SPI controllers providing 
> an
>   * high-level interface to execute SPI memory operation, which is 
> usually the
> @@ -274,6 +280,7 @@ struct spi_controller_mem_ops {
>  			       u64 offs, size_t len, void *buf);
>  	ssize_t (*dirmap_write)(struct spi_mem_dirmap_desc *desc,
>  				u64 offs, size_t len, const void *buf);
> +	void (*do_calibration)(struct spi_mem *mem, struct spi_mem_op *op);
>  };
> 
>  /**
> @@ -346,6 +353,7 @@ bool spi_mem_dtr_supports_op(struct spi_mem *mem,
>  #endif /* CONFIG_SPI_MEM */
> 
>  int spi_mem_adjust_op_size(struct spi_mem *mem, struct spi_mem_op 
> *op);
> +int spi_mem_do_calibration(struct spi_mem *mem, struct spi_mem_op 
> *op);
> 
>  bool spi_mem_supports_op(struct spi_mem *mem,
>  			 const struct spi_mem_op *op);

-- 
-michael

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 2/6] mtd: spi-nor: core: consolidate read op creation
  2021-03-11 19:12 ` [RFC PATCH 2/6] mtd: spi-nor: core: consolidate read op creation Pratyush Yadav
@ 2021-03-23 23:17   ` Michael Walle
  2021-03-24  8:04     ` Pratyush Yadav
  2021-04-08 12:48   ` Michael Walle
  1 sibling, 1 reply; 33+ messages in thread
From: Michael Walle @ 2021-03-23 23:17 UTC (permalink / raw)
  To: Pratyush Yadav
  Cc: Nishanth Menon, Tero Kristo, Rob Herring, Tudor Ambarus,
	Miquel Raynal, Richard Weinberger, Vignesh Raghavendra,
	Mark Brown, linux-arm-kernel, devicetree, linux-kernel,
	linux-mtd, linux-spi, Lokesh Vutla

Am 2021-03-11 20:12, schrieb Pratyush Yadav:
> Currently the spi_mem_op to read from the flash is used in two places:
> spi_nor_create_read_dirmap() and spi_nor_spimem_read_data(). In a later
> commit this number will increase to three. Instead of repeating the 
> same
> code thrice, add a function that returns a template of the read op. The
> callers can then fill in the details like address, data length, data
> buffer location.
> 
> Signed-off-by: Pratyush Yadav <p.yadav@ti.com>
> ---
>  drivers/mtd/spi-nor/core.c | 62 ++++++++++++++++++++------------------
>  1 file changed, 32 insertions(+), 30 deletions(-)
> 
> diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
> index 4a315cb1c4db..88888df009f0 100644
> --- a/drivers/mtd/spi-nor/core.c
> +++ b/drivers/mtd/spi-nor/core.c
> @@ -183,6 +183,33 @@ static int spi_nor_controller_ops_erase(struct
> spi_nor *nor, loff_t offs)
>  	return nor->controller_ops->erase(nor, offs);
>  }
> 
> +/**
> + * spi_nor_spimem_get_read_op() - return a template for the spi_mem_op 
> used for
> + *                                reading data from the flash via 
> spi-mem.
> + * @nor:        pointer to 'struct spi_nor'
> + *
> + * Return: A template of the 'struct spi_mem_op' for used for reading 
> data from
> + * the flash. The caller is expected to fill in the address, data 
> length, and
> + * the data buffer.
> + */
> +static struct spi_mem_op spi_nor_spimem_get_read_op(struct spi_nor 
> *nor)
> +{
> +	struct spi_mem_op op =
> +		SPI_MEM_OP(SPI_MEM_OP_CMD(nor->read_opcode, 0),
> +			   SPI_MEM_OP_ADDR(nor->addr_width, 0, 0),
> +			   SPI_MEM_OP_DUMMY(nor->read_dummy, 0),
> +			   SPI_MEM_OP_DATA_IN(1, NULL, 0));
> +
> +	spi_nor_spimem_setup_op(nor, &op, nor->read_proto);
> +
> +	/* convert the dummy cycles to the number of bytes */
> +	op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> +	if (spi_nor_protocol_is_dtr(nor->read_proto))
> +		op.dummy.nbytes *= 2;
> +
> +	return op;
> +}
> +
>  /**
>   * spi_nor_spimem_read_data() - read data from flash's memory region 
> via
>   *                              spi-mem
> @@ -196,21 +223,14 @@ static int spi_nor_controller_ops_erase(struct
> spi_nor *nor, loff_t offs)
>  static ssize_t spi_nor_spimem_read_data(struct spi_nor *nor, loff_t 
> from,
>  					size_t len, u8 *buf)
>  {
> -	struct spi_mem_op op =
> -		SPI_MEM_OP(SPI_MEM_OP_CMD(nor->read_opcode, 0),
> -			   SPI_MEM_OP_ADDR(nor->addr_width, from, 0),
> -			   SPI_MEM_OP_DUMMY(nor->read_dummy, 0),
> -			   SPI_MEM_OP_DATA_IN(len, buf, 0));
> +	struct spi_mem_op op = spi_nor_spimem_get_read_op(nor);
>  	bool usebouncebuf;
>  	ssize_t nbytes;
>  	int error;
> 
> -	spi_nor_spimem_setup_op(nor, &op, nor->read_proto);
> -
> -	/* convert the dummy cycles to the number of bytes */
> -	op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> -	if (spi_nor_protocol_is_dtr(nor->read_proto))
> -		op.dummy.nbytes *= 2;
> +	op.addr.val = from;
> +	op.data.nbytes = len;
> +	op.data.buf.in = buf;
> 
>  	usebouncebuf = spi_nor_spimem_bounce(nor, &op);
> 
> @@ -3581,28 +3601,10 @@ EXPORT_SYMBOL_GPL(spi_nor_scan);
>  static int spi_nor_create_read_dirmap(struct spi_nor *nor)
>  {
>  	struct spi_mem_dirmap_info info = {
> -		.op_tmpl = SPI_MEM_OP(SPI_MEM_OP_CMD(nor->read_opcode, 0),
> -				      SPI_MEM_OP_ADDR(nor->addr_width, 0, 0),
> -				      SPI_MEM_OP_DUMMY(nor->read_dummy, 0),
> -				      SPI_MEM_OP_DATA_IN(0, NULL, 0)),
> +		.op_tmpl = spi_nor_spimem_get_read_op(nor),
>  		.offset = 0,
>  		.length = nor->mtd.size,
>  	};
> -	struct spi_mem_op *op = &info.op_tmpl;
> -
> -	spi_nor_spimem_setup_op(nor, op, nor->read_proto);
> -
> -	/* convert the dummy cycles to the number of bytes */
> -	op->dummy.nbytes = (nor->read_dummy * op->dummy.buswidth) / 8;
> -	if (spi_nor_protocol_is_dtr(nor->read_proto))
> -		op->dummy.nbytes *= 2;
> -
> -	/*
> -	 * Since spi_nor_spimem_setup_op() only sets buswidth when the number
> -	 * of data bytes is non-zero, the data buswidth won't be set here. 
> So,
> -	 * do it explicitly.
> -	 */
> -	op->data.buswidth = spi_nor_get_protocol_data_nbits(nor->read_proto);

I guess this isn't needed anymore because spi_nor_spimem_get_read_op() 
uses a
data length of 1, right?

> 
>  	nor->dirmap.rdesc = devm_spi_mem_dirmap_create(nor->dev, nor->spimem,
>  						       &info);

-michael

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 2/6] mtd: spi-nor: core: consolidate read op creation
  2021-03-23 23:17   ` Michael Walle
@ 2021-03-24  8:04     ` Pratyush Yadav
  0 siblings, 0 replies; 33+ messages in thread
From: Pratyush Yadav @ 2021-03-24  8:04 UTC (permalink / raw)
  To: Michael Walle
  Cc: Nishanth Menon, Tero Kristo, Rob Herring, Tudor Ambarus,
	Miquel Raynal, Richard Weinberger, Vignesh Raghavendra,
	Mark Brown, linux-arm-kernel, devicetree, linux-kernel,
	linux-mtd, linux-spi, Lokesh Vutla

On 24/03/21 12:17AM, Michael Walle wrote:
> Am 2021-03-11 20:12, schrieb Pratyush Yadav:
> > Currently the spi_mem_op to read from the flash is used in two places:
> > spi_nor_create_read_dirmap() and spi_nor_spimem_read_data(). In a later
> > commit this number will increase to three. Instead of repeating the same
> > code thrice, add a function that returns a template of the read op. The
> > callers can then fill in the details like address, data length, data
> > buffer location.
> > 
> > Signed-off-by: Pratyush Yadav <p.yadav@ti.com>
> > ---
> >  drivers/mtd/spi-nor/core.c | 62 ++++++++++++++++++++------------------
> >  1 file changed, 32 insertions(+), 30 deletions(-)
> > 
> > diff --git a/drivers/mtd/spi-nor/core.c b/drivers/mtd/spi-nor/core.c
> > index 4a315cb1c4db..88888df009f0 100644
> > --- a/drivers/mtd/spi-nor/core.c
> > +++ b/drivers/mtd/spi-nor/core.c
> > @@ -183,6 +183,33 @@ static int spi_nor_controller_ops_erase(struct
> > spi_nor *nor, loff_t offs)
> >  	return nor->controller_ops->erase(nor, offs);
> >  }
> > 
> > +/**
> > + * spi_nor_spimem_get_read_op() - return a template for the spi_mem_op
> > used for
> > + *                                reading data from the flash via
> > spi-mem.
> > + * @nor:        pointer to 'struct spi_nor'
> > + *
> > + * Return: A template of the 'struct spi_mem_op' for used for reading
> > data from
> > + * the flash. The caller is expected to fill in the address, data
> > length, and
> > + * the data buffer.
> > + */
> > +static struct spi_mem_op spi_nor_spimem_get_read_op(struct spi_nor
> > *nor)
> > +{
> > +	struct spi_mem_op op =
> > +		SPI_MEM_OP(SPI_MEM_OP_CMD(nor->read_opcode, 0),
> > +			   SPI_MEM_OP_ADDR(nor->addr_width, 0, 0),
> > +			   SPI_MEM_OP_DUMMY(nor->read_dummy, 0),
> > +			   SPI_MEM_OP_DATA_IN(1, NULL, 0));
> > +
> > +	spi_nor_spimem_setup_op(nor, &op, nor->read_proto);
> > +
> > +	/* convert the dummy cycles to the number of bytes */
> > +	op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> > +	if (spi_nor_protocol_is_dtr(nor->read_proto))
> > +		op.dummy.nbytes *= 2;
> > +
> > +	return op;
> > +}
> > +
> >  /**
> >   * spi_nor_spimem_read_data() - read data from flash's memory region
> > via
> >   *                              spi-mem
> > @@ -196,21 +223,14 @@ static int spi_nor_controller_ops_erase(struct
> > spi_nor *nor, loff_t offs)
> >  static ssize_t spi_nor_spimem_read_data(struct spi_nor *nor, loff_t
> > from,
> >  					size_t len, u8 *buf)
> >  {
> > -	struct spi_mem_op op =
> > -		SPI_MEM_OP(SPI_MEM_OP_CMD(nor->read_opcode, 0),
> > -			   SPI_MEM_OP_ADDR(nor->addr_width, from, 0),
> > -			   SPI_MEM_OP_DUMMY(nor->read_dummy, 0),
> > -			   SPI_MEM_OP_DATA_IN(len, buf, 0));
> > +	struct spi_mem_op op = spi_nor_spimem_get_read_op(nor);
> >  	bool usebouncebuf;
> >  	ssize_t nbytes;
> >  	int error;
> > 
> > -	spi_nor_spimem_setup_op(nor, &op, nor->read_proto);
> > -
> > -	/* convert the dummy cycles to the number of bytes */
> > -	op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> > -	if (spi_nor_protocol_is_dtr(nor->read_proto))
> > -		op.dummy.nbytes *= 2;
> > +	op.addr.val = from;
> > +	op.data.nbytes = len;
> > +	op.data.buf.in = buf;
> > 
> >  	usebouncebuf = spi_nor_spimem_bounce(nor, &op);
> > 
> > @@ -3581,28 +3601,10 @@ EXPORT_SYMBOL_GPL(spi_nor_scan);
> >  static int spi_nor_create_read_dirmap(struct spi_nor *nor)
> >  {
> >  	struct spi_mem_dirmap_info info = {
> > -		.op_tmpl = SPI_MEM_OP(SPI_MEM_OP_CMD(nor->read_opcode, 0),
> > -				      SPI_MEM_OP_ADDR(nor->addr_width, 0, 0),
> > -				      SPI_MEM_OP_DUMMY(nor->read_dummy, 0),
> > -				      SPI_MEM_OP_DATA_IN(0, NULL, 0)),
> > +		.op_tmpl = spi_nor_spimem_get_read_op(nor),
> >  		.offset = 0,
> >  		.length = nor->mtd.size,
> >  	};
> > -	struct spi_mem_op *op = &info.op_tmpl;
> > -
> > -	spi_nor_spimem_setup_op(nor, op, nor->read_proto);
> > -
> > -	/* convert the dummy cycles to the number of bytes */
> > -	op->dummy.nbytes = (nor->read_dummy * op->dummy.buswidth) / 8;
> > -	if (spi_nor_protocol_is_dtr(nor->read_proto))
> > -		op->dummy.nbytes *= 2;
> > -
> > -	/*
> > -	 * Since spi_nor_spimem_setup_op() only sets buswidth when the number
> > -	 * of data bytes is non-zero, the data buswidth won't be set here. So,
> > -	 * do it explicitly.
> > -	 */
> > -	op->data.buswidth = spi_nor_get_protocol_data_nbits(nor->read_proto);
> 
> I guess this isn't needed anymore because spi_nor_spimem_get_read_op() uses
> a
> data length of 1, right?

Yes.

> 
> > 
> >  	nor->dirmap.rdesc = devm_spi_mem_dirmap_create(nor->dev, nor->spimem,
> >  						       &info);
> 
> -michael

-- 
Regards,
Pratyush Yadav
Texas Instruments Inc.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/6] spi: spi-mem: Tell controller when device is ready for calibration
  2021-03-23 23:07   ` Michael Walle
@ 2021-03-24  8:08     ` Pratyush Yadav
  2021-04-29 16:23       ` Michael Walle
  0 siblings, 1 reply; 33+ messages in thread
From: Pratyush Yadav @ 2021-03-24  8:08 UTC (permalink / raw)
  To: Michael Walle
  Cc: Nishanth Menon, Tero Kristo, Rob Herring, Tudor Ambarus,
	Miquel Raynal, Richard Weinberger, Vignesh Raghavendra,
	Mark Brown, linux-arm-kernel, devicetree, linux-kernel,
	linux-mtd, linux-spi, Lokesh Vutla

On 24/03/21 12:07AM, Michael Walle wrote:
> Hi Pratyush,
> 
> Am 2021-03-11 20:12, schrieb Pratyush Yadav:
> > Some controllers like the Cadence OSPI controller need to perform a
> > calibration sequence to operate at high clock speeds. This calibration
> > should happen after the flash is fully initialized otherwise the
> > calibration might happen in a different SPI mode from the one the flash
> > is finally set to. Add a hook that can be used to tell the controller
> > when the flash is ready for calibration. Whether calibration is needed
> > depends on the controller.
> > 
> > Signed-off-by: Pratyush Yadav <p.yadav@ti.com>
> > ---
> >  drivers/spi/spi-mem.c       | 12 ++++++++++++
> >  include/linux/spi/spi-mem.h |  8 ++++++++
> >  2 files changed, 20 insertions(+)
> > 
> > diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c
> > index dc713b0c3c4d..e2f05ad3f4dc 100644
> > --- a/drivers/spi/spi-mem.c
> > +++ b/drivers/spi/spi-mem.c
> > @@ -464,6 +464,18 @@ int spi_mem_adjust_op_size(struct spi_mem *mem,
> > struct spi_mem_op *op)
> >  }
> >  EXPORT_SYMBOL_GPL(spi_mem_adjust_op_size);
> > 
> > +int spi_mem_do_calibration(struct spi_mem *mem, struct spi_mem_op *op)
> > +{
> > +	struct spi_controller *ctlr = mem->spi->controller;
> > +
> > +	if (!ctlr->mem_ops || !ctlr->mem_ops->do_calibration)
> > +		return -EOPNOTSUPP;
> > +
> > +	ctlr->mem_ops->do_calibration(mem, op);
> 
> Can't a calibration fail?

It can. If it does, the controller falls back to lower speed transfers. 
There is not much the upper layer can do about this. That's why it is 
not informed whether it succeeded or not.

> 
> > +	return 0;
> > +}
> > +EXPORT_SYMBOL_GPL(spi_mem_do_calibration);
> > +
> >  static ssize_t spi_mem_no_dirmap_read(struct spi_mem_dirmap_desc *desc,
> >  				      u64 offs, size_t len, void *buf)
> >  {
> > diff --git a/include/linux/spi/spi-mem.h b/include/linux/spi/spi-mem.h
> > index 2b65c9edc34e..97a2d280f2d0 100644
> > --- a/include/linux/spi/spi-mem.h
> > +++ b/include/linux/spi/spi-mem.h
> > @@ -250,6 +250,12 @@ static inline void *spi_mem_get_drvdata(struct
> > spi_mem *mem)
> >   *		  the currently mapped area), and the caller of
> >   *		  spi_mem_dirmap_write() is responsible for calling it again in
> >   *		  this case.
> > + * @do_calibration: perform calibration needed for high SPI clock speed
> > + *		    operation. Should be called after the SPI memory device has
> > + *		    been completely initialized. The op passed should contain
> > + *		    a template for the read operation used for the device so
> > + *		    the controller can decide what type of calibration is
> > + *		    required for this type of read.
> >   *
> >   * This interface should be implemented by SPI controllers providing an
> >   * high-level interface to execute SPI memory operation, which is
> > usually the
> > @@ -274,6 +280,7 @@ struct spi_controller_mem_ops {
> >  			       u64 offs, size_t len, void *buf);
> >  	ssize_t (*dirmap_write)(struct spi_mem_dirmap_desc *desc,
> >  				u64 offs, size_t len, const void *buf);
> > +	void (*do_calibration)(struct spi_mem *mem, struct spi_mem_op *op);
> >  };
> > 
> >  /**
> > @@ -346,6 +353,7 @@ bool spi_mem_dtr_supports_op(struct spi_mem *mem,
> >  #endif /* CONFIG_SPI_MEM */
> > 
> >  int spi_mem_adjust_op_size(struct spi_mem *mem, struct spi_mem_op *op);
> > +int spi_mem_do_calibration(struct spi_mem *mem, struct spi_mem_op *op);
> > 
> >  bool spi_mem_supports_op(struct spi_mem *mem,
> >  			 const struct spi_mem_op *op);
> 
> -- 
> -michael

-- 
Regards,
Pratyush Yadav
Texas Instruments Inc.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 2/6] mtd: spi-nor: core: consolidate read op creation
  2021-03-11 19:12 ` [RFC PATCH 2/6] mtd: spi-nor: core: consolidate read op creation Pratyush Yadav
  2021-03-23 23:17   ` Michael Walle
@ 2021-04-08 12:48   ` Michael Walle
  1 sibling, 0 replies; 33+ messages in thread
From: Michael Walle @ 2021-04-08 12:48 UTC (permalink / raw)
  To: Pratyush Yadav
  Cc: Nishanth Menon, Tero Kristo, Rob Herring, Tudor Ambarus,
	Miquel Raynal, Richard Weinberger, Vignesh Raghavendra,
	Mark Brown, linux-arm-kernel, devicetree, linux-kernel,
	linux-mtd, linux-spi, Lokesh Vutla

Am 2021-03-11 20:12, schrieb Pratyush Yadav:
> Currently the spi_mem_op to read from the flash is used in two places:
> spi_nor_create_read_dirmap() and spi_nor_spimem_read_data(). In a later
> commit this number will increase to three. Instead of repeating the 
> same
> code thrice, add a function that returns a template of the read op. The
> callers can then fill in the details like address, data length, data
> buffer location.
> 
> Signed-off-by: Pratyush Yadav <p.yadav@ti.com>

Reviewed-by: Michael Walle <michael@walle.cc>

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/6] spi: spi-mem: Tell controller when device is ready for calibration
  2021-03-24  8:08     ` Pratyush Yadav
@ 2021-04-29 16:23       ` Michael Walle
  2021-04-29 18:41         ` Pratyush Yadav
  0 siblings, 1 reply; 33+ messages in thread
From: Michael Walle @ 2021-04-29 16:23 UTC (permalink / raw)
  To: Pratyush Yadav
  Cc: Nishanth Menon, Tero Kristo, Rob Herring, Tudor Ambarus,
	Miquel Raynal, Richard Weinberger, Vignesh Raghavendra,
	Mark Brown, linux-arm-kernel, devicetree, linux-kernel,
	linux-mtd, linux-spi, Lokesh Vutla

Hi Pratyush,

I've had a look at the LS1028A FlexSPI calibration feature. The
reference manual is very sparse on details, though. What you need to
do there is to program a special read command sequence (the whole
controller is made of these lookup table entries, where you can
have a short sequence of operations for read/write/program and so
on). Therefore, for data learning you'll take the read operation
and insert a LEARN op in between and read a specific data pattern.
Then the hardware will automatically figure out the correct sample
phase for the read data pins.

Unfortunately, it does not mention how often you have to do it. It
might be the case that is has to be calibrated more than once.

I'm just mentioning this so it won't be lost. If needed, it can
be added later.

Am 2021-03-24 09:08, schrieb Pratyush Yadav:
> On 24/03/21 12:07AM, Michael Walle wrote:
>> Am 2021-03-11 20:12, schrieb Pratyush Yadav:
>> > Some controllers like the Cadence OSPI controller need to perform a
>> > calibration sequence to operate at high clock speeds. This calibration
>> > should happen after the flash is fully initialized otherwise the
>> > calibration might happen in a different SPI mode from the one the flash
>> > is finally set to. Add a hook that can be used to tell the controller
>> > when the flash is ready for calibration. Whether calibration is needed
>> > depends on the controller.
>> >
>> > Signed-off-by: Pratyush Yadav <p.yadav@ti.com>
>> > ---
>> >  drivers/spi/spi-mem.c       | 12 ++++++++++++
>> >  include/linux/spi/spi-mem.h |  8 ++++++++
>> >  2 files changed, 20 insertions(+)
>> >
>> > diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c
>> > index dc713b0c3c4d..e2f05ad3f4dc 100644
>> > --- a/drivers/spi/spi-mem.c
>> > +++ b/drivers/spi/spi-mem.c
>> > @@ -464,6 +464,18 @@ int spi_mem_adjust_op_size(struct spi_mem *mem,
>> > struct spi_mem_op *op)
>> >  }
>> >  EXPORT_SYMBOL_GPL(spi_mem_adjust_op_size);
>> >
>> > +int spi_mem_do_calibration(struct spi_mem *mem, struct spi_mem_op *op)
>> > +{
>> > +	struct spi_controller *ctlr = mem->spi->controller;
>> > +
>> > +	if (!ctlr->mem_ops || !ctlr->mem_ops->do_calibration)
>> > +		return -EOPNOTSUPP;
>> > +
>> > +	ctlr->mem_ops->do_calibration(mem, op);
>> 
>> Can't a calibration fail?
> 
> It can. If it does, the controller falls back to lower speed transfers.
> There is not much the upper layer can do about this. That's why it is
> not informed whether it succeeded or not.

Ok, if needed, that should be an easy change.

op is there to decide if we need a calibration at all, correct?
What if there are different factors, like frequency? For example
on the LS1028A its just a matter of the SCK frequency. It seems
that this parameter is tailored to the OPHY.

-michael

>> > +	return 0;
>> > +}
>> > +EXPORT_SYMBOL_GPL(spi_mem_do_calibration);
>> > +
>> >  static ssize_t spi_mem_no_dirmap_read(struct spi_mem_dirmap_desc *desc,
>> >  				      u64 offs, size_t len, void *buf)
>> >  {
>> > diff --git a/include/linux/spi/spi-mem.h b/include/linux/spi/spi-mem.h
>> > index 2b65c9edc34e..97a2d280f2d0 100644
>> > --- a/include/linux/spi/spi-mem.h
>> > +++ b/include/linux/spi/spi-mem.h
>> > @@ -250,6 +250,12 @@ static inline void *spi_mem_get_drvdata(struct
>> > spi_mem *mem)
>> >   *		  the currently mapped area), and the caller of
>> >   *		  spi_mem_dirmap_write() is responsible for calling it again in
>> >   *		  this case.
>> > + * @do_calibration: perform calibration needed for high SPI clock speed
>> > + *		    operation. Should be called after the SPI memory device has
>> > + *		    been completely initialized. The op passed should contain
>> > + *		    a template for the read operation used for the device so
>> > + *		    the controller can decide what type of calibration is
>> > + *		    required for this type of read.
>> >   *
>> >   * This interface should be implemented by SPI controllers providing an
>> >   * high-level interface to execute SPI memory operation, which is
>> > usually the
>> > @@ -274,6 +280,7 @@ struct spi_controller_mem_ops {
>> >  			       u64 offs, size_t len, void *buf);
>> >  	ssize_t (*dirmap_write)(struct spi_mem_dirmap_desc *desc,
>> >  				u64 offs, size_t len, const void *buf);
>> > +	void (*do_calibration)(struct spi_mem *mem, struct spi_mem_op *op);
>> >  };
>> >
>> >  /**
>> > @@ -346,6 +353,7 @@ bool spi_mem_dtr_supports_op(struct spi_mem *mem,
>> >  #endif /* CONFIG_SPI_MEM */
>> >
>> >  int spi_mem_adjust_op_size(struct spi_mem *mem, struct spi_mem_op *op);
>> > +int spi_mem_do_calibration(struct spi_mem *mem, struct spi_mem_op *op);
>> >
>> >  bool spi_mem_supports_op(struct spi_mem *mem,
>> >  			 const struct spi_mem_op *op);
>> 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 4/6] spi: cadence-qspi: Use PHY for DAC reads if possible
  2021-03-12 10:17     ` Pratyush Yadav
@ 2021-04-29 16:28       ` Michael Walle
  2021-04-29 18:19         ` Pratyush Yadav
  0 siblings, 1 reply; 33+ messages in thread
From: Michael Walle @ 2021-04-29 16:28 UTC (permalink / raw)
  To: Pratyush Yadav
  Cc: Tudor.Ambarus, nm, kristo, robh+dt, miquel.raynal, richard,
	vigneshr, broonie, linux-arm-kernel, devicetree, linux-kernel,
	linux-mtd, linux-spi, lokeshvutla

Am 2021-03-12 11:17, schrieb Pratyush Yadav:
> On 12/03/21 09:13AM, Tudor.Ambarus@microchip.com wrote:
>> On 3/11/21 9:12 PM, Pratyush Yadav wrote:
>> > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>> >
>> > Check if a read is eligible for PHY and if it is, enable PHY and DQS.
>> 
>> DQS as in data strobe? Shouldn't the upper layer inform the QSPI 
>> controller
>> whether DS is required or not?
> 
> Yes, DQS as in data strobe. I need to check this again, but IIRC the
> controller cannot run in PHY mode unless DS is used. Ideally the upper
> layer should indeed inform the controller whether DS is 
> supported/in-use
> or not. That can be used to decide whether PHY mode (and consequently
> the DS line) is to be used or not.
> 
> Currently there are only two flashes that use 8D-8D-8D mode (S28HS512T
> and MT35XU512ABA), and both of them drive the DS line.

The LS1028A datasheet explicitly states that the calibration is only
used for non-DQS flashes. Which makes sense, because it just determine 
at
which point the input data is sampled. And if the flash provides a data
strobe, it already know when to sample it. What I am missing here?

-michael

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 4/6] spi: cadence-qspi: Use PHY for DAC reads if possible
  2021-04-29 16:28       ` Michael Walle
@ 2021-04-29 18:19         ` Pratyush Yadav
  2021-04-29 22:20           ` Michael Walle
  0 siblings, 1 reply; 33+ messages in thread
From: Pratyush Yadav @ 2021-04-29 18:19 UTC (permalink / raw)
  To: Michael Walle
  Cc: Tudor.Ambarus, nm, kristo, robh+dt, miquel.raynal, richard,
	vigneshr, broonie, linux-arm-kernel, devicetree, linux-kernel,
	linux-mtd, linux-spi, lokeshvutla

On 29/04/21 06:28PM, Michael Walle wrote:
> Am 2021-03-12 11:17, schrieb Pratyush Yadav:
> > On 12/03/21 09:13AM, Tudor.Ambarus@microchip.com wrote:
> > > On 3/11/21 9:12 PM, Pratyush Yadav wrote:
> > > > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> > > >
> > > > Check if a read is eligible for PHY and if it is, enable PHY and DQS.
> > > 
> > > DQS as in data strobe? Shouldn't the upper layer inform the QSPI
> > > controller
> > > whether DS is required or not?
> > 
> > Yes, DQS as in data strobe. I need to check this again, but IIRC the
> > controller cannot run in PHY mode unless DS is used. Ideally the upper
> > layer should indeed inform the controller whether DS is supported/in-use
> > or not. That can be used to decide whether PHY mode (and consequently
> > the DS line) is to be used or not.
> > 
> > Currently there are only two flashes that use 8D-8D-8D mode (S28HS512T
> > and MT35XU512ABA), and both of them drive the DS line.
> 
> The LS1028A datasheet explicitly states that the calibration is only
> used for non-DQS flashes. Which makes sense, because it just determine at
> which point the input data is sampled. And if the flash provides a data
> strobe, it already know when to sample it. What I am missing here?

If there was 0 delay in transferring the signals from flash to 
SoC/controller, you would be right. But in practice there is a small but 
noticeable delay from when the flash launches the signal and when it is 
received by the device. So by the time the DQS signal reaches the SoC it 
might already be too late and the data lines might not be valid any 
more. The calibration accounts for these (and some others) delays.

See [0] for a somewhat similar discussion I had with Tudor.

[0] https://lore.kernel.org/linux-mtd/20210312181447.dlecnw2oed7jtxe7@ti.com/

-- 
Regards,
Pratyush Yadav
Texas Instruments Inc.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/6] spi: spi-mem: Tell controller when device is ready for calibration
  2021-04-29 16:23       ` Michael Walle
@ 2021-04-29 18:41         ` Pratyush Yadav
  2021-04-29 22:46           ` Michael Walle
  0 siblings, 1 reply; 33+ messages in thread
From: Pratyush Yadav @ 2021-04-29 18:41 UTC (permalink / raw)
  To: Michael Walle
  Cc: Nishanth Menon, Tero Kristo, Rob Herring, Tudor Ambarus,
	Miquel Raynal, Richard Weinberger, Vignesh Raghavendra,
	Mark Brown, linux-arm-kernel, devicetree, linux-kernel,
	linux-mtd, linux-spi, Lokesh Vutla

On 29/04/21 06:23PM, Michael Walle wrote:
> Hi Pratyush,
> 
> I've had a look at the LS1028A FlexSPI calibration feature. The
> reference manual is very sparse on details, though. What you need to
> do there is to program a special read command sequence (the whole
> controller is made of these lookup table entries, where you can
> have a short sequence of operations for read/write/program and so
> on). Therefore, for data learning you'll take the read operation
> and insert a LEARN op in between and read a specific data pattern.
> Then the hardware will automatically figure out the correct sample
> phase for the read data pins.
> 
> Unfortunately, it does not mention how often you have to do it. It
> might be the case that is has to be calibrated more than once.

I haven't read the datasheet, I wonder how long this calibration takes. 
If it is too long then the overhead might not even be worth the extra 
read throughput. Especially when using a file system on top which 
generally don't do very large reads in one go.

Anyway, when the do_calibration() is called the controller can save the 
calibration op and use it later as needed. It knows when an exec_op() 
will result in a read since it has access to the whole op.

> 
> I'm just mentioning this so it won't be lost. If needed, it can
> be added later.
> 
> Am 2021-03-24 09:08, schrieb Pratyush Yadav:
> > On 24/03/21 12:07AM, Michael Walle wrote:
> > > Am 2021-03-11 20:12, schrieb Pratyush Yadav:
> > > > Some controllers like the Cadence OSPI controller need to perform a
> > > > calibration sequence to operate at high clock speeds. This calibration
> > > > should happen after the flash is fully initialized otherwise the
> > > > calibration might happen in a different SPI mode from the one the flash
> > > > is finally set to. Add a hook that can be used to tell the controller
> > > > when the flash is ready for calibration. Whether calibration is needed
> > > > depends on the controller.
> > > >
> > > > Signed-off-by: Pratyush Yadav <p.yadav@ti.com>
> > > > ---
> > > >  drivers/spi/spi-mem.c       | 12 ++++++++++++
> > > >  include/linux/spi/spi-mem.h |  8 ++++++++
> > > >  2 files changed, 20 insertions(+)
> > > >
> > > > diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c
> > > > index dc713b0c3c4d..e2f05ad3f4dc 100644
> > > > --- a/drivers/spi/spi-mem.c
> > > > +++ b/drivers/spi/spi-mem.c
> > > > @@ -464,6 +464,18 @@ int spi_mem_adjust_op_size(struct spi_mem *mem,
> > > > struct spi_mem_op *op)
> > > >  }
> > > >  EXPORT_SYMBOL_GPL(spi_mem_adjust_op_size);
> > > >
> > > > +int spi_mem_do_calibration(struct spi_mem *mem, struct spi_mem_op *op)
> > > > +{
> > > > +	struct spi_controller *ctlr = mem->spi->controller;
> > > > +
> > > > +	if (!ctlr->mem_ops || !ctlr->mem_ops->do_calibration)
> > > > +		return -EOPNOTSUPP;
> > > > +
> > > > +	ctlr->mem_ops->do_calibration(mem, op);
> > > 
> > > Can't a calibration fail?
> > 
> > It can. If it does, the controller falls back to lower speed transfers.
> > There is not much the upper layer can do about this. That's why it is
> > not informed whether it succeeded or not.
> 
> Ok, if needed, that should be an easy change.
> 
> op is there to decide if we need a calibration at all, correct?

Yes. It can also be used to choose which calibration algorithm to use. 
For example on the Cadence controller, there are different algorithms 
for 8S and 8D operations.

> What if there are different factors, like frequency? For example
> on the LS1028A its just a matter of the SCK frequency. It seems
> that this parameter is tailored to the OPHY.

As of now there is no way in SPI MEM to tell the controller the expected 
speed of the operation. AFAIK most controllers get the speed via device 
tree. So in the current case, the controller already knows the speed it 
should run at, and can decide if calibration is needed or not.

But if operation speed is eventually added to SPI MEM, I would assume it 
would be part of struct spi_mem_op. The op passed in would have this 
information filled, and the controller can use that information to 
decide if it needs to perform the calibration or not.

I am all for making this API flexible, but with very few controllers 
supporting this feature in the wild, it is difficult to predict all the 
information that might be needed. In the current state, I think the API 
provides a fair bit of information to the controller about how a read 
operation would look like.

> 
> -michael
> 
> > > > +	return 0;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(spi_mem_do_calibration);
> > > > +
> > > >  static ssize_t spi_mem_no_dirmap_read(struct spi_mem_dirmap_desc *desc,
> > > >  				      u64 offs, size_t len, void *buf)
> > > >  {
> > > > diff --git a/include/linux/spi/spi-mem.h b/include/linux/spi/spi-mem.h
> > > > index 2b65c9edc34e..97a2d280f2d0 100644
> > > > --- a/include/linux/spi/spi-mem.h
> > > > +++ b/include/linux/spi/spi-mem.h
> > > > @@ -250,6 +250,12 @@ static inline void *spi_mem_get_drvdata(struct
> > > > spi_mem *mem)
> > > >   *		  the currently mapped area), and the caller of
> > > >   *		  spi_mem_dirmap_write() is responsible for calling it again in
> > > >   *		  this case.
> > > > + * @do_calibration: perform calibration needed for high SPI clock speed
> > > > + *		    operation. Should be called after the SPI memory device has
> > > > + *		    been completely initialized. The op passed should contain
> > > > + *		    a template for the read operation used for the device so
> > > > + *		    the controller can decide what type of calibration is
> > > > + *		    required for this type of read.
> > > >   *
> > > >   * This interface should be implemented by SPI controllers providing an
> > > >   * high-level interface to execute SPI memory operation, which is
> > > > usually the
> > > > @@ -274,6 +280,7 @@ struct spi_controller_mem_ops {
> > > >  			       u64 offs, size_t len, void *buf);
> > > >  	ssize_t (*dirmap_write)(struct spi_mem_dirmap_desc *desc,
> > > >  				u64 offs, size_t len, const void *buf);
> > > > +	void (*do_calibration)(struct spi_mem *mem, struct spi_mem_op *op);
> > > >  };
> > > >
> > > >  /**
> > > > @@ -346,6 +353,7 @@ bool spi_mem_dtr_supports_op(struct spi_mem *mem,
> > > >  #endif /* CONFIG_SPI_MEM */
> > > >
> > > >  int spi_mem_adjust_op_size(struct spi_mem *mem, struct spi_mem_op *op);
> > > > +int spi_mem_do_calibration(struct spi_mem *mem, struct spi_mem_op *op);
> > > >
> > > >  bool spi_mem_supports_op(struct spi_mem *mem,
> > > >  			 const struct spi_mem_op *op);
> > > 
> 

-- 
Regards,
Pratyush Yadav
Texas Instruments Inc.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 4/6] spi: cadence-qspi: Use PHY for DAC reads if possible
  2021-04-29 18:19         ` Pratyush Yadav
@ 2021-04-29 22:20           ` Michael Walle
  2021-05-10 11:39             ` Pratyush Yadav
  0 siblings, 1 reply; 33+ messages in thread
From: Michael Walle @ 2021-04-29 22:20 UTC (permalink / raw)
  To: Pratyush Yadav
  Cc: Tudor.Ambarus, nm, kristo, robh+dt, miquel.raynal, richard,
	vigneshr, broonie, linux-arm-kernel, devicetree, linux-kernel,
	linux-mtd, linux-spi, lokeshvutla

Am 2021-04-29 20:19, schrieb Pratyush Yadav:
> On 29/04/21 06:28PM, Michael Walle wrote:
>> Am 2021-03-12 11:17, schrieb Pratyush Yadav:
>> > On 12/03/21 09:13AM, Tudor.Ambarus@microchip.com wrote:
>> > > On 3/11/21 9:12 PM, Pratyush Yadav wrote:
>> > > > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
>> > > >
>> > > > Check if a read is eligible for PHY and if it is, enable PHY and DQS.
>> > >
>> > > DQS as in data strobe? Shouldn't the upper layer inform the QSPI
>> > > controller
>> > > whether DS is required or not?
>> >
>> > Yes, DQS as in data strobe. I need to check this again, but IIRC the
>> > controller cannot run in PHY mode unless DS is used. Ideally the upper
>> > layer should indeed inform the controller whether DS is supported/in-use
>> > or not. That can be used to decide whether PHY mode (and consequently
>> > the DS line) is to be used or not.
>> >
>> > Currently there are only two flashes that use 8D-8D-8D mode (S28HS512T
>> > and MT35XU512ABA), and both of them drive the DS line.
>> 
>> The LS1028A datasheet explicitly states that the calibration is only
>> used for non-DQS flashes. Which makes sense, because it just determine 
>> at
>> which point the input data is sampled. And if the flash provides a 
>> data
>> strobe, it already know when to sample it. What I am missing here?
> 
> If there was 0 delay in transferring the signals from flash to
> SoC/controller, you would be right. But in practice there is a small 
> but
> noticeable delay from when the flash launches the signal and when it is
> received by the device. So by the time the DQS signal reaches the SoC 
> it
> might already be too late and the data lines might not be valid any
> more. The calibration accounts for these (and some others) delays.

DQS and the data signals are trace length matched, so for data reads
they will end up on the IO pad of the SoC at the same time. This is
also mentioned in [1] (Fig 1.1, point 4 and 5). So while there needs
to be a delay on the clock line for the receiving FF, the best value
for this should be half the SCK clock period.

Does this work without DQS? That should be the main purpose for a
calibration, no? Because in this case, you'll have to determine
the delay between SCK and the data signals (for reads).

Btw. I can't get my head around how the TX delay search would work.
Basically you shift the SCK to the command / data to the flash. So
the flash will either recognize a valid read command or if the delay
is too short/too long the flash will (hopefully) ignore the wrong
command, correct? Might there be any misinterpreted commands which
might be harmful? Are there any flashes which actually need a delay
between data out and SCK?

Of course, the calibration might help with broken hardware where the
SCK/DQ/DQS traces are not length matched.

-michael

> 
> See [0] for a somewhat similar discussion I had with Tudor.
> 
> [0] 
> https://lore.kernel.org/linux-mtd/20210312181447.dlecnw2oed7jtxe7@ti.com/

[1] https://www.ti.com/lit/an/spract2/spract2.pdf

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 1/6] spi: spi-mem: Tell controller when device is ready for calibration
  2021-04-29 18:41         ` Pratyush Yadav
@ 2021-04-29 22:46           ` Michael Walle
  0 siblings, 0 replies; 33+ messages in thread
From: Michael Walle @ 2021-04-29 22:46 UTC (permalink / raw)
  To: Pratyush Yadav
  Cc: Nishanth Menon, Tero Kristo, Rob Herring, Tudor Ambarus,
	Miquel Raynal, Richard Weinberger, Vignesh Raghavendra,
	Mark Brown, linux-arm-kernel, devicetree, linux-kernel,
	linux-mtd, linux-spi, Lokesh Vutla

Am 2021-04-29 20:41, schrieb Pratyush Yadav:
> On 29/04/21 06:23PM, Michael Walle wrote:
>> I've had a look at the LS1028A FlexSPI calibration feature. The
>> reference manual is very sparse on details, though. What you need to
>> do there is to program a special read command sequence (the whole
>> controller is made of these lookup table entries, where you can
>> have a short sequence of operations for read/write/program and so
>> on). Therefore, for data learning you'll take the read operation
>> and insert a LEARN op in between and read a specific data pattern.
>> Then the hardware will automatically figure out the correct sample
>> phase for the read data pins.
>> 
>> Unfortunately, it does not mention how often you have to do it. It
>> might be the case that is has to be calibrated more than once.
> 
> I haven't read the datasheet, I wonder how long this calibration takes.
> If it is too long then the overhead might not even be worth the extra
> read throughput. Especially when using a file system on top which
> generally don't do very large reads in one go.

I was just thinking of compensating a possible temperature drift. You
wouldn't have to do it on every read.

There is a second mode, where it is actually done on every read. But
that will be used where the flash supports a read preamble, where
dummy bytes after the read opcode are replaced by a calibration pattern.
If the pattern has the same length as the dummy bytes there is no 
penalty.
IIRC the controller just supports a pattern of max 32 bits.

Oh I forgot to mention, this doesn't need to be repeated. I guess
the hardware already captures all possible phases (there are only
16) and compares each one with a predefined pattern.

> Anyway, when the do_calibration() is called the controller can save the
> calibration op and use it later as needed. It knows when an exec_op()
> will result in a read since it has access to the whole op.
> 
>> 
>> I'm just mentioning this so it won't be lost. If needed, it can
>> be added later.
>> 
>> Am 2021-03-24 09:08, schrieb Pratyush Yadav:
>> > On 24/03/21 12:07AM, Michael Walle wrote:
>> > > Am 2021-03-11 20:12, schrieb Pratyush Yadav:
>> > > > Some controllers like the Cadence OSPI controller need to perform a
>> > > > calibration sequence to operate at high clock speeds. This calibration
>> > > > should happen after the flash is fully initialized otherwise the
>> > > > calibration might happen in a different SPI mode from the one the flash
>> > > > is finally set to. Add a hook that can be used to tell the controller
>> > > > when the flash is ready for calibration. Whether calibration is needed
>> > > > depends on the controller.
>> > > >
>> > > > Signed-off-by: Pratyush Yadav <p.yadav@ti.com>
>> > > > ---
>> > > >  drivers/spi/spi-mem.c       | 12 ++++++++++++
>> > > >  include/linux/spi/spi-mem.h |  8 ++++++++
>> > > >  2 files changed, 20 insertions(+)
>> > > >
>> > > > diff --git a/drivers/spi/spi-mem.c b/drivers/spi/spi-mem.c
>> > > > index dc713b0c3c4d..e2f05ad3f4dc 100644
>> > > > --- a/drivers/spi/spi-mem.c
>> > > > +++ b/drivers/spi/spi-mem.c
>> > > > @@ -464,6 +464,18 @@ int spi_mem_adjust_op_size(struct spi_mem *mem,
>> > > > struct spi_mem_op *op)
>> > > >  }
>> > > >  EXPORT_SYMBOL_GPL(spi_mem_adjust_op_size);
>> > > >
>> > > > +int spi_mem_do_calibration(struct spi_mem *mem, struct spi_mem_op *op)
>> > > > +{
>> > > > +	struct spi_controller *ctlr = mem->spi->controller;
>> > > > +
>> > > > +	if (!ctlr->mem_ops || !ctlr->mem_ops->do_calibration)
>> > > > +		return -EOPNOTSUPP;
>> > > > +
>> > > > +	ctlr->mem_ops->do_calibration(mem, op);
>> > >
>> > > Can't a calibration fail?
>> >
>> > It can. If it does, the controller falls back to lower speed transfers.
>> > There is not much the upper layer can do about this. That's why it is
>> > not informed whether it succeeded or not.
>> 
>> Ok, if needed, that should be an easy change.
>> 
>> op is there to decide if we need a calibration at all, correct?
> 
> Yes. It can also be used to choose which calibration algorithm to use.
> For example on the Cadence controller, there are different algorithms
> for 8S and 8D operations.
> 
>> What if there are different factors, like frequency? For example
>> on the LS1028A its just a matter of the SCK frequency. It seems
>> that this parameter is tailored to the OPHY.
> 
> As of now there is no way in SPI MEM to tell the controller the 
> expected
> speed of the operation. AFAIK most controllers get the speed via device
> tree. So in the current case, the controller already knows the speed it
> should run at, and can decide if calibration is needed or not.
> 
> But if operation speed is eventually added to SPI MEM, I would assume 
> it
> would be part of struct spi_mem_op. The op passed in would have this
> information filled, and the controller can use that information to
> decide if it needs to perform the calibration or not.
> 
> I am all for making this API flexible, but with very few controllers
> supporting this feature in the wild, it is difficult to predict all the
> information that might be needed. In the current state, I think the API
> provides a fair bit of information to the controller about how a read
> operation would look like.

Sure, and its also quite hard to review without any other hardware
which supports that ;) I was thinking about letting the spi driver
call into spi-mem to retrieve the information it needs instead of
having that second argument. Anyway, if we need any changes in the
future this isn't set in stone.

-michael

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 5/6] spi: cadence-qspi: Tune PHY to allow running at higher frequencies
  2021-03-11 19:12 ` [RFC PATCH 5/6] spi: cadence-qspi: Tune PHY to allow running at higher frequencies Pratyush Yadav
@ 2021-04-29 22:48   ` Michael Walle
  2021-04-30  5:42     ` Pratyush Yadav
  0 siblings, 1 reply; 33+ messages in thread
From: Michael Walle @ 2021-04-29 22:48 UTC (permalink / raw)
  To: Pratyush Yadav
  Cc: Nishanth Menon, Tero Kristo, Rob Herring, Tudor Ambarus,
	Miquel Raynal, Richard Weinberger, Vignesh Raghavendra,
	Mark Brown, linux-arm-kernel, devicetree, linux-kernel,
	linux-mtd, linux-spi, Lokesh Vutla

Am 2021-03-11 20:12, schrieb Pratyush Yadav:
> +	if (of_property_read_u32(np, "cdns,phy-tx-start", 
> &f_pdata->phy_tx_start))
> +		f_pdata->phy_tx_start = 16;
> +
> +	if (of_property_read_u32(np, "cdns,phy-tx-end", 
> &f_pdata->phy_tx_end))
> +		f_pdata->phy_tx_end = 48;
> +

I didn't see a dt-bindings patch for these.

-michael

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 5/6] spi: cadence-qspi: Tune PHY to allow running at higher frequencies
  2021-04-29 22:48   ` Michael Walle
@ 2021-04-30  5:42     ` Pratyush Yadav
  0 siblings, 0 replies; 33+ messages in thread
From: Pratyush Yadav @ 2021-04-30  5:42 UTC (permalink / raw)
  To: Michael Walle
  Cc: Nishanth Menon, Tero Kristo, Rob Herring, Tudor Ambarus,
	Miquel Raynal, Richard Weinberger, Vignesh Raghavendra,
	Mark Brown, linux-arm-kernel, devicetree, linux-kernel,
	linux-mtd, linux-spi, Lokesh Vutla

On 30/04/21 12:48AM, Michael Walle wrote:
> Am 2021-03-11 20:12, schrieb Pratyush Yadav:
> > +	if (of_property_read_u32(np, "cdns,phy-tx-start",
> > &f_pdata->phy_tx_start))
> > +		f_pdata->phy_tx_start = 16;
> > +
> > +	if (of_property_read_u32(np, "cdns,phy-tx-end", &f_pdata->phy_tx_end))
> > +		f_pdata->phy_tx_end = 48;
> > +
> 
> I didn't see a dt-bindings patch for these.

Right. Will add them in the next re-roll.
 
> -michael

-- 
Regards,
Pratyush Yadav
Texas Instruments Inc.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [RFC PATCH 4/6] spi: cadence-qspi: Use PHY for DAC reads if possible
  2021-04-29 22:20           ` Michael Walle
@ 2021-05-10 11:39             ` Pratyush Yadav
  0 siblings, 0 replies; 33+ messages in thread
From: Pratyush Yadav @ 2021-05-10 11:39 UTC (permalink / raw)
  To: Michael Walle
  Cc: Tudor.Ambarus, nm, kristo, robh+dt, miquel.raynal, richard,
	vigneshr, broonie, linux-arm-kernel, devicetree, linux-kernel,
	linux-mtd, linux-spi, lokeshvutla

On 30/04/21 12:20AM, Michael Walle wrote:
> Am 2021-04-29 20:19, schrieb Pratyush Yadav:
> > On 29/04/21 06:28PM, Michael Walle wrote:
> > > Am 2021-03-12 11:17, schrieb Pratyush Yadav:
> > > > On 12/03/21 09:13AM, Tudor.Ambarus@microchip.com wrote:
> > > > > On 3/11/21 9:12 PM, Pratyush Yadav wrote:
> > > > > > EXTERNAL EMAIL: Do not click links or open attachments unless you know the content is safe
> > > > > >
> > > > > > Check if a read is eligible for PHY and if it is, enable PHY and DQS.
> > > > >
> > > > > DQS as in data strobe? Shouldn't the upper layer inform the QSPI
> > > > > controller
> > > > > whether DS is required or not?
> > > >
> > > > Yes, DQS as in data strobe. I need to check this again, but IIRC the
> > > > controller cannot run in PHY mode unless DS is used. Ideally the upper
> > > > layer should indeed inform the controller whether DS is supported/in-use
> > > > or not. That can be used to decide whether PHY mode (and consequently
> > > > the DS line) is to be used or not.
> > > >
> > > > Currently there are only two flashes that use 8D-8D-8D mode (S28HS512T
> > > > and MT35XU512ABA), and both of them drive the DS line.
> > > 
> > > The LS1028A datasheet explicitly states that the calibration is only
> > > used for non-DQS flashes. Which makes sense, because it just
> > > determine at
> > > which point the input data is sampled. And if the flash provides a
> > > data
> > > strobe, it already know when to sample it. What I am missing here?
> > 
> > If there was 0 delay in transferring the signals from flash to
> > SoC/controller, you would be right. But in practice there is a small but
> > noticeable delay from when the flash launches the signal and when it is
> > received by the device. So by the time the DQS signal reaches the SoC it
> > might already be too late and the data lines might not be valid any
> > more. The calibration accounts for these (and some others) delays.
> 
> DQS and the data signals are trace length matched, so for data reads
> they will end up on the IO pad of the SoC at the same time. This is
> also mentioned in [1] (Fig 1.1, point 4 and 5). So while there needs
> to be a delay on the clock line for the receiving FF, the best value
> for this should be half the SCK clock period.

In the explanation below Figure 1-1, I see:

  The DQS and data are edge aligned at points 4 and 5 in Figure 1-1. DQS 
  must be delayed by the RX PDL to a point inside the data eye to sample 
  valid data at point 2.

So the RX delay is being used to tune exactly when to sample the data 
lines. From what I understand, the delay is not set to SCK / 4 because 
this delay might change with temperature. This calibration algorithm has 
been designed to be reslilent to temperature changes so it performs some 
other heuristics to find the ideal delay for the DQS. Plus, part of the 
delay comes from the time taken by the controller to sample the data. 
The algorithm takes care of that delay as well.

To be completely honest, I'm not very well versed with the internal 
details of the calibration. I only have a high level view of it. I hope 
my explanation was clear enough. If not, I can spend some more time to 
understand how the internals of the controller work and get a clearer 
understanding of what is going on in the background.

> 
> Does this work without DQS? That should be the main purpose for a
> calibration, no? Because in this case, you'll have to determine
> the delay between SCK and the data signals (for reads).

It should, but I have not tested it without DQS.

> 
> Btw. I can't get my head around how the TX delay search would work.
> Basically you shift the SCK to the command / data to the flash. So
> the flash will either recognize a valid read command or if the delay
> is too short/too long the flash will (hopefully) ignore the wrong
> command, correct? Might there be any misinterpreted commands which
> might be harmful? Are there any flashes which actually need a delay
> between data out and SCK?

Yes, it is possible to send an invalid read command. Section 2.1 says:

  TX min and max (side walls of the passing region) are formed by the 
  setup and hold time requirement of the OSPI device. TX delays outside 
  this range cause command and address bytes to be latched incorrectly 
  by the OSPI device, resulting in an unsuccessful read.

Currently, the TX limits are hard coded such that this does not happen 
for the two flashes I have tested with: Micron MT35 and Cypress S28. If 
later the need comes up such that the limits are not enough to encompass 
all the flashes we need to support, I can look into setting the limits 
via device tree.

> 
> Of course, the calibration might help with broken hardware where the
> SCK/DQ/DQS traces are not length matched.
> 
> -michael
> 
> > 
> > See [0] for a somewhat similar discussion I had with Tudor.
> > 
> > [0]
> > https://lore.kernel.org/linux-mtd/20210312181447.dlecnw2oed7jtxe7@ti.com/
> 
> [1] https://www.ti.com/lit/an/spract2/spract2.pdf

-- 
Regards,
Pratyush Yadav
Texas Instruments Inc.

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, back to index

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-03-11 19:12 [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi Pratyush Yadav
2021-03-11 19:12 ` [RFC PATCH 1/6] spi: spi-mem: Tell controller when device is ready for calibration Pratyush Yadav
2021-03-23 23:07   ` Michael Walle
2021-03-24  8:08     ` Pratyush Yadav
2021-04-29 16:23       ` Michael Walle
2021-04-29 18:41         ` Pratyush Yadav
2021-04-29 22:46           ` Michael Walle
2021-03-11 19:12 ` [RFC PATCH 2/6] mtd: spi-nor: core: consolidate read op creation Pratyush Yadav
2021-03-23 23:17   ` Michael Walle
2021-03-24  8:04     ` Pratyush Yadav
2021-04-08 12:48   ` Michael Walle
2021-03-11 19:12 ` [RFC PATCH 3/6] mtd: spi-nor: core: run calibration when initialization is done Pratyush Yadav
2021-03-11 19:12 ` [RFC PATCH 4/6] spi: cadence-qspi: Use PHY for DAC reads if possible Pratyush Yadav
2021-03-12  9:13   ` Tudor.Ambarus
2021-03-12 10:17     ` Pratyush Yadav
2021-04-29 16:28       ` Michael Walle
2021-04-29 18:19         ` Pratyush Yadav
2021-04-29 22:20           ` Michael Walle
2021-05-10 11:39             ` Pratyush Yadav
2021-03-11 19:12 ` [RFC PATCH 5/6] spi: cadence-qspi: Tune PHY to allow running at higher frequencies Pratyush Yadav
2021-04-29 22:48   ` Michael Walle
2021-04-30  5:42     ` Pratyush Yadav
2021-03-11 19:12 ` [RFC PATCH 6/6] arm64: dts: ti: k3-j721e-som-p0: Enable PHY calibration Pratyush Yadav
2021-03-12  9:09 ` [RFC PATCH 0/6] spi: Add OSPI PHY calibration support for spi-cadence-quadspi Tudor.Ambarus
2021-03-12 10:10   ` Pratyush Yadav
2021-03-12 10:20     ` Michael Walle
2021-03-12 11:07       ` Pratyush Yadav
2021-03-12 13:26         ` Michael Walle
2021-03-12 11:23     ` Tudor.Ambarus
2021-03-12 18:14       ` Pratyush Yadav
2021-03-12 13:32 ` Michael Walle
2021-03-12 14:59   ` Tudor.Ambarus
2021-03-12 17:00   ` Pratyush Yadav

Linux-SPI Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-spi/0 linux-spi/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-spi linux-spi/ https://lore.kernel.org/linux-spi \
		linux-spi@vger.kernel.org
	public-inbox-index linux-spi

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-spi


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git