qemu-devel.nongnu.org archive mirror
* [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
@ 2021-01-14 15:08 Bin Meng
  2021-01-14 15:08 ` [PATCH 1/9] hw/block: m25p80: Fix the number of dummy bytes needed for Windbond flashes Bin Meng
                   ` (11 more replies)
  0 siblings, 12 replies; 36+ messages in thread
From: Bin Meng @ 2021-01-14 15:08 UTC (permalink / raw)
  To: Alistair Francis, Philippe Mathieu-Daudé,
	Peter Maydell, Francisco Iglesias
  Cc: Kevin Wolf, qemu-devel, qemu-block, Marcin Krzeminski,
	Andrew Jeffery, Bin Meng, Havard Skinnemoen, Max Reitz,
	Tyrone Ting, qemu-arm, Cédric Le Goater, Joe Komlodi,
	Joel Stanley

From: Bin Meng <bin.meng@windriver.com>

The m25p80 model uses s->needed_bytes to indicate how many follow-up
bytes are expected to be received after it receives a command. For
example, depending on the address mode, either 3-byte address or
4-byte address is needed.

For fast read family commands, some dummy cycles are required after
sending the address bytes, and the dummy cycles need to be counted
in s->needed_bytes. This is where the mess began.

As the variable name (needed_bytes) indicates, the unit is bytes, not
bits or cycles. However, for some reason the model has been using the
number of dummy cycles for s->needed_bytes. The right approach is to
convert the number of dummy cycles to bytes based on the SPI protocol.
For example, the 6 dummy cycles for Fast Read Quad I/O (EBh) are sent
on 4 IO lines, so they should be converted to 3 bytes (6 * 4 / 8).
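
The conversion can be sketched as a small helper. This is only an
illustration, not code from this series: the name dummy_cycles_to_bytes
and its lines parameter (the number of IO lines carrying the dummy bits)
are hypothetical, and rounding partial bytes up is an assumption of the
sketch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of m25p80.c): convert a number of dummy
 * clock cycles into the number of whole bytes seen on the bus.
 *
 * Each cycle moves one bit per IO line, so the total is cycles * lines
 * bits. Partial bytes are rounded up (an assumption of this sketch).
 */
static inline uint32_t dummy_cycles_to_bytes(uint32_t cycles, uint32_t lines)
{
    assert(lines == 1 || lines == 2 || lines == 4);
    return (cycles * lines + 7) / 8;
}
```

With this helper, dummy_cycles_to_bytes(6, 4) gives 3 for the EBh case
above, and 8 cycles on a single line give 1 byte.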

Things get complicated when interacting with different SPI or QSPI
flash controllers. There are two major cases:

- Dummy bytes prepared by drivers and written to the controller FIFO.
  In this case, the driver calculates the correct number of dummy
  bytes and writes them into the TX FIFO. Fixing the m25p80 model will
  fix flashes working with such controllers.
- Dummy bytes not prepared by drivers. Drivers just tell the hardware
  the dummy cycle configuration via some registers, and the hardware
  automatically generates the dummy cycles. Fixing the m25p80 model
  is not enough, and we will need to fix the SPI/QSPI models for such
  controllers.
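
To make the difference between the two cases concrete, here is a toy
model. Everything in it (ToyFlash, toy_transfer and the two toy_ctrl_*
functions) is hypothetical and only stands in for a flash model, a
one-byte bus transfer and a controller model; it is not QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy flash: tracks how many follow-up bytes it still expects. */
typedef struct ToyFlash {
    uint32_t needed_bytes;  /* address + dummy bytes expected */
    uint32_t received;      /* bytes received so far */
} ToyFlash;

/* Stand-in for a one-byte bus transfer; returns 1 once all bytes arrived. */
static int toy_transfer(ToyFlash *f, uint8_t byte)
{
    assert(f != NULL);
    (void)byte;
    f->received++;
    return f->received >= f->needed_bytes;
}

/* Case 1: the driver put the dummy bytes in the FIFO; just forward them. */
static int toy_ctrl_send_fifo(ToyFlash *f, const uint8_t *fifo, uint32_t len)
{
    int done = 0;
    for (uint32_t i = 0; i < len; i++) {
        done = toy_transfer(f, fifo[i]);
    }
    return done;
}

/*
 * Case 2: the driver only programmed a dummy cycle count, so the
 * controller model itself must convert cycles to byte transfers.
 */
static int toy_ctrl_send_dummy_cycles(ToyFlash *f, uint32_t cycles,
                                      uint32_t lines)
{
    uint32_t nbytes = (cycles * lines + 7) / 8;
    int done = 0;
    for (uint32_t i = 0; i < nbytes; i++) {
        done = toy_transfer(f, 0);
    }
    return done;
}
```

In case 2, a controller that called the transfer function once per
programmed cycle would send more bytes than the flash expects, which is
why those controller models need fixing too.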

This series fixes the mess in the m25p80 from the flash side first,
followed by fixes to 3 known SPI controller models that fall into
the 2nd case above.

Please note, I have no way to verify patches 7, 8 and 9 because:

* There is no public datasheet available for the SoC / SPI controller
* There are no QEMU docs or other details that tell people how to boot
  either U-Boot or the Linux kernel to verify the functionality

These 3 patches are very likely to be wrong. Hence I would like to ask
the original authors of these SPI controller models to help with
testing, or to completely rewrite these 3 patches to fix things.
Thanks!

Patch 6 is unvalidated with QEMU, mainly because there is no doc that
tells people how to boot anything to test it. But I have some confidence
based on my reading of the ZynqMP manual, as well as some experimental
testing on a real ZCU102 board.

Other flash patches can be tested with the SiFive SPI series:
http://patchwork.ozlabs.org/project/qemu-devel/list/?series=222391

Cherry-pick patches 16 and 17 from the series above, and switch to a
different flash model to test with the following command:

$ qemu-system-riscv64 -nographic -M sifive_u -m 2G -smp 5 -kernel u-boot

I picked two flash models for testing:

QEMU flash: "sst25vf032b"

  U-Boot 2020.10 (Jan 14 2021 - 21:55:59 +0800)

  CPU:   rv64imafdcsu
  Model: SiFive HiFive Unleashed A00
  DRAM:  2 GiB
  MMC:
  Loading Environment from SPIFlash... SF: Detected sst25vf032b with page size 256 Bytes, erase size 4 KiB, total 4 MiB
  *** Warning - bad CRC, using default environment

  In:    serial@10010000
  Out:   serial@10010000
  Err:   serial@10010000
  Net:   failed to get gemgxl_reset reset

  Warning: ethernet@10090000 MAC addresses don't match:
  Address in DT is                52:54:00:12:34:56
  Address in environment is       70:b3:d5:92:f0:01
  eth0: ethernet@10090000
  Hit any key to stop autoboot:  0
  => sf probe
  SF: Detected sst25vf032b with page size 256 Bytes, erase size 4 KiB,
  total 4 MiB
  => sf test 1ff000 1000
  SPI flash test:
  0 erase: 0 ticks, 4096000 KiB/s 32768.000 Mbps
  1 check: 10 ticks, 400 KiB/s 3.200 Mbps
  2 write: 170 ticks, 23 KiB/s 0.184 Mbps
  3 read: 9 ticks, 444 KiB/s 3.552 Mbps
  Test passed
  0 erase: 0 ticks, 4096000 KiB/s 32768.000 Mbps
  1 check: 10 ticks, 400 KiB/s 3.200 Mbps
  2 write: 170 ticks, 23 KiB/s 0.184 Mbps
  3 read: 9 ticks, 444 KiB/s 3.552 Mbps

QEMU flash: "mx66u51235f"

  U-Boot 2020.10 (Jan 14 2021 - 21:55:59 +0800)

  CPU:   rv64imafdcsu
  Model: SiFive HiFive Unleashed A00
  DRAM:  2 GiB
  MMC:
  Loading Environment from SPIFlash... SF: Detected mx66u51235f with page size 256 Bytes, erase size 4 KiB, total 64 MiB
  *** Warning - bad CRC, using default environment

  In:    serial@10010000
  Out:   serial@10010000
  Err:   serial@10010000
  Net:   failed to get gemgxl_reset reset

  Warning: ethernet@10090000 MAC addresses don't match:
  Address in DT is                52:54:00:12:34:56
  Address in environment is       70:b3:d5:92:f0:01
  eth0: ethernet@10090000
  Hit any key to stop autoboot:  0
  => sf probe
  SF: Detected mx66u51235f with page size 256 Bytes, erase size 4 KiB, total 64 MiB
  => sf test 0 8000
  SPI flash test:
  0 erase: 1 ticks, 32000 KiB/s 256.000 Mbps
  1 check: 80 ticks, 400 KiB/s 3.200 Mbps
  2 write: 83 ticks, 385 KiB/s 3.080 Mbps
  3 read: 79 ticks, 405 KiB/s 3.240 Mbps
  Test passed
  0 erase: 1 ticks, 32000 KiB/s 256.000 Mbps
  1 check: 80 ticks, 400 KiB/s 3.200 Mbps
  2 write: 83 ticks, 385 KiB/s 3.080 Mbps
  3 read: 79 ticks, 405 KiB/s 3.240 Mbps

I am sure there will be bugs, and I have not tested all affected flashes.
But I want to send out this series for early discussion and comments.
I will continue my testing.


Bin Meng (9):
  hw/block: m25p80: Fix the number of dummy bytes needed for Windbond
    flashes
  hw/block: m25p80: Fix the number of dummy bytes needed for
    Numonyx/Micron flashes
  hw/block: m25p80: Fix the number of dummy bytes needed for Macronix
    flashes
  hw/block: m25p80: Fix the number of dummy bytes needed for Spansion
    flashes
  hw/block: m25p80: Support fast read for SST flashes
  hw/ssi: xilinx_spips: Fix generic fifo dummy cycle handling
  Revert "aspeed/smc: Fix number of dummy cycles for FAST_READ_4
    command"
  Revert "aspeed/smc: snoop SPI transfers to fake dummy cycles"
  hw/ssi: npcm7xx_fiu: Correct the dummy cycle emulation logic

 include/hw/ssi/aspeed_smc.h |   3 -
 hw/block/m25p80.c           | 153 ++++++++++++++++++++++++++++--------
 hw/ssi/aspeed_smc.c         | 116 +--------------------------
 hw/ssi/npcm7xx_fiu.c        |   8 +-
 hw/ssi/xilinx_spips.c       |  29 ++++++-
 5 files changed, 153 insertions(+), 156 deletions(-)

-- 
2.25.1




* [PATCH 1/9] hw/block: m25p80: Fix the number of dummy bytes needed for Windbond flashes
  2021-01-14 15:08 [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands Bin Meng
@ 2021-01-14 15:08 ` Bin Meng
  2021-01-14 15:08 ` [PATCH 2/9] hw/block: m25p80: Fix the number of dummy bytes needed for Numonyx/Micron flashes Bin Meng
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 36+ messages in thread
From: Bin Meng @ 2021-01-14 15:08 UTC (permalink / raw)
  To: Alistair Francis, Philippe Mathieu-Daudé,
	Peter Maydell, Francisco Iglesias
  Cc: Kevin Wolf, qemu-block, Marcin Krzeminski, Bin Meng, qemu-devel,
	Max Reitz, Cédric Le Goater

From: Bin Meng <bin.meng@windriver.com>

The m25p80 model uses s->needed_bytes to indicate how many follow-up
bytes are expected to be received after it receives a command. For
example, depending on the address mode, either 3-byte address or
4-byte address is needed.

For fast read family commands, some dummy cycles are required after
sending the address bytes, and the dummy cycles need to be counted
in s->needed_bytes. This is where the mess began.

As the variable name (needed_bytes) indicates, the unit is bytes, not
bits or cycles. However, for some reason the model has been using the
number of dummy cycles for s->needed_bytes. The right approach is to
convert the number of dummy cycles to bytes based on the SPI protocol.
For example, the 6 dummy cycles for Fast Read Quad I/O (EBh) are sent
on 4 IO lines, so they should be converted to 3 bytes (6 * 4 / 8).

Things get complicated when interacting with different SPI or QSPI
flash controllers. There are two major cases:

- Dummy bytes prepared by drivers and written to the controller FIFO.
  In this case, the driver calculates the correct number of dummy
  bytes and writes them into the TX FIFO. Fixing the m25p80 model will
  fix flashes working with such controllers.
- Dummy bytes not prepared by drivers. Drivers just tell the hardware
  the dummy cycle configuration via some registers, and the hardware
  automatically generates the dummy cycles. Fixing the m25p80 model
  is not enough, and we will need to fix the SPI/QSPI models for such
  controllers.

Let's fix the mess from the flash side first. We start with an easy one,
the Winbond flashes.

Per the Winbond W25Q256JV datasheet [1] instruction set tables
(chapters 8.1.2, 8.1.3, 8.1.4, 8.1.5), fix the wrong number of
dummy bytes needed for fast read commands.

[1] https://www.winbond.com/resource-files/w25q256jv%20spi%20revb%2009202016.pdf

Fixes: fe8477052831 ("m25p80: Fix QIOR/DIOR handling for Winbond")
Fixes: 3830c7a460b8 ("m25p80: Fix WINBOND fast read command handling")
Fixes: cf6f1efe0b57 ("m25p80: Fast read commands family changes")
Signed-off-by: Bin Meng <bin.meng@windriver.com>
---

 hw/block/m25p80.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index b744a58d1c..c947716f99 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -875,9 +875,22 @@ static void decode_fast_read_cmd(Flash *s)
 {
     s->needed_bytes = get_addr_length(s);
     switch (get_man(s)) {
-    /* Dummy cycles - modeled with bytes writes instead of bits */
+    /* Dummy cycles - modeled with bytes writes */
     case MAN_WINBOND:
-        s->needed_bytes += 8;
+        switch (s->cmd_in_progress) {
+        case FAST_READ:
+        case FAST_READ4:
+            s->needed_bytes += 1;
+            break;
+        case DOR:
+        case DOR4:
+            s->needed_bytes += 2;
+            break;
+        case QOR:
+        case QOR4:
+            s->needed_bytes += 4;
+            break;
+        }
         break;
     case MAN_NUMONYX:
         s->needed_bytes += numonyx_extract_cfg_num_dummies(s);
@@ -906,7 +919,7 @@ static void decode_fast_read_cmd(Flash *s)
 static void decode_dio_read_cmd(Flash *s)
 {
     s->needed_bytes = get_addr_length(s);
-    /* Dummy cycles modeled with bytes writes instead of bits */
+    /* Dummy cycles modeled with bytes writes */
     switch (get_man(s)) {
     case MAN_WINBOND:
         s->needed_bytes += WINBOND_CONTINUOUS_READ_MODE_CMD_LEN;
@@ -945,11 +958,10 @@ static void decode_dio_read_cmd(Flash *s)
 static void decode_qio_read_cmd(Flash *s)
 {
     s->needed_bytes = get_addr_length(s);
-    /* Dummy cycles modeled with bytes writes instead of bits */
+    /* Dummy cycles modeled with bytes writes */
     switch (get_man(s)) {
     case MAN_WINBOND:
-        s->needed_bytes += WINBOND_CONTINUOUS_READ_MODE_CMD_LEN;
-        s->needed_bytes += 4;
+        s->needed_bytes += WINBOND_CONTINUOUS_READ_MODE_CMD_LEN + 2;
         break;
     case MAN_SPANSION:
         s->needed_bytes += SPANSION_CONTINUOUS_READ_MODE_CMD_LEN;
-- 
2.25.1




* [PATCH 2/9] hw/block: m25p80: Fix the number of dummy bytes needed for Numonyx/Micron flashes
  2021-01-14 15:08 [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands Bin Meng
  2021-01-14 15:08 ` [PATCH 1/9] hw/block: m25p80: Fix the number of dummy bytes needed for Windbond flashes Bin Meng
@ 2021-01-14 15:08 ` Bin Meng
  2021-01-14 15:08 ` [PATCH 3/9] hw/block: m25p80: Fix the number of dummy bytes needed for Macronix flashes Bin Meng
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 36+ messages in thread
From: Bin Meng @ 2021-01-14 15:08 UTC (permalink / raw)
  To: Alistair Francis, Philippe Mathieu-Daudé,
	Peter Maydell, Francisco Iglesias
  Cc: Kevin Wolf, qemu-block, Marcin Krzeminski, Bin Meng, qemu-devel,
	Max Reitz, Joe Komlodi

From: Bin Meng <bin.meng@windriver.com>

Unfortunately the dummy cycle/byte calculation for Numonyx/Micron
flashes is still wrong, even though earlier fixes tried to correct
it.

First of all, the default number of dummy cycles is only related to
the SPI protocol mode. For QSPI it is 10, otherwise it is 8.

Secondly, per the datasheet [1], it's clear that in Quad I/O or Dual
I/O mode, the dummy bits show up on 4 or 2 lines.

The tricky part is the standard mode (extended mode). In this mode,
unlike on other flashes, the dummy bits do not show up on the same
lines as the address bits, but on the same lines as the data bits.
So for a Quad Output Fast Read command (6Bh), the dummy bits must be
sent on all 4 IO lines. IOW, the total number of dummy bits depends
on the command.

The datasheet does not state clearly how many lines are used for 6Bh
in the standard mode. We can only infer from figure 19, which shows
the command sequence, that the dummy cycles need to be on 4 lines
for 6Bh.

Note that as of today, the spi-nor drivers in both U-Boot v2021.01 and
Linux v5.10 wrongly assume for all flashes that the dummy cycle bus
width is the same as the address bus width, which is not true for
Numonyx/Micron flashes in the standard mode.

Lastly, if the total number of dummy bits is not a multiple of 8, log
an unimplemented message to notify the user, and round it up. Right now
the QEMU ssi_transfer() API transfers one byte at a time to the flash.
Leaving this unimplemented will not cause any issue because, as of
today, the spi-nor drivers in both U-Boot and Linux assume that the
total number of dummy bits is a multiple of 8.
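
The steps above can be summarized in a standalone sketch. It is not the
code in the diff below: the SketchMode/SketchCmd enums and the function
name are hypothetical stand-ins for the model's own types, and the
rounded flag takes the place of the LOG_UNIMP message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative stand-ins for the model's protocol mode and command enums. */
typedef enum { SK_MODE_STD, SK_MODE_DIO, SK_MODE_QIO } SketchMode;
typedef enum { SK_FAST_READ, SK_DOR, SK_DIOR, SK_QOR, SK_QIOR } SketchCmd;

/*
 * cfg_cycles: 4-bit dummy cycle field of the volatile configuration
 *             register (0x0 and 0xf select the default).
 * rounded:    set to true when the bit count had to be rounded up.
 */
static uint8_t sketch_numonyx_dummy_bytes(SketchMode mode, SketchCmd cmd,
                                          uint8_t cfg_cycles, bool *rounded)
{
    uint32_t cycles = cfg_cycles;
    uint32_t lines = 1;

    assert(cfg_cycles <= 0xf && rounded != NULL);

    /* Step 1: the default depends only on the protocol mode. */
    if (cycles == 0x0 || cycles == 0xf) {
        cycles = (mode == SK_MODE_QIO) ? 10 : 8;
    }

    /* Step 2: number of IO lines that carry the dummy bits. */
    if (mode == SK_MODE_QIO) {
        lines = 4;
    } else if (mode == SK_MODE_DIO) {
        lines = 2;
    } else if (cmd == SK_QOR || cmd == SK_QIOR) {
        lines = 4;      /* standard mode: follows the data lines */
    } else if (cmd == SK_DOR || cmd == SK_DIOR) {
        lines = 2;
    }

    /* Step 3: bits to bytes, rounding up when not a multiple of 8. */
    uint32_t bits = cycles * lines;
    *rounded = (bits % 8) != 0;
    return (uint8_t)((bits + 7) / 8);
}
```

For instance, the default 8 cycles in standard mode give 1 byte for
FAST_READ but 4 bytes for QOR, while the default 10 cycles in Quad I/O
mode give 5 bytes.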

[1] https://media-www.micron.com/-/media/client/global/documents/products/
    data-sheet/nor-flash/serial-nor/n25q/n25q_512mb_1ce_3v_65nm.pdf

Fixes: 23af26856606 ("hw/block/m25p80: Fix Numonyx fast read dummy cycle count")
Signed-off-by: Bin Meng <bin.meng@windriver.com>
---

 hw/block/m25p80.c | 62 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 57 insertions(+), 5 deletions(-)

diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index c947716f99..c8cd12a6d3 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -856,19 +856,71 @@ static uint8_t numonyx_extract_cfg_num_dummies(Flash *s)
     mode = numonyx_mode(s);
     num_dummies = extract32(s->volatile_cfg, 4, 4);
 
+    /*
+     * The default number of dummy cycles is only related to the SPI
+     * protocol mode. For QSPI it is 10, otherwise it is 8.
+     */
     if (num_dummies == 0x0 || num_dummies == 0xf) {
+        num_dummies = (mode == MODE_QIO) ? 10 : 8;
+    }
+
+    /*
+     * Convert the number of dummy cycles to bytes
+     *
+     * Per the datasheet, it's clear that in Quad I/O or Dual I/O mode,
+     * the dummy bits show up on 4 or 2 lines.
+     *
+     * The tricky part is the standard mode (extended mode). For such
+     * mode, the dummy bits are not like other flashes that they show up
+     * on the same lines as the address bits, but on the same lines as
+     * the data bits, so for a Quad Output Fast Read command (6Bh), the
+     * dummy bits must be sent on all the 4 IO lines. IOW, the total
+     * number of dummy bits depend on the command.
+     *
+     * The datasheet does not state crystal clearly how many lines are
+     * used for 6Bh in the standard mode. We may only tell from figure 19
+     * that is showing the command sequence and interpret that dummy cycles
+     * need to be on 4 lines for 6Bh.
+     *
+     * Note as of today, both spi-nor drivers in U-Boot v2021.01 and Linux
+     * v5.10 has the wrong assumption for all flashes that dummy cycle bus
+     * width is the same as the address bits bus width, which is not true
+     * for the Numonyx/Micron flash in the standard mode.
+     */
+
+    if (mode == MODE_QIO) {
+        num_dummies *= 4;
+    } else if (mode == MODE_DIO) {
+        num_dummies *= 2;
+    } else {
         switch (s->cmd_in_progress) {
+        case QOR:
+        case QOR4:
         case QIOR:
         case QIOR4:
-            num_dummies = 10;
+            num_dummies *= 4;
             break;
-        default:
-            num_dummies = (mode == MODE_QIO) ? 10 : 8;
+        case DOR:
+        case DOR4:
+        case DIOR:
+        case DIOR4:
+            num_dummies *= 2;
             break;
-        }
+        }
+    }
+
+    /*
+     * If the total number of dummy bits is not multiple of 8, log an
+     * unimplemented message to notify user, and round it up.
+     */
+    if (num_dummies % 8) {
+        qemu_log_mask(LOG_UNIMP,
+                      "M25P80: the number of dummy bits is not a multiple of 8\n");
+        num_dummies = ROUND_UP(num_dummies, 8);
     }
 
-    return num_dummies;
+    /* return the number of dummy bytes */
+    return num_dummies / 8;
 }
 
 static void decode_fast_read_cmd(Flash *s)
-- 
2.25.1




* [PATCH 3/9] hw/block: m25p80: Fix the number of dummy bytes needed for Macronix flashes
  2021-01-14 15:08 [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands Bin Meng
  2021-01-14 15:08 ` [PATCH 1/9] hw/block: m25p80: Fix the number of dummy bytes needed for Windbond flashes Bin Meng
  2021-01-14 15:08 ` [PATCH 2/9] hw/block: m25p80: Fix the number of dummy bytes needed for Numonyx/Micron flashes Bin Meng
@ 2021-01-14 15:08 ` Bin Meng
  2021-01-14 15:08 ` [PATCH 4/9] hw/block: m25p80: Fix the number of dummy bytes needed for Spansion flashes Bin Meng
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 36+ messages in thread
From: Bin Meng @ 2021-01-14 15:08 UTC (permalink / raw)
  To: Alistair Francis, Philippe Mathieu-Daudé,
	Peter Maydell, Francisco Iglesias
  Cc: Kevin Wolf, qemu-block, Marcin Krzeminski, Bin Meng, qemu-devel,
	Max Reitz

From: Bin Meng <bin.meng@windriver.com>

Per the datasheet [1], the number of dummy cycles for Macronix flashes
is configurable via two volatile bits (DC1, DC2) in a configuration
register.

Apply the same dummy cycle to dummy byte conversion fix as for the others.
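
As a cross-check of the byte counts, here is a standalone sketch. The
enum and function names are hypothetical. The cycle counts per DC value
are the ones the model used before this patch (the removed lines in the
diff below), not values I re-derived from the datasheet, and rounding
partial bytes up is an assumption of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical read kinds; the enum value is the number of IO lines. */
typedef enum { MX_FAST_READ = 1, MX_DIO = 2, MX_QIO = 4 } MxReadKind;

/* Dummy cycles for a given 2-bit DC field, per the pre-patch model code. */
static uint8_t mx_dummy_cycles(MxReadKind kind, uint8_t dc)
{
    assert(dc < 4);
    switch (kind) {
    case MX_FAST_READ:
        return dc == 1 ? 6 : 8;
    case MX_DIO:
        return dc == 1 ? 6 : (dc == 2 ? 8 : 4);
    case MX_QIO:
        return dc == 1 ? 4 : (dc == 2 ? 8 : 6);
    }
    return 0;
}

/* Cycles times IO lines gives bits; round up to whole bytes. */
static uint8_t mx_dummy_bytes(MxReadKind kind, uint8_t dc)
{
    uint32_t bits = (uint32_t)mx_dummy_cycles(kind, dc) * (uint32_t)kind;
    return (uint8_t)((bits + 7) / 8);
}
```

Under these assumptions the sketch gives 1 byte for fast read, 2/2/1
bytes for dual I/O and 2/4/3 bytes for quad I/O (DC = 1, DC = 2, other),
which are the values the patch adds to s->needed_bytes.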

[1] https://www.macronix.com/Lists/Datasheet/Attachments/7674/MX66U51235F,%201.8V,%20512Mb,%20v1.1.pdf

Fixes: cf6f1efe0b57 ("m25p80: Fast read commands family changes")
Signed-off-by: Bin Meng <bin.meng@windriver.com>
---

 hw/block/m25p80.c | 21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index c8cd12a6d3..44508b3da9 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -949,10 +949,10 @@ static void decode_fast_read_cmd(Flash *s)
         break;
     case MAN_MACRONIX:
         if (extract32(s->volatile_cfg, 6, 2) == 1) {
-            s->needed_bytes += 6;
-        } else {
-            s->needed_bytes += 8;
+            qemu_log_mask(LOG_UNIMP,
+                          "M25P80: the number of dummy bits is not a multiple of 8\n");
         }
+        s->needed_bytes += 1;
         break;
     case MAN_SPANSION:
         s->needed_bytes += extract32(s->spansion_cr2v,
@@ -989,13 +989,14 @@ static void decode_dio_read_cmd(Flash *s)
     case MAN_MACRONIX:
         switch (extract32(s->volatile_cfg, 6, 2)) {
         case 1:
-            s->needed_bytes += 6;
-            break;
+            qemu_log_mask(LOG_UNIMP,
+                          "M25P80: the number of dummy bits is not a multiple of 8\n");
+        /* fall-through */
         case 2:
-            s->needed_bytes += 8;
+            s->needed_bytes += 2;
             break;
         default:
-            s->needed_bytes += 4;
+            s->needed_bytes += 1;
             break;
         }
         break;
@@ -1028,13 +1029,13 @@ static void decode_qio_read_cmd(Flash *s)
     case MAN_MACRONIX:
         switch (extract32(s->volatile_cfg, 6, 2)) {
         case 1:
-            s->needed_bytes += 4;
+            s->needed_bytes += 2;
             break;
         case 2:
-            s->needed_bytes += 8;
+            s->needed_bytes += 4;
             break;
         default:
-            s->needed_bytes += 6;
+            s->needed_bytes += 3;
             break;
         }
         break;
-- 
2.25.1




* [PATCH 4/9] hw/block: m25p80: Fix the number of dummy bytes needed for Spansion flashes
  2021-01-14 15:08 [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands Bin Meng
                   ` (2 preceding siblings ...)
  2021-01-14 15:08 ` [PATCH 3/9] hw/block: m25p80: Fix the number of dummy bytes needed for Macronix flashes Bin Meng
@ 2021-01-14 15:08 ` Bin Meng
  2021-01-14 15:08 ` [PATCH 5/9] hw/block: m25p80: Support fast read for SST flashes Bin Meng
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 36+ messages in thread
From: Bin Meng @ 2021-01-14 15:08 UTC (permalink / raw)
  To: Alistair Francis, Philippe Mathieu-Daudé,
	Peter Maydell, Francisco Iglesias
  Cc: Kevin Wolf, qemu-block, Marcin Krzeminski, Bin Meng, qemu-devel,
	Max Reitz

From: Bin Meng <bin.meng@windriver.com>

Per the datasheet [1], the number of dummy cycles for Spansion flashes
is configurable via 4 volatile bits in a configuration register.

Apply the same dummy cycle to dummy byte conversion fix as for the others.

[1] https://www.cypress.com/file/316171/download

Signed-off-by: Bin Meng <bin.meng@windriver.com>
---

 hw/block/m25p80.c | 43 +++++++++++++++++++++++++++++++------------
 1 file changed, 31 insertions(+), 12 deletions(-)

diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index 44508b3da9..e1e5d5a76f 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -955,10 +955,25 @@ static void decode_fast_read_cmd(Flash *s)
         s->needed_bytes += 1;
         break;
     case MAN_SPANSION:
-        s->needed_bytes += extract32(s->spansion_cr2v,
-                                    SPANSION_DUMMY_CLK_POS,
-                                    SPANSION_DUMMY_CLK_LEN
-                                    );
+        if (extract32(s->spansion_cr2v, SPANSION_DUMMY_CLK_POS,
+                      SPANSION_DUMMY_CLK_LEN) != 8) {
+            qemu_log_mask(LOG_UNIMP,
+                          "M25P80: the number of dummy bits is not a multiple of 8\n");
+        }
+        switch (s->cmd_in_progress) {
+        case FAST_READ:
+        case FAST_READ4:
+            s->needed_bytes += 1;
+            break;
+        case DOR:
+        case DOR4:
+            s->needed_bytes += 2;
+            break;
+        case QOR:
+        case QOR4:
+            s->needed_bytes += 4;
+            break;
+        }
         break;
     default:
         break;
@@ -978,10 +993,12 @@ static void decode_dio_read_cmd(Flash *s)
         break;
     case MAN_SPANSION:
         s->needed_bytes += SPANSION_CONTINUOUS_READ_MODE_CMD_LEN;
-        s->needed_bytes += extract32(s->spansion_cr2v,
-                                    SPANSION_DUMMY_CLK_POS,
-                                    SPANSION_DUMMY_CLK_LEN
-                                    );
+        if (extract32(s->spansion_cr2v, SPANSION_DUMMY_CLK_POS,
+                      SPANSION_DUMMY_CLK_LEN) != 8) {
+            qemu_log_mask(LOG_UNIMP,
+                          "M25P80: the number of dummy bits is not a multiple of 8\n");
+        }
+        s->needed_bytes += 2;
         break;
     case MAN_NUMONYX:
         s->needed_bytes += numonyx_extract_cfg_num_dummies(s);
@@ -1018,10 +1035,12 @@ static void decode_qio_read_cmd(Flash *s)
         break;
     case MAN_SPANSION:
         s->needed_bytes += SPANSION_CONTINUOUS_READ_MODE_CMD_LEN;
-        s->needed_bytes += extract32(s->spansion_cr2v,
-                                    SPANSION_DUMMY_CLK_POS,
-                                    SPANSION_DUMMY_CLK_LEN
-                                    );
+        if (extract32(s->spansion_cr2v, SPANSION_DUMMY_CLK_POS,
+                      SPANSION_DUMMY_CLK_LEN) != 8) {
+            qemu_log_mask(LOG_UNIMP,
+                          "M25P80: the number of dummy bits is not a multiple of 8\n");
+        }
+        s->needed_bytes += 4;
         break;
     case MAN_NUMONYX:
         s->needed_bytes += numonyx_extract_cfg_num_dummies(s);
-- 
2.25.1




* [PATCH 5/9] hw/block: m25p80: Support fast read for SST flashes
  2021-01-14 15:08 [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands Bin Meng
                   ` (3 preceding siblings ...)
  2021-01-14 15:08 ` [PATCH 4/9] hw/block: m25p80: Fix the number of dummy bytes needed for Spansion flashes Bin Meng
@ 2021-01-14 15:08 ` Bin Meng
  2021-01-14 15:08 ` [PATCH 6/9] hw/ssi: xilinx_spips: Fix generic fifo dummy cycle handling Bin Meng
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 36+ messages in thread
From: Bin Meng @ 2021-01-14 15:08 UTC (permalink / raw)
  To: Alistair Francis, Philippe Mathieu-Daudé,
	Peter Maydell, Francisco Iglesias
  Cc: Kevin Wolf, Bin Meng, qemu-devel, qemu-block, Max Reitz

From: Bin Meng <bin.meng@windriver.com>

Per the SST25VF016B datasheet [1], SST flashes require one dummy byte
after the address bytes. Note that SST flashes only support SPI mode.

[1] http://ww1.microchip.com/downloads/en/devicedoc/s71271_04.pdf

Signed-off-by: Bin Meng <bin.meng@windriver.com>
---

 hw/block/m25p80.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/hw/block/m25p80.c b/hw/block/m25p80.c
index e1e5d5a76f..512af61ba5 100644
--- a/hw/block/m25p80.c
+++ b/hw/block/m25p80.c
@@ -928,6 +928,9 @@ static void decode_fast_read_cmd(Flash *s)
     s->needed_bytes = get_addr_length(s);
     switch (get_man(s)) {
     /* Dummy cycles - modeled with bytes writes */
+    case MAN_SST:
+        s->needed_bytes += 1;
+        break;
     case MAN_WINBOND:
         switch (s->cmd_in_progress) {
         case FAST_READ:
-- 
2.25.1




* [PATCH 6/9] hw/ssi: xilinx_spips: Fix generic fifo dummy cycle handling
  2021-01-14 15:08 [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands Bin Meng
                   ` (4 preceding siblings ...)
  2021-01-14 15:08 ` [PATCH 5/9] hw/block: m25p80: Support fast read for SST flashes Bin Meng
@ 2021-01-14 15:08 ` Bin Meng
  2021-01-14 15:09 ` [PATCH 7/9] Revert "aspeed/smc: Fix number of dummy cycles for FAST_READ_4 command" Bin Meng
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 36+ messages in thread
From: Bin Meng @ 2021-01-14 15:08 UTC (permalink / raw)
  To: Alistair Francis, Philippe Mathieu-Daudé,
	Peter Maydell, Francisco Iglesias
  Cc: Xuzhou Cheng, Bin Meng, qemu-devel, qemu-arm

From: Bin Meng <bin.meng@windriver.com>

The description of the generic command fifo register says:

  When [receive, transmit, data_xfer] = [0,0,1], the [immediate_data]
  field represents the number of dummy cycle sent on the SPI interface.

However, we should not simply use the programmed value to determine
how many times ssi_transfer() needs to be called to send the dummy
bytes. ssi_transfer() transfers a byte on the line, not a single
bit. Previously the m25p80 flash model wrongly implemented the dummy
cycles for fast read commands on some flashes. Now that this mess is
corrected, the SPI flash controllers need to be updated to do the
right thing.

According to the example in the ZynqMP manual (ug1085, v2.2 [1])
we need to convert the number of dummy cycles to bytes according to
the SPI mode being used, and transfer the bytes via ssi_transfer().
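
A minimal sketch of that conversion follows. The function name and the
SK_* macros are hypothetical; the mode encodings (1, 2, 3) mirror the
GQSPI_GF_MODE_* values added by this patch, and the caller is assumed
to issue one ssi_transfer() per returned byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Mode encodings as defined by this patch (GQSPI_GF_MODE_*). */
#define SK_GF_MODE_SPI   1
#define SK_GF_MODE_DSPI  2
#define SK_GF_MODE_QSPI  3

/*
 * Convert the dummy cycle count in immediate_data to the number of
 * byte transfers for the given SPI mode. Returns false and leaves
 * *nbytes untouched when the mode encoding is unknown.
 */
static bool sketch_gqspi_dummy_bytes(uint8_t spi_mode, uint8_t imm,
                                     uint32_t *nbytes)
{
    uint32_t lines;

    assert(nbytes != NULL);

    switch (spi_mode) {
    case SK_GF_MODE_SPI:
        lines = 1;
        break;
    case SK_GF_MODE_DSPI:
        lines = 2;
        break;
    case SK_GF_MODE_QSPI:
        lines = 4;
        break;
    default:
        return false;
    }

    /* One bit per line per cycle; round up to whole bytes. */
    *nbytes = ((uint32_t)imm * lines + 7) / 8;
    return true;
}
```

For example, 6 dummy cycles in quad mode become 3 byte transfers, and
8 dummy cycles in single-line mode become 1.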

[1] https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
    table 24‐22, an example of Generic FIFO Contents for Quad I/O Read Command (EBh)

Fixes: c95997a39de6 ("xilinx_spips: Add support for the ZynqMP Generic QSPI")
Signed-off-by: Xuzhou Cheng <xuzhou.cheng@windriver.com>
Signed-off-by: Bin Meng <bin.meng@windriver.com>
---

 hw/ssi/xilinx_spips.c | 29 ++++++++++++++++++++++++++++-
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/hw/ssi/xilinx_spips.c b/hw/ssi/xilinx_spips.c
index a897034601..787de60f24 100644
--- a/hw/ssi/xilinx_spips.c
+++ b/hw/ssi/xilinx_spips.c
@@ -191,6 +191,10 @@
     FIELD(GQSPI_GF_SNAPSHOT, EXPONENT, 9, 1)
     FIELD(GQSPI_GF_SNAPSHOT, DATA_XFER, 8, 1)
     FIELD(GQSPI_GF_SNAPSHOT, IMMEDIATE_DATA, 0, 8)
+#define GQSPI_GF_MODE_SPI     1
+#define GQSPI_GF_MODE_DSPI    2
+#define GQSPI_GF_MODE_QSPI    3
+
 #define R_GQSPI_MOD_ID        (0x1fc / 4)
 #define R_GQSPI_MOD_ID_RESET  (0x10a0000)
 
@@ -492,7 +496,30 @@ static void xlnx_zynqmp_qspips_flush_fifo_g(XlnxZynqMPQSPIPS *s)
                 }
                 s->regs[R_GQSPI_DATA_STS] = 1ul << imm;
             } else {
-                s->regs[R_GQSPI_DATA_STS] = imm;
+                /*
+                 * When [receive, transmit, data_xfer] = [0,0,1], it represents
+                 * the number of dummy cycle sent on the SPI interface. We need
+                 * to convert the number of dummy cycles to bytes according to
+                 * the SPI mode being used.
+                 *
+                 * Ref: ug1085 v2.2 (December 2020) table 24‐22, an example of
+                 *      Generic FIFO Contents for Quad I/O Read Command (EBh)
+                 */
+                if (!ARRAY_FIELD_EX32(s->regs, GQSPI_GF_SNAPSHOT, TRANSMIT) &&
+                    !ARRAY_FIELD_EX32(s->regs, GQSPI_GF_SNAPSHOT, RECIEVE)) {
+                    uint8_t spi_mode = ARRAY_FIELD_EX32(s->regs, GQSPI_GF_SNAPSHOT, SPI_MODE);
+                    if (spi_mode == GQSPI_GF_MODE_QSPI) {
+                        s->regs[R_GQSPI_DATA_STS] = ROUND_UP(imm * 4, 8) / 8;
+                    } else if (spi_mode == GQSPI_GF_MODE_DSPI) {
+                        s->regs[R_GQSPI_DATA_STS] = ROUND_UP(imm * 2, 8) / 8;
+                    } else if (spi_mode == GQSPI_GF_MODE_SPI) {
+                        s->regs[R_GQSPI_DATA_STS] = ROUND_UP(imm * 1, 8) / 8;
+                    } else {
+                        qemu_log_mask(LOG_GUEST_ERROR, "Unknown SPI MODE: 0x%x\n", spi_mode);
+                    }
+                } else {
+                    s->regs[R_GQSPI_DATA_STS] = imm;
+                }
             }
         }
         /* Zero length transfer check */
-- 
2.25.1




* [PATCH 7/9] Revert "aspeed/smc: Fix number of dummy cycles for FAST_READ_4 command"
  2021-01-14 15:08 [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands Bin Meng
                   ` (5 preceding siblings ...)
  2021-01-14 15:08 ` [PATCH 6/9] hw/ssi: xilinx_spips: Fix generic fifo dummy cycle handling Bin Meng
@ 2021-01-14 15:09 ` Bin Meng
  2021-01-14 15:09 ` [PATCH 8/9] Revert "aspeed/smc: snoop SPI transfers to fake dummy cycles" Bin Meng
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 36+ messages in thread
From: Bin Meng @ 2021-01-14 15:09 UTC (permalink / raw)
  To: Alistair Francis, Philippe Mathieu-Daudé,
	Peter Maydell, Francisco Iglesias
  Cc: Andrew Jeffery, Bin Meng, qemu-devel, qemu-arm,
	Cédric Le Goater, Joel Stanley

From: Bin Meng <bin.meng@windriver.com>

This reverts commit 7faf6f1790dddf9f3acf6ddd95f7bbc1b4a755d0.

The incorrect implementation of dummy cycles in the m25p80 model is now
corrected, so revert this commit.

Signed-off-by: Bin Meng <bin.meng@windriver.com>
---

 hw/ssi/aspeed_smc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/ssi/aspeed_smc.c b/hw/ssi/aspeed_smc.c
index 16addee4dc..1e78b5232f 100644
--- a/hw/ssi/aspeed_smc.c
+++ b/hw/ssi/aspeed_smc.c
@@ -802,11 +802,11 @@ static int aspeed_smc_num_dummies(uint8_t command)
     case FAST_READ:
     case DOR:
     case QOR:
-    case FAST_READ_4:
     case DOR_4:
     case QOR_4:
         return 1;
     case DIOR:
+    case FAST_READ_4:
     case DIOR_4:
         return 2;
     case QIOR:
-- 
2.25.1




* [PATCH 8/9] Revert "aspeed/smc: snoop SPI transfers to fake dummy cycles"
  2021-01-14 15:08 [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands Bin Meng
                   ` (6 preceding siblings ...)
  2021-01-14 15:09 ` [PATCH 7/9] Revert "aspeed/smc: Fix number of dummy cycles for FAST_READ_4 command" Bin Meng
@ 2021-01-14 15:09 ` Bin Meng
  2021-01-14 15:09 ` [PATCH 9/9] hw/ssi: npcm7xx_fiu: Correct the dummy cycle emulation logic Bin Meng
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 36+ messages in thread
From: Bin Meng @ 2021-01-14 15:09 UTC (permalink / raw)
  To: Alistair Francis, Philippe Mathieu-Daudé,
	Peter Maydell, Francisco Iglesias
  Cc: Andrew Jeffery, Bin Meng, qemu-devel, qemu-arm,
	Cédric Le Goater, Joel Stanley

From: Bin Meng <bin.meng@windriver.com>

This reverts commit f95c4bffdc4c53b29f89762cab4adc5a43f95daf.

The incorrect implementation of dummy cycles in the m25p80 model is now
corrected, so revert this commit.

Signed-off-by: Bin Meng <bin.meng@windriver.com>
---

 include/hw/ssi/aspeed_smc.h |   3 -
 hw/ssi/aspeed_smc.c         | 116 +-----------------------------------
 2 files changed, 2 insertions(+), 117 deletions(-)

diff --git a/include/hw/ssi/aspeed_smc.h b/include/hw/ssi/aspeed_smc.h
index 16c03fe64f..46f3abf2e7 100644
--- a/include/hw/ssi/aspeed_smc.h
+++ b/include/hw/ssi/aspeed_smc.h
@@ -111,9 +111,6 @@ struct AspeedSMCState {
     AddressSpace dram_as;
 
     AspeedSMCFlash *flashes;
-
-    uint8_t snoop_index;
-    uint8_t snoop_dummies;
 };
 
 #endif /* ASPEED_SMC_H */
diff --git a/hw/ssi/aspeed_smc.c b/hw/ssi/aspeed_smc.c
index 1e78b5232f..0df5d91d19 100644
--- a/hw/ssi/aspeed_smc.c
+++ b/hw/ssi/aspeed_smc.c
@@ -187,9 +187,6 @@
 /* Flash opcodes. */
 #define SPI_OP_READ       0x03    /* Read data bytes (low frequency) */
 
-#define SNOOP_OFF         0xFF
-#define SNOOP_START       0x0
-
 /*
  * Default segments mapping addresses and size for each peripheral per
  * controller. These can be changed when board is initialized with the
@@ -771,104 +768,6 @@ static uint64_t aspeed_smc_flash_read(void *opaque, hwaddr addr, unsigned size)
     return ret;
 }
 
-/*
- * TODO (clg@kaod.org): stolen from xilinx_spips.c. Should move to a
- * common include header.
- */
-typedef enum {
-    READ = 0x3,         READ_4 = 0x13,
-    FAST_READ = 0xb,    FAST_READ_4 = 0x0c,
-    DOR = 0x3b,         DOR_4 = 0x3c,
-    QOR = 0x6b,         QOR_4 = 0x6c,
-    DIOR = 0xbb,        DIOR_4 = 0xbc,
-    QIOR = 0xeb,        QIOR_4 = 0xec,
-
-    PP = 0x2,           PP_4 = 0x12,
-    DPP = 0xa2,
-    QPP = 0x32,         QPP_4 = 0x34,
-} FlashCMD;
-
-static int aspeed_smc_num_dummies(uint8_t command)
-{
-    switch (command) { /* check for dummies */
-    case READ: /* no dummy bytes/cycles */
-    case PP:
-    case DPP:
-    case QPP:
-    case READ_4:
-    case PP_4:
-    case QPP_4:
-        return 0;
-    case FAST_READ:
-    case DOR:
-    case QOR:
-    case DOR_4:
-    case QOR_4:
-        return 1;
-    case DIOR:
-    case FAST_READ_4:
-    case DIOR_4:
-        return 2;
-    case QIOR:
-    case QIOR_4:
-        return 4;
-    default:
-        return -1;
-    }
-}
-
-static bool aspeed_smc_do_snoop(AspeedSMCFlash *fl,  uint64_t data,
-                                unsigned size)
-{
-    AspeedSMCState *s = fl->controller;
-    uint8_t addr_width = aspeed_smc_flash_is_4byte(fl) ? 4 : 3;
-
-    trace_aspeed_smc_do_snoop(fl->id, s->snoop_index, s->snoop_dummies,
-                              (uint8_t) data & 0xff);
-
-    if (s->snoop_index == SNOOP_OFF) {
-        return false; /* Do nothing */
-
-    } else if (s->snoop_index == SNOOP_START) {
-        uint8_t cmd = data & 0xff;
-        int ndummies = aspeed_smc_num_dummies(cmd);
-
-        /*
-         * No dummy cycles are expected with the current command. Turn
-         * off snooping and let the transfer proceed normally.
-         */
-        if (ndummies <= 0) {
-            s->snoop_index = SNOOP_OFF;
-            return false;
-        }
-
-        s->snoop_dummies = ndummies * 8;
-
-    } else if (s->snoop_index >= addr_width + 1) {
-
-        /* The SPI transfer has reached the dummy cycles sequence */
-        for (; s->snoop_dummies; s->snoop_dummies--) {
-            ssi_transfer(s->spi, s->regs[R_DUMMY_DATA] & 0xff);
-        }
-
-        /* If no more dummy cycles are expected, turn off snooping */
-        if (!s->snoop_dummies) {
-            s->snoop_index = SNOOP_OFF;
-        } else {
-            s->snoop_index += size;
-        }
-
-        /*
-         * Dummy cycles have been faked already. Ignore the current
-         * SPI transfer
-         */
-        return true;
-    }
-
-    s->snoop_index += size;
-    return false;
-}
-
 static void aspeed_smc_flash_write(void *opaque, hwaddr addr, uint64_t data,
                                    unsigned size)
 {
@@ -887,10 +786,6 @@ static void aspeed_smc_flash_write(void *opaque, hwaddr addr, uint64_t data,
 
     switch (aspeed_smc_flash_mode(fl)) {
     case CTRL_USERMODE:
-        if (aspeed_smc_do_snoop(fl, data, size)) {
-            break;
-        }
-
         for (i = 0; i < size; i++) {
             ssi_transfer(s->spi, (data >> (8 * i)) & 0xff);
         }
@@ -937,8 +832,6 @@ static void aspeed_smc_flash_update_ctrl(AspeedSMCFlash *fl, uint32_t value)
 
     s->regs[s->r_ctrl0 + fl->id] = value;
 
-    s->snoop_index = unselect ? SNOOP_OFF : SNOOP_START;
-
     aspeed_smc_flash_do_select(fl, unselect);
 }
 
@@ -981,9 +874,6 @@ static void aspeed_smc_reset(DeviceState *d)
     if (s->ctrl->segments == aspeed_segments_fmc) {
         s->regs[s->r_conf] |= (CONF_FLASH_TYPE_SPI << CONF_FLASH_TYPE0);
     }
-
-    s->snoop_index = SNOOP_OFF;
-    s->snoop_dummies = 0;
 }
 
 static uint64_t aspeed_smc_read(void *opaque, hwaddr addr, unsigned int size)
@@ -1419,12 +1309,10 @@ static void aspeed_smc_realize(DeviceState *dev, Error **errp)
 
 static const VMStateDescription vmstate_aspeed_smc = {
     .name = "aspeed.smc",
-    .version_id = 2,
-    .minimum_version_id = 2,
+    .version_id = 1,
+    .minimum_version_id = 1,
     .fields = (VMStateField[]) {
         VMSTATE_UINT32_ARRAY(regs, AspeedSMCState, ASPEED_SMC_R_MAX),
-        VMSTATE_UINT8(snoop_index, AspeedSMCState),
-        VMSTATE_UINT8(snoop_dummies, AspeedSMCState),
         VMSTATE_END_OF_LIST()
     }
 };
-- 
2.25.1




* [PATCH 9/9] hw/ssi: npcm7xx_fiu: Correct the dummy cycle emulation logic
  2021-01-14 15:08 [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands Bin Meng
                   ` (7 preceding siblings ...)
  2021-01-14 15:09 ` [PATCH 8/9] Revert "aspeed/smc: snoop SPI transfers to fake dummy cycles" Bin Meng
@ 2021-01-14 15:09 ` Bin Meng
  2021-01-14 17:12   ` Havard Skinnemoen via
  2021-01-14 15:59 ` [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands Cédric Le Goater
                   ` (2 subsequent siblings)
  11 siblings, 1 reply; 36+ messages in thread
From: Bin Meng @ 2021-01-14 15:09 UTC (permalink / raw)
  To: Alistair Francis, Philippe Mathieu-Daudé,
	Peter Maydell, Francisco Iglesias
  Cc: Tyrone Ting, Bin Meng, Havard Skinnemoen, qemu-arm, qemu-devel

From: Bin Meng <bin.meng@windriver.com>

I believe send_dummy_bits() should also be fixed, but I do not know how:
I am working purely from reading the code, since there is no public
datasheet available for this NPCM7xx SoC.

Signed-off-by: Bin Meng <bin.meng@windriver.com>

---

 hw/ssi/npcm7xx_fiu.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/hw/ssi/npcm7xx_fiu.c b/hw/ssi/npcm7xx_fiu.c
index 5040132b07..e76fb5ad9f 100644
--- a/hw/ssi/npcm7xx_fiu.c
+++ b/hw/ssi/npcm7xx_fiu.c
@@ -150,7 +150,7 @@ static uint64_t npcm7xx_fiu_flash_read(void *opaque, hwaddr addr,
     NPCM7xxFIUState *fiu = f->fiu;
     uint64_t value = 0;
     uint32_t drd_cfg;
-    int dummy_cycles;
+    int dummy_bytes;
     int i;
 
     if (fiu->active_cs != -1) {
@@ -180,10 +180,8 @@ static uint64_t npcm7xx_fiu_flash_read(void *opaque, hwaddr addr,
         break;
     }
 
-    /* Flash chip model expects one transfer per dummy bit, not byte */
-    dummy_cycles =
-        (FIU_DRD_CFG_DBW(drd_cfg) * 8) >> FIU_DRD_CFG_ACCTYPE(drd_cfg);
-    for (i = 0; i < dummy_cycles; i++) {
+    dummy_bytes = FIU_DRD_CFG_DBW(drd_cfg) >> FIU_DRD_CFG_ACCTYPE(drd_cfg);
+    for (i = 0; i < dummy_bytes; i++) {
         ssi_transfer(fiu->spi, 0);
     }
 
-- 
2.25.1




* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-01-14 15:08 [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands Bin Meng
                   ` (8 preceding siblings ...)
  2021-01-14 15:09 ` [PATCH 9/9] hw/ssi: npcm7xx_fiu: Correct the dummy cycle emulation logic Bin Meng
@ 2021-01-14 15:59 ` Cédric Le Goater
  2021-01-14 16:12 ` no-reply
  2021-01-14 18:13 ` Francisco Iglesias
  11 siblings, 0 replies; 36+ messages in thread
From: Cédric Le Goater @ 2021-01-14 15:59 UTC (permalink / raw)
  To: Bin Meng, Alistair Francis, Philippe Mathieu-Daudé,
	Peter Maydell, Francisco Iglesias
  Cc: Kevin Wolf, qemu-devel, qemu-block, Marcin Krzeminski,
	Andrew Jeffery, Bin Meng, Havard Skinnemoen, Max Reitz,
	Tyrone Ting, qemu-arm, Joel Stanley, Joe Komlodi

On 1/14/21 4:08 PM, Bin Meng wrote:
> From: Bin Meng <bin.meng@windriver.com>
> 
> The m25p80 model uses s->needed_bytes to indicate how many follow-up
> bytes are expected to be received after it receives a command. For
> example, depending on the address mode, either 3-byte address or
> 4-byte address is needed.
> 
> For fast read family commands, some dummy cycles are required after
> sending the address bytes, and the dummy cycles need to be counted
> in s->needed_bytes. This is where the mess began.
> 
> As the variable name (needed_bytes) indicates, the unit is in byte.
> It is not in bit, or cycle. However for some reason the model has
> been using the number of dummy cycles for s->needed_bytes. The right
> approach is to convert the number of dummy cycles to bytes based on
> the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> 
> Things get complicated when interacting with different SPI or QSPI
> flash controllers. There are major two cases:
> 
> - Dummy bytes prepared by drivers, and wrote to the controller fifo.
>   For such case, driver will calculate the correct number of dummy
>   bytes and write them into the tx fifo. Fixing the m25p80 model will
>   fix flashes working with such controllers.
> - Dummy bytes not prepared by drivers. Drivers just tell the hardware
>   the dummy cycle configuration via some registers, and hardware will
>   automatically generate dummy cycles for us. Fixing the m25p80 model
>   is not enough, and we will need to fix the SPI/QSPI models for such
>   controllers.
> 
> This series fixes the mess in the m25p80 from the flash side first,
> followed by fixes to 3 known SPI controller models that fall into
> the 2nd case above.
> 
> Please note, I have no way to verify patch 7/8/9 because:
> 
> * There is no public datasheet available for the SoC / SPI controller
> * There is no QEMU docs, or details that tell people how to boot either
>   U-Boot or Linux kernel to verify the functionality

The Linux drivers are available in mainline, but these branches are more
up to date since not everything is merged:

  https://github.com/openbmc/linux

U-Boot:

  https://github.com/openbmc/u-boot/tree/v2016.07-aspeed-openbmc (ast2400/ast2500)
  https://github.com/openbmc/u-boot/tree/v2019.04-aspeed-openbmc (ast2600)

A quick intro:

  https://www.qemu.org/docs/master/system/arm/aspeed.html

> 
> These 3 patches are very likely to be wrong. Hence I would like to ask
> help from the original author who wrote these SPI controller models
> to help testing, or completely rewrite these 3 patches to fix things.
> Thanks!

A quick test shows that all Aspeed machines are broken with this patchset.

Please try these command lines:

  wget https://openpower.xyz/job/openbmc-build/lastSuccessfulBuild/distro=ubuntu,label=builder,target=palmetto/artifact/deploy/images/palmetto/flash-palmetto
  wget https://openpower.xyz/job/openbmc-build/lastSuccessfulBuild/distro=ubuntu,label=builder,target=romulus/artifact/deploy/images/romulus/flash-romulus
  wget https://openpower.xyz/job/openbmc-build/lastSuccessfulBuild/distro=ubuntu,label=builder,target=witherspoon/artifact/deploy/images/witherspoon/obmc-phosphor-image-witherspoon.ubi.mtd

  qemu-system-arm -M witherspoon-bmc -nic user -drive file=obmc-phosphor-image-witherspoon.ubi.mtd,format=raw,if=mtd -nographic
  qemu-system-arm -M romulus-bmc -nic user -drive file=flash-romulus,format=raw,if=mtd -nographic
  qemu-system-arm -M palmetto-bmc -nic user -drive file=flash-palmetto,format=raw,if=mtd -nographic

The Aspeed SMC model has trace events to help you with the task.

Thanks,

C. 
 
> Patch 6 is unvalidated with QEMU, mainly because there is no doc to
> tell people how to boot anything to test. But I have some confidence
> based on my read of the ZynqMP manual, as well as some experimental
> testing on a real ZCU102 board.
> 
> Other flash patches can be tested with the SiFive SPI series:
> http://patchwork.ozlabs.org/project/qemu-devel/list/?series=222391
> 
> Cherry-pick patch 16 and 17 from the series above, and switch to
> different flash model to test with the following command:
> 
> $ qemu-system-riscv64 -nographic -M sifive_u -m 2G -smp 5 -kernel u-boot
> 
> I've picked up two for testing:
> 
> QEMU flash: "sst25vf032b"
> 
>   U-Boot 2020.10 (Jan 14 2021 - 21:55:59 +0800)
> 
>   CPU:   rv64imafdcsu
>   Model: SiFive HiFive Unleashed A00
>   DRAM:  2 GiB
>   MMC:
>   Loading Environment from SPIFlash... SF: Detected sst25vf032b with page size 256 Bytes, erase size 4 KiB, total 4 MiB
>   *** Warning - bad CRC, using default environment
> 
>   In:    serial@10010000
>   Out:   serial@10010000
>   Err:   serial@10010000
>   Net:   failed to get gemgxl_reset reset
> 
>   Warning: ethernet@10090000 MAC addresses don't match:
>   Address in DT is                52:54:00:12:34:56
>   Address in environment is       70:b3:d5:92:f0:01
>   eth0: ethernet@10090000
>   Hit any key to stop autoboot:  0
>   => sf probe
>   SF: Detected sst25vf032b with page size 256 Bytes, erase size 4 KiB,
>   total 4 MiB
>   => sf test 1ff000 1000
>   SPI flash test:
>   0 erase: 0 ticks, 4096000 KiB/s 32768.000 Mbps
>   1 check: 10 ticks, 400 KiB/s 3.200 Mbps
>   2 write: 170 ticks, 23 KiB/s 0.184 Mbps
>   3 read: 9 ticks, 444 KiB/s 3.552 Mbps
>   Test passed
>   0 erase: 0 ticks, 4096000 KiB/s 32768.000 Mbps
>   1 check: 10 ticks, 400 KiB/s 3.200 Mbps
>   2 write: 170 ticks, 23 KiB/s 0.184 Mbps
>   3 read: 9 ticks, 444 KiB/s 3.552 Mbps
> 
> QEMU flash: "mx66u51235f"
> 
>   U-Boot 2020.10 (Jan 14 2021 - 21:55:59 +0800)
> 
>   CPU:   rv64imafdcsu
>   Model: SiFive HiFive Unleashed A00
>   DRAM:  2 GiB
>   MMC:
>   Loading Environment from SPIFlash... SF: Detected mx66u51235f with page size 256 Bytes, erase size 4 KiB, total 64 MiB
>   *** Warning - bad CRC, using default environment
> 
>   In:    serial@10010000
>   Out:   serial@10010000
>   Err:   serial@10010000
>   Net:   failed to get gemgxl_reset reset
> 
>   Warning: ethernet@10090000 MAC addresses don't match:
>   Address in DT is                52:54:00:12:34:56
>   Address in environment is       70:b3:d5:92:f0:01
>   eth0: ethernet@10090000
>   Hit any key to stop autoboot:  0
>   => sf probe
>   SF: Detected mx66u51235f with page size 256 Bytes, erase size 4 KiB, total 64 MiB
>   => sf test 0 8000
>   SPI flash test:
>   0 erase: 1 ticks, 32000 KiB/s 256.000 Mbps
>   1 check: 80 ticks, 400 KiB/s 3.200 Mbps
>   2 write: 83 ticks, 385 KiB/s 3.080 Mbps
>   3 read: 79 ticks, 405 KiB/s 3.240 Mbps
>   Test passed
>   0 erase: 1 ticks, 32000 KiB/s 256.000 Mbps
>   1 check: 80 ticks, 400 KiB/s 3.200 Mbps
>   2 write: 83 ticks, 385 KiB/s 3.080 Mbps
>   3 read: 79 ticks, 405 KiB/s 3.240 Mbps
> 
> I am sure there will be bugs, and I have not tested all flashes affected.
> But I want to send out this series for an early discussion and comments.
> I will continue my testing.
> 
> 
> Bin Meng (9):
>   hw/block: m25p80: Fix the number of dummy bytes needed for Windbond
>     flashes
>   hw/block: m25p80: Fix the number of dummy bytes needed for
>     Numonyx/Micron flashes
>   hw/block: m25p80: Fix the number of dummy bytes needed for Macronix
>     flashes
>   hw/block: m25p80: Fix the number of dummy bytes needed for Spansion
>     flashes
>   hw/block: m25p80: Support fast read for SST flashes
>   hw/ssi: xilinx_spips: Fix generic fifo dummy cycle handling
>   Revert "aspeed/smc: Fix number of dummy cycles for FAST_READ_4
>     command"
>   Revert "aspeed/smc: snoop SPI transfers to fake dummy cycles"
>   hw/ssi: npcm7xx_fiu: Correct the dummy cycle emulation logic
> 
>  include/hw/ssi/aspeed_smc.h |   3 -
>  hw/block/m25p80.c           | 153 ++++++++++++++++++++++++++++--------
>  hw/ssi/aspeed_smc.c         | 116 +--------------------------
>  hw/ssi/npcm7xx_fiu.c        |   8 +-
>  hw/ssi/xilinx_spips.c       |  29 ++++++-
>  5 files changed, 153 insertions(+), 156 deletions(-)
> 




* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-01-14 15:08 [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands Bin Meng
                   ` (9 preceding siblings ...)
  2021-01-14 15:59 ` [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands Cédric Le Goater
@ 2021-01-14 16:12 ` no-reply
  2021-01-14 18:13 ` Francisco Iglesias
  11 siblings, 0 replies; 36+ messages in thread
From: no-reply @ 2021-01-14 16:12 UTC (permalink / raw)
  To: bmeng.cn
  Cc: kwolf, peter.maydell, qemu-block, marcin.krzeminski, andrew,
	frasse.iglesias, bin.meng, qemu-devel, f4bug, kfting, qemu-arm,
	alistair.francis, clg, komlodi, hskinnemoen, mreitz, joel

Patchew URL: https://patchew.org/QEMU/20210114150902.11515-1-bmeng.cn@gmail.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Type: series
Message-id: 20210114150902.11515-1-bmeng.cn@gmail.com
Subject: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 - [tag update]      patchew/20210114013147.92962-1-jiaxun.yang@flygoat.com -> patchew/20210114013147.92962-1-jiaxun.yang@flygoat.com
 * [new tag]         patchew/20210114150902.11515-1-bmeng.cn@gmail.com -> patchew/20210114150902.11515-1-bmeng.cn@gmail.com
Switched to a new branch 'test'
b87aded hw/ssi: npcm7xx_fiu: Correct the dummy cycle emulation logic
4518be2 Revert "aspeed/smc: snoop SPI transfers to fake dummy cycles"
6a4067a Revert "aspeed/smc: Fix number of dummy cycles for FAST_READ_4 command"
e5ea744 hw/ssi: xilinx_spips: Fix generic fifo dummy cycle handling
3294942 hw/block: m25p80: Support fast read for SST flashes
50a7f9f hw/block: m25p80: Fix the number of dummy bytes needed for Spansion flashes
cf6f8e1 hw/block: m25p80: Fix the number of dummy bytes needed for Macronix flashes
3925fcf hw/block: m25p80: Fix the number of dummy bytes needed for Numonyx/Micron flashes
5344168 hw/block: m25p80: Fix the number of dummy bytes needed for Windbond flashes

=== OUTPUT BEGIN ===
1/9 Checking commit 5344168de433 (hw/block: m25p80: Fix the number of dummy bytes needed for Windbond flashes)
2/9 Checking commit 3925fcf79dbc (hw/block: m25p80: Fix the number of dummy bytes needed for Numonyx/Micron flashes)
3/9 Checking commit cf6f8e145faa (hw/block: m25p80: Fix the number of dummy bytes needed for Macronix flashes)
4/9 Checking commit 50a7f9fb909b (hw/block: m25p80: Fix the number of dummy bytes needed for Spansion flashes)
5/9 Checking commit 3294942ca3a1 (hw/block: m25p80: Support fast read for SST flashes)
6/9 Checking commit e5ea74473d87 (hw/ssi: xilinx_spips: Fix generic fifo dummy cycle handling)
ERROR: line over 90 characters
#63: FILE: hw/ssi/xilinx_spips.c:510:
+                    uint8_t spi_mode = ARRAY_FIELD_EX32(s->regs, GQSPI_GF_SNAPSHOT, SPI_MODE);

ERROR: line over 90 characters
#71: FILE: hw/ssi/xilinx_spips.c:518:
+                        qemu_log_mask(LOG_GUEST_ERROR, "Unknown SPI MODE: 0x%x ", spi_mode);

total: 2 errors, 0 warnings, 41 lines checked

Patch 6/9 has style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.

7/9 Checking commit 6a4067a6a9fc (Revert "aspeed/smc: Fix number of dummy cycles for FAST_READ_4 command")
8/9 Checking commit 4518be22e1c9 (Revert "aspeed/smc: snoop SPI transfers to fake dummy cycles")
9/9 Checking commit b87aded6dc2a (hw/ssi: npcm7xx_fiu: Correct the dummy cycle emulation logic)
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20210114150902.11515-1-bmeng.cn@gmail.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-devel@redhat.com


* Re: [PATCH 9/9] hw/ssi: npcm7xx_fiu: Correct the dummy cycle emulation logic
  2021-01-14 15:09 ` [PATCH 9/9] hw/ssi: npcm7xx_fiu: Correct the dummy cycle emulation logic Bin Meng
@ 2021-01-14 17:12   ` Havard Skinnemoen via
  0 siblings, 0 replies; 36+ messages in thread
From: Havard Skinnemoen via @ 2021-01-14 17:12 UTC (permalink / raw)
  To: Bin Meng
  Cc: Peter Maydell, Francisco Iglesias, Bin Meng,
	Philippe Mathieu-Daudé,
	QEMU Developers, Tyrone Ting, qemu-arm, Alistair Francis

On Thu, Jan 14, 2021 at 7:10 AM Bin Meng <bmeng.cn@gmail.com> wrote:
>
> From: Bin Meng <bin.meng@windriver.com>
>
> I believe send_dummy_bits() should also be fixed, but I do not know how:
> I am working purely from reading the code, since there is no public
> datasheet available for this NPCM7xx SoC.
>
> Signed-off-by: Bin Meng <bin.meng@windriver.com>

Just a quick comment before I look at the rest of the patch series:
The emulated dummy bits behavior has a lot more to do with what the
m25p80 emulator seemed to expect than with the actual NPCM7xx behavior.
If the m25p80 model now interprets the dummy cycles the same way as the
rest of the cycles, this change seems correct, but you're right that
send_dummy_bits() probably needs some attention as well.

I _think_ it's just a matter of turning this:

        for (j = 0; j < 8; j += bits_per_clock) {
            ssi_transfer(spi, extract32(uma_cmd, field + j, bits_per_clock));
        }

into this:

        ssi_transfer(spi, extract32(uma_cmd, field, BITS_PER_BYTE));

which might have the very nice side effect of speeding up SPI flash
access quite a bit.
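
To make the speed-up concrete, here is a minimal, self-contained sketch
that only counts transfers. It does not use the real QEMU API:
stub_transfer() stands in for ssi_transfer(), extract_bits() is a local
helper in the spirit of extract32(), and transfer_count is a hypothetical
counter added purely for illustration.

```c
#include <stdint.h>

/* Hypothetical counter, for illustration only. */
static unsigned transfer_count;

/* Stand-in for ssi_transfer(): counts calls, ignores the payload. */
static uint32_t stub_transfer(uint32_t value)
{
    (void)value;
    transfer_count++;
    return 0;
}

/* Local helper: extract `length` bits (1..32) starting at bit `start`. */
static uint32_t extract_bits(uint32_t value, int start, int length)
{
    return (value >> start) & (~0U >> (32 - length));
}

/* Current shape: one transfer per clock cycle of the byte. */
static unsigned send_byte_per_clock(uint32_t word, int field,
                                    int bits_per_clock)
{
    transfer_count = 0;
    for (int j = 0; j < 8; j += bits_per_clock) {
        stub_transfer(extract_bits(word, field + j, bits_per_clock));
    }
    return transfer_count;
}

/* Proposed shape: one transfer for the whole byte. */
static unsigned send_byte_whole(uint32_t word, int field)
{
    transfer_count = 0;
    stub_transfer(extract_bits(word, field, 8));
    return transfer_count;
}
```

With one bit per clock the first variant issues 8 transfers per byte, with
four bits per clock it issues 2, and the second variant always issues 1.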

Thanks a lot for looking into this.

>
> ---
>
>  hw/ssi/npcm7xx_fiu.c | 8 +++-----
>  1 file changed, 3 insertions(+), 5 deletions(-)
>
> diff --git a/hw/ssi/npcm7xx_fiu.c b/hw/ssi/npcm7xx_fiu.c
> index 5040132b07..e76fb5ad9f 100644
> --- a/hw/ssi/npcm7xx_fiu.c
> +++ b/hw/ssi/npcm7xx_fiu.c
> @@ -150,7 +150,7 @@ static uint64_t npcm7xx_fiu_flash_read(void *opaque, hwaddr addr,
>      NPCM7xxFIUState *fiu = f->fiu;
>      uint64_t value = 0;
>      uint32_t drd_cfg;
> -    int dummy_cycles;
> +    int dummy_bytes;
>      int i;
>
>      if (fiu->active_cs != -1) {
> @@ -180,10 +180,8 @@ static uint64_t npcm7xx_fiu_flash_read(void *opaque, hwaddr addr,
>          break;
>      }
>
> -    /* Flash chip model expects one transfer per dummy bit, not byte */
> -    dummy_cycles =
> -        (FIU_DRD_CFG_DBW(drd_cfg) * 8) >> FIU_DRD_CFG_ACCTYPE(drd_cfg);
> -    for (i = 0; i < dummy_cycles; i++) {
> +    dummy_bytes = FIU_DRD_CFG_DBW(drd_cfg) >> FIU_DRD_CFG_ACCTYPE(drd_cfg);
> +    for (i = 0; i < dummy_bytes; i++) {
>          ssi_transfer(fiu->spi, 0);
>      }
>
> --
> 2.25.1
>



* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-01-14 15:08 [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands Bin Meng
                   ` (10 preceding siblings ...)
  2021-01-14 16:12 ` no-reply
@ 2021-01-14 18:13 ` Francisco Iglesias
  2021-01-15  2:07   ` Bin Meng
  11 siblings, 1 reply; 36+ messages in thread
From: Francisco Iglesias @ 2021-01-14 18:13 UTC (permalink / raw)
  To: Bin Meng
  Cc: Kevin Wolf, Peter Maydell, qemu-devel, qemu-block,
	Marcin Krzeminski, Andrew Jeffery, Bin Meng,
	Philippe Mathieu-Daudé,
	Havard Skinnemoen, Tyrone Ting, qemu-arm, Alistair Francis,
	Cédric Le Goater, Joe Komlodi, Max Reitz, Joel Stanley

Hi Bin,

On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> From: Bin Meng <bin.meng@windriver.com>
> 
> The m25p80 model uses s->needed_bytes to indicate how many follow-up
> bytes are expected to be received after it receives a command. For
> example, depending on the address mode, either 3-byte address or
> 4-byte address is needed.
> 
> For fast read family commands, some dummy cycles are required after
> sending the address bytes, and the dummy cycles need to be counted
> in s->needed_bytes. This is where the mess began.
> 
> As the variable name (needed_bytes) indicates, the unit is in byte.
> It is not in bit, or cycle. However for some reason the model has
> been using the number of dummy cycles for s->needed_bytes. The right
> approach is to convert the number of dummy cycles to bytes based on
> the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).

While not being the original implementor, I must assume that the above
solution was considered but not chosen by the developers due to its
inaccuracy (it wouldn't be possible to model exactly 6 dummy cycles, only a
multiple of 8, meaning that if the controller is wrongly programmed to
generate 7, the error wouldn't be caught and the controller would still be
considered "correct"). Now that we have this detail in the implementation,
I'm in favor of keeping it, also because the detail is already in use for
catching exactly the above error.
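
As a rough illustration of that loss of precision, the sketch below
converts dummy cycles to bytes and reports whether the conversion is exact.
The helper names are hypothetical and are not taken from m25p80.c; `lines`
is the number of data lines used during the dummy phase (1, 2 or 4).

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits clocked during `cycles` dummy cycles on `lines` data lines. */
static inline uint32_t dummy_bits(uint32_t cycles, uint32_t lines)
{
    return cycles * lines;
}

/* True only if the dummy phase is a whole number of bytes. */
static inline bool dummy_is_whole_bytes(uint32_t cycles, uint32_t lines)
{
    return dummy_bits(cycles, lines) % 8 == 0;
}

/* Byte count after rounding up, as a byte-based model would see it. */
static inline uint32_t dummy_bytes_rounded(uint32_t cycles, uint32_t lines)
{
    return (dummy_bits(cycles, lines) + 7) / 8;
}
```

On four lines, 6 cycles map exactly to 3 bytes, but 7 and 8 cycles both
round to 4 bytes, so a byte-based flash model cannot tell them apart. On a
single line, 6 and 8 cycles both round to 1 byte.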

> 
> Things get complicated when interacting with different SPI or QSPI
> flash controllers. There are major two cases:
> 
> - Dummy bytes prepared by drivers, and wrote to the controller fifo.
>   For such case, driver will calculate the correct number of dummy
>   bytes and write them into the tx fifo. Fixing the m25p80 model will
>   fix flashes working with such controllers.

The above can be fixed while still keeping the detailed dummy cycle
implementation inside m25p80. Perhaps one of the following could be looked
into: configuring the amount, letting the SPI controller fetch the amount
from m25p80, or inheriting some functionality handling this in the SPI
controller. Or a mixture of the above.

> - Dummy bytes not prepared by drivers. Drivers just tell the hardware
>   the dummy cycle configuration via some registers, and hardware will
>   automatically generate dummy cycles for us. Fixing the m25p80 model
>   is not enough, and we will need to fix the SPI/QSPI models for such
>   controllers.
> 
> This series fixes the mess in the m25p80 from the flash side first,

Considering the problems solved by the solution in tree, I find m25p80 pretty
clean; at least I don't see any clearly better way of accurately modeling the
dummy clock cycles. Counting bits instead of bytes would, for example, still
force the controllers to mark which bits to count (when transmitting one dummy
byte from a txfifo on four lines (Quad command), it generates 2 dummy clock
cycles, since it takes two cycles to transfer 8 bits).
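
The arithmetic in that parenthesis can be sketched as follows. This is an
illustration only: dummy_cycles_from_bytes() is a hypothetical name, not an
existing QEMU function, and `lines` is assumed to be 1, 2 or 4.

```c
#include <stdint.h>

/*
 * Clock cycles needed to shift `bytes` dummy bytes out of a tx fifo when
 * `lines` data lines each carry one bit per clock cycle.
 */
static inline uint32_t dummy_cycles_from_bytes(uint32_t bytes, uint32_t lines)
{
    return bytes * 8 / lines;
}
```

One dummy byte therefore costs 8 cycles on one line, 4 on two lines and 2
on four lines, so the same byte in the fifo means a different number of
dummy cycles depending on the command's line mode.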

Best regards,
Francisco Iglesias


> followed by fixes to 3 known SPI controller models that fall into
> the 2nd case above.
> 
> Please note, I have no way to verify patch 7/8/9 because:
> 
> * There is no public datasheet available for the SoC / SPI controller
> * There is no QEMU docs, or details that tell people how to boot either
>   U-Boot or Linux kernel to verify the functionality
> 
> These 3 patches are very likely to be wrong. Hence I would like to ask
> help from the original author who wrote these SPI controller models
> to help testing, or completely rewrite these 3 patches to fix things.
> Thanks!
> 
> Patch 6 is unvalidated with QEMU, mainly because there is no doc to
> tell people how to boot anything to test. But I have some confidence
> based on my read of the ZynqMP manual, as well as some experimental
> testing on a real ZCU102 board.
> 
> Other flash patches can be tested with the SiFive SPI series:
> http://patchwork.ozlabs.org/project/qemu-devel/list/?series=222391
> 
> Cherry-pick patch 16 and 17 from the series above, and switch to
> different flash model to test with the following command:
> 
> $ qemu-system-riscv64 -nographic -M sifive_u -m 2G -smp 5 -kernel u-boot
> 
> I've picked up two for testing:
> 
> QEMU flash: "sst25vf032b"
> 
>   U-Boot 2020.10 (Jan 14 2021 - 21:55:59 +0800)
> 
>   CPU:   rv64imafdcsu
>   Model: SiFive HiFive Unleashed A00
>   DRAM:  2 GiB
>   MMC:
>   Loading Environment from SPIFlash... SF: Detected sst25vf032b with page size 256 Bytes, erase size 4 KiB, total 4 MiB
>   *** Warning - bad CRC, using default environment
> 
>   In:    serial@10010000
>   Out:   serial@10010000
>   Err:   serial@10010000
>   Net:   failed to get gemgxl_reset reset
> 
>   Warning: ethernet@10090000 MAC addresses don't match:
>   Address in DT is                52:54:00:12:34:56
>   Address in environment is       70:b3:d5:92:f0:01
>   eth0: ethernet@10090000
>   Hit any key to stop autoboot:  0
>   => sf probe
>   SF: Detected sst25vf032b with page size 256 Bytes, erase size 4 KiB,
>   total 4 MiB
>   => sf test 1ff000 1000
>   SPI flash test:
>   0 erase: 0 ticks, 4096000 KiB/s 32768.000 Mbps
>   1 check: 10 ticks, 400 KiB/s 3.200 Mbps
>   2 write: 170 ticks, 23 KiB/s 0.184 Mbps
>   3 read: 9 ticks, 444 KiB/s 3.552 Mbps
>   Test passed
>   0 erase: 0 ticks, 4096000 KiB/s 32768.000 Mbps
>   1 check: 10 ticks, 400 KiB/s 3.200 Mbps
>   2 write: 170 ticks, 23 KiB/s 0.184 Mbps
>   3 read: 9 ticks, 444 KiB/s 3.552 Mbps
> 
> QEMU flash: "mx66u51235f"
> 
>   U-Boot 2020.10 (Jan 14 2021 - 21:55:59 +0800)
> 
>   CPU:   rv64imafdcsu
>   Model: SiFive HiFive Unleashed A00
>   DRAM:  2 GiB
>   MMC:
>   Loading Environment from SPIFlash... SF: Detected mx66u51235f with page size 256 Bytes, erase size 4 KiB, total 64 MiB
>   *** Warning - bad CRC, using default environment
> 
>   In:    serial@10010000
>   Out:   serial@10010000
>   Err:   serial@10010000
>   Net:   failed to get gemgxl_reset reset
> 
>   Warning: ethernet@10090000 MAC addresses don't match:
>   Address in DT is                52:54:00:12:34:56
>   Address in environment is       70:b3:d5:92:f0:01
>   eth0: ethernet@10090000
>   Hit any key to stop autoboot:  0
>   => sf probe
>   SF: Detected mx66u51235f with page size 256 Bytes, erase size 4 KiB, total 64 MiB
>   => sf test 0 8000
>   SPI flash test:
>   0 erase: 1 ticks, 32000 KiB/s 256.000 Mbps
>   1 check: 80 ticks, 400 KiB/s 3.200 Mbps
>   2 write: 83 ticks, 385 KiB/s 3.080 Mbps
>   3 read: 79 ticks, 405 KiB/s 3.240 Mbps
>   Test passed
>   0 erase: 1 ticks, 32000 KiB/s 256.000 Mbps
>   1 check: 80 ticks, 400 KiB/s 3.200 Mbps
>   2 write: 83 ticks, 385 KiB/s 3.080 Mbps
>   3 read: 79 ticks, 405 KiB/s 3.240 Mbps
> 
> I am sure there will be bugs, and I have not tested all flashes affected.
> But I want to send out this series for an early discussion and comments.
> I will continue my testing.
> 
> 
> Bin Meng (9):
>   hw/block: m25p80: Fix the number of dummy bytes needed for Windbond
>     flashes
>   hw/block: m25p80: Fix the number of dummy bytes needed for
>     Numonyx/Micron flashes
>   hw/block: m25p80: Fix the number of dummy bytes needed for Macronix
>     flashes
>   hw/block: m25p80: Fix the number of dummy bytes needed for Spansion
>     flashes
>   hw/block: m25p80: Support fast read for SST flashes
>   hw/ssi: xilinx_spips: Fix generic fifo dummy cycle handling
>   Revert "aspeed/smc: Fix number of dummy cycles for FAST_READ_4
>     command"
>   Revert "aspeed/smc: snoop SPI transfers to fake dummy cycles"
>   hw/ssi: npcm7xx_fiu: Correct the dummy cycle emulation logic
> 
>  include/hw/ssi/aspeed_smc.h |   3 -
>  hw/block/m25p80.c           | 153 ++++++++++++++++++++++++++++--------
>  hw/ssi/aspeed_smc.c         | 116 +--------------------------
>  hw/ssi/npcm7xx_fiu.c        |   8 +-
>  hw/ssi/xilinx_spips.c       |  29 ++++++-
>  5 files changed, 153 insertions(+), 156 deletions(-)
> 
> -- 
> 2.25.1
> 



* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-01-14 18:13 ` Francisco Iglesias
@ 2021-01-15  2:07   ` Bin Meng
  2021-01-15  3:29     ` Havard Skinnemoen via
  2021-01-15 12:26     ` Francisco Iglesias
  0 siblings, 2 replies; 36+ messages in thread
From: Bin Meng @ 2021-01-15  2:07 UTC (permalink / raw)
  To: Francisco Iglesias
  Cc: Kevin Wolf, Peter Maydell, qemu-devel@nongnu.org Developers,
	Qemu-block, Andrew Jeffery, Bin Meng, Philippe Mathieu-Daudé,
	Havard Skinnemoen, Tyrone Ting, qemu-arm, Alistair Francis,
	Cédric Le Goater, Joe Komlodi, Max Reitz, Joel Stanley

Hi Francisco,

On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
<frasse.iglesias@gmail.com> wrote:
>
> Hi Bin,
>
> On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > From: Bin Meng <bin.meng@windriver.com>
> >
> > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > bytes are expected to be received after it receives a command. For
> > example, depending on the address mode, either 3-byte address or
> > 4-byte address is needed.
> >
> > For fast read family commands, some dummy cycles are required after
> > sending the address bytes, and the dummy cycles need to be counted
> > in s->needed_bytes. This is where the mess began.
> >
> > As the variable name (needed_bytes) indicates, the unit is in byte.
> > It is not in bit, or cycle. However for some reason the model has
> > been using the number of dummy cycles for s->needed_bytes. The right
> > approach is to convert the number of dummy cycles to bytes based on
> > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
>
> While not being the original implementor I must assume that above solution was
> considered but not chosen by the developers due to it is inaccuracy (it
> wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8,
> meaning that if the controller is wrongly programmed to generate 7 the error
> wouldn't be caught and the controller will still be considered "correct"). Now
> that we have this detail in the implementation I'm in favor of keeping it, this
> also because the detail is already in use for catching exactly above error.
>

I found no hint in the commit message that the solution proposed here
was ever considered. Had it been, all SPI controller models that rely
on software-generated dummy bytes would have been found to be
seriously broken long ago!

The issue you pointed out is real: we require the total number of
dummy bits to be a multiple of 8. That's why I added the unimplemented
log message in this series (patches 2/3/4) to warn users if this
expectation is not met. However, this will not cause any issue when
running U-Boot or Linux, because both spi-nor drivers make the same
assumption as we do here.

See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data();
both contain logic to calculate the dummy bytes needed for a fast read
command:

    /* convert the dummy cycles to the number of bytes */
    op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;

Note that the default dummy cycle configuration of every flash I have
looked into so far meets the multiple-of-8 assumption. On some flashes
the dummy cycle number is configurable, and if it has been configured
to an odd value, it would not work on U-Boot/Linux in the first
place.
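
As a rough illustration of the multiple-of-8 expectation, the sketch
below reports how many dummy bits would be dropped when a dummy cycle
count is truncated to whole bytes. The helper name
dummy_bits_left_over() is hypothetical and does not come from U-Boot,
Linux, or QEMU; only bus widths of 1, 2, and 4 are assumed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of dummy bits left over after truncating
 * (cycles * buswidth) bits to whole bytes. Zero means the configuration
 * fits a byte-based model; non-zero is the case the model would have to
 * warn about.
 */
static unsigned int dummy_bits_left_over(unsigned int cycles,
                                         unsigned int buswidth)
{
    /* Only single, dual, and quad widths are considered in this sketch. */
    assert(buswidth == 1 || buswidth == 2 || buswidth == 4);

    return (cycles * buswidth) % 8;
}
```

With this helper, 6 cycles on four lines (24 bits) leave nothing over,
while 7 cycles on four lines (28 bits) leave 4 bits that a byte-based
model cannot represent.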

> >
> > Things get complicated when interacting with different SPI or QSPI
> > flash controllers. There are major two cases:
> >
> > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> >   For such case, driver will calculate the correct number of dummy
> >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> >   fix flashes working with such controllers.
>
> Above can be fixed while still keeping the detailed dummy cycle implementation
> inside m25p80. Perhaps one of the following could be looked into: configurating
> the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> some functionality handling this in the SPI controller. Or a mixture of above.

Please send patches that show in detail how this is going to work. I
am open to all possible solutions.

>
> > - Dummy bytes not prepared by drivers. Drivers just tell the hardware
> >   the dummy cycle configuration via some registers, and hardware will
> >   automatically generate dummy cycles for us. Fixing the m25p80 model
> >   is not enough, and we will need to fix the SPI/QSPI models for such
> >   controllers.
> >
> > This series fixes the mess in the m25p80 from the flash side first,
>
> Considering the problems solved by the solution in tree I find m25p80 pretty
> clean, at least I don't see any clearly better way for accurately modeling the
> dummy clock cycles. Counting bits instead of bytes would for example still
> force the controllers to mark which bits to count (when transmitting one dummy
> byte from a txfifo on four lines (Quad command) it generates 2 dummy clock
> cycles since it takes two cycles to transfer 8 bits).
>

SPI is a bit-based protocol, not a byte-based one. If you insist on
bit-level modeling of the dummy cycles, then you should also suggest
that we model all phases (command/addr/dummy/data) in bits. That way
we could accurately emulate everything, including potential problems
such as transferring 9 bits in the data phase.

However, modeling everything in bits is super inefficient. My view is
that we should avoid trying to support uncommon use cases (such as a
dummy bit count that is not a multiple of 8) in QEMU.

Regards,
Bin



* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-01-15  2:07   ` Bin Meng
@ 2021-01-15  3:29     ` Havard Skinnemoen via
  2021-01-15 13:54       ` Bin Meng
  2021-01-15 12:26     ` Francisco Iglesias
  1 sibling, 1 reply; 36+ messages in thread
From: Havard Skinnemoen via @ 2021-01-15  3:29 UTC (permalink / raw)
  To: Bin Meng
  Cc: Kevin Wolf, Peter Maydell, qemu-devel@nongnu.org Developers,
	Qemu-block, Andrew Jeffery, Francisco Iglesias, Bin Meng,
	Philippe Mathieu-Daudé,
	Max Reitz, Tyrone Ting, qemu-arm, Alistair Francis,
	Cédric Le Goater, Joe Komlodi, Joel Stanley

Hi Bin,

On Thu, Jan 14, 2021 at 6:08 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>
> Hi Francisco,
>
> On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> <frasse.iglesias@gmail.com> wrote:
> >
> > Hi Bin,
> >
> > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > From: Bin Meng <bin.meng@windriver.com>
> > >
> > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > bytes are expected to be received after it receives a command. For
> > > example, depending on the address mode, either 3-byte address or
> > > 4-byte address is needed.
> > >
> > > For fast read family commands, some dummy cycles are required after
> > > sending the address bytes, and the dummy cycles need to be counted
> > > in s->needed_bytes. This is where the mess began.
> > >
> > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > It is not in bit, or cycle. However for some reason the model has
> > > been using the number of dummy cycles for s->needed_bytes. The right
> > > approach is to convert the number of dummy cycles to bytes based on
> > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> >
> > While not being the original implementor I must assume that above solution was
> > considered but not chosen by the developers due to it is inaccuracy (it
> > wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8,
> > meaning that if the controller is wrongly programmed to generate 7 the error
> > wouldn't be caught and the controller will still be considered "correct"). Now
> > that we have this detail in the implementation I'm in favor of keeping it, this
> > also because the detail is already in use for catching exactly above error.
> >
>
> I found no clue from the commit message that my proposed solution here
> was ever considered, otherwise all SPI controller models supporting
> software generation should have been found out seriously broken long
> time ago!
>
> The issue you pointed out that we require the total number of dummy
> bits should be multiple of 8 is true, that's why I added the
> unimplemented log message in this series (patch 2/3/4) to warn users
> if this expectation is not met. However this will not cause any issue
> when running U-Boot or Linux, because both spi-nor drivers expect the
> same assumption as we do here.
>
> See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> there is a logic to calculate the dummy bytes needed for fast read
> command:
>
>     /* convert the dummy cycles to the number of bytes */
>     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
>
> Note the default dummy cycles configuration for all flashes I have
> looked into as of today, meets the multiple of 8 assumption. On some
> flashes the dummy cycle number is configurable, and if it's been
> configured to be an odd value, it would not work on U-Boot/Linux in
> the first place.
>
> > >
> > > Things get complicated when interacting with different SPI or QSPI
> > > flash controllers. There are major two cases:
> > >
> > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > >   For such case, driver will calculate the correct number of dummy
> > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > >   fix flashes working with such controllers.
> >
> > Above can be fixed while still keeping the detailed dummy cycle implementation
> > inside m25p80. Perhaps one of the following could be looked into: configurating
> > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > some functionality handling this in the SPI controller. Or a mixture of above.
>
> Please send patches to explain this in detail how this is going to
> work. I am open to all possible solutions.
>
> >
> > > - Dummy bytes not prepared by drivers. Drivers just tell the hardware
> > >   the dummy cycle configuration via some registers, and hardware will
> > >   automatically generate dummy cycles for us. Fixing the m25p80 model
> > >   is not enough, and we will need to fix the SPI/QSPI models for such
> > >   controllers.
> > >
> > > This series fixes the mess in the m25p80 from the flash side first,
> >
> > Considering the problems solved by the solution in tree I find m25p80 pretty
> > clean, at least I don't see any clearly better way for accurately modeling the
> > dummy clock cycles. Counting bits instead of bytes would for example still
> > force the controllers to mark which bits to count (when transmitting one dummy
> > byte from a txfifo on four lines (Quad command) it generates 2 dummy clock
> > cycles since it takes two cycles to transfer 8 bits).
> >
>
> SPI is a bit based protocol, not bytes. If you insist on bit modeling
> with the dummy cycles then you should also suggest we change all
> cycles (including command/addr/dummy/data phases) to be modeled with
> bits. That way we can accurately emulate everything, for example one
> potential problem like transferring 9 bit in the data phase.

I agree with this. There's really nothing special about dummy cycles.
Making them special makes it super painful to implement SPI controller
emulation because you have to anticipate when ssi_transfer changes
semantics from byte-at-a-time to bit-at-a-time. I doubt all the SPI
controllers in the tree get it right all the time.

> However modeling everything with bit is super inefficient. My view is
> that we should avoid trying to support uncommon use cases (like not
> multiple of 8 for dummy bits) in QEMU.

Perhaps ssi_transfer could take an additional bits parameter? That
should make it possible to transfer any number of bits up to 32, while
keeping the common case simple on both sides. And it would work for
any SPI transfer, not just dummy cycles.
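
A minimal sketch of what such an interface could look like, using a toy
peripheral rather than QEMU's actual SSI types. Every name here
(ToyPeripheral, toy_transfer_bits, toy_transfer) is hypothetical, and
the sketch only counts bits; it makes no claim about how ssi_transfer
would really be changed.

```c
#include <assert.h>
#include <stdint.h>

/* Toy peripheral state; hypothetical, unrelated to QEMU's SSI types. */
typedef struct ToyPeripheral {
    uint64_t bits_seen;   /* total bits clocked in so far */
    uint32_t last_value;  /* payload of the most recent transfer */
} ToyPeripheral;

/* Transfer 'bits' bits (1..32) taken from the low end of 'val'. */
static uint32_t toy_transfer_bits(ToyPeripheral *p, uint32_t val,
                                  unsigned int bits)
{
    uint32_t mask;

    assert(bits >= 1 && bits <= 32);
    /* Avoid shifting a 32-bit value by 32, which is undefined in C. */
    mask = (bits == 32) ? UINT32_MAX : ((UINT32_C(1) << bits) - 1);

    p->last_value = val & mask;
    p->bits_seen += bits;
    return 0; /* this toy device never drives data back */
}

/* Common case: a whole byte, so byte-oriented callers stay simple. */
static uint32_t toy_transfer(ToyPeripheral *p, uint8_t val)
{
    return toy_transfer_bits(p, val, 8);
}
```

In this sketch a controller would call toy_transfer() for command,
address, and data bytes, and toy_transfer_bits() only when a phase does
not end on a byte boundary.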

Havard



* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-01-15  2:07   ` Bin Meng
  2021-01-15  3:29     ` Havard Skinnemoen via
@ 2021-01-15 12:26     ` Francisco Iglesias
  2021-01-15 14:38       ` Bin Meng
  1 sibling, 1 reply; 36+ messages in thread
From: Francisco Iglesias @ 2021-01-15 12:26 UTC (permalink / raw)
  To: Bin Meng
  Cc: Kevin Wolf, Peter Maydell, qemu-devel@nongnu.org Developers,
	Qemu-block, Andrew Jeffery, Bin Meng, Philippe Mathieu-Daudé,
	Havard Skinnemoen, Tyrone Ting, qemu-arm, Alistair Francis,
	Cédric Le Goater, Joe Komlodi, Max Reitz, Joel Stanley

Hi Bin,

On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> Hi Francisco,
> 
> On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> <frasse.iglesias@gmail.com> wrote:
> >
> > Hi Bin,
> >
> > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > From: Bin Meng <bin.meng@windriver.com>
> > >
> > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > bytes are expected to be received after it receives a command. For
> > > example, depending on the address mode, either 3-byte address or
> > > 4-byte address is needed.
> > >
> > > For fast read family commands, some dummy cycles are required after
> > > sending the address bytes, and the dummy cycles need to be counted
> > > in s->needed_bytes. This is where the mess began.
> > >
> > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > It is not in bit, or cycle. However for some reason the model has
> > > been using the number of dummy cycles for s->needed_bytes. The right
> > > approach is to convert the number of dummy cycles to bytes based on
> > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> >
> > While not being the original implementor I must assume that above solution was
> > considered but not chosen by the developers due to it is inaccuracy (it
> > wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8,
> > meaning that if the controller is wrongly programmed to generate 7 the error
> > wouldn't be caught and the controller will still be considered "correct"). Now
> > that we have this detail in the implementation I'm in favor of keeping it, this
> > also because the detail is already in use for catching exactly above error.
> >
> 
> I found no clue from the commit message that my proposed solution here
> was ever considered, otherwise all SPI controller models supporting
> software generation should have been found out seriously broken long
> time ago!


The controllers you are referring to might lack support for commands requiring
dummy clock cycles, but I really hope they work with the other commands? If so, I
don't think it is fair to call them 'seriously broken' (otherwise we should
probably let the maintainers know about it). Most likely the commands are
unsupported because nobody has requested them. Also, there is one controller
that does have support.


> 
> The issue you pointed out that we require the total number of dummy
> bits should be multiple of 8 is true, that's why I added the
> unimplemented log message in this series (patch 2/3/4) to warn users
> if this expectation is not met. However this will not cause any issue
> when running U-Boot or Linux, because both spi-nor drivers expect the
> same assumption as we do here.
> 
> See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> there is a logic to calculate the dummy bytes needed for fast read
> command:
> 
>     /* convert the dummy cycles to the number of bytes */
>     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> 
> Note the default dummy cycles configuration for all flashes I have
> looked into as of today, meets the multiple of 8 assumption. On some
> flashes the dummy cycle number is configurable, and if it's been
> configured to be an odd value, it would not work on U-Boot/Linux in
> the first place.
> 
> > >
> > > Things get complicated when interacting with different SPI or QSPI
> > > flash controllers. There are major two cases:
> > >
> > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > >   For such case, driver will calculate the correct number of dummy
> > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > >   fix flashes working with such controllers.
> >
> > Above can be fixed while still keeping the detailed dummy cycle implementation
> > inside m25p80. Perhaps one of the following could be looked into: configurating
> > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > some functionality handling this in the SPI controller. Or a mixture of above.
> 
> Please send patches to explain this in detail how this is going to
> work. I am open to all possible solutions.

In that case I suggest that you instead try a device property,
'model_dummy_bytes', that selects whether m25p80 converts the accurate dummy
clock cycle count to dummy bytes. Below is an example of how to modify the
decode_fast_read_cmd function (the other commands requiring dummy clock cycles
can follow a similar pattern). This way the fifo mode will be able to work the
way you desire while the current functionality stays intact. Suddenly
removing functionality (features) would take users by surprise.


static void decode_fast_read_cmd(Flash *s)
{
    uint8_t dummy_clk_cycles = 0;
    uint8_t extra_bytes;

    s->needed_bytes = get_addr_length(s);

    /* Obtain the number of dummy clock cycles needed */
    switch (get_man(s)) {
    case MAN_WINBOND:
        dummy_clk_cycles += 8;
        break;
    case MAN_NUMONYX:
        dummy_clk_cycles += numonyx_extract_cfg_num_dummies(s);
        break;
    case MAN_MACRONIX:
        if (extract32(s->volatile_cfg, 6, 2) == 1) {
            dummy_clk_cycles += 6;
        } else {
            dummy_clk_cycles += 8;
        }
        break;
    case MAN_SPANSION:
        dummy_clk_cycles += extract32(s->spansion_cr2v,
                                      SPANSION_DUMMY_CLK_POS,
                                      SPANSION_DUMMY_CLK_LEN);
        break;
    default:
        break;
    }

    if (s->model_dummy_bytes) {
        int lines = 1;

        /*
         * Expect dummy bytes from the controller so convert the dummy
         * clock cycles to dummy_bytes.
         */
        extra_bytes = convert_to_dummy_bytes(dummy_clk_cycles, lines);
    } else {
        /* Model individual dummy clock cycles as byte writes */
        extra_bytes = dummy_clk_cycles;
    }

    s->needed_bytes += extra_bytes;
    s->pos = 0;
    s->len = 0;
    s->state = STATE_COLLECTING_DATA;
}
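
The convert_to_dummy_bytes() helper called above is not defined in this
message. One possible shape is sketched below, assuming it simply
applies the (cycles * lines) / 8 conversion discussed earlier in the
thread and truncates any remainder. The signature and the restriction
to 1, 2, or 4 lines are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed implementation of the helper: convert a dummy clock cycle
 * count to whole dummy bytes for the given number of data lines.
 * Bits that do not fill a whole byte are silently truncated here.
 */
static uint8_t convert_to_dummy_bytes(uint8_t dummy_clk_cycles, int lines)
{
    /* Single, dual, and quad transfers only in this sketch. */
    assert(lines == 1 || lines == 2 || lines == 4);

    return (uint8_t)((dummy_clk_cycles * lines) / 8);
}
```

With lines fixed at 1, as in decode_fast_read_cmd above, 8 dummy clock
cycles become one dummy byte.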

Best regards,
Francisco Iglesias

> 
> >
> > > - Dummy bytes not prepared by drivers. Drivers just tell the hardware
> > >   the dummy cycle configuration via some registers, and hardware will
> > >   automatically generate dummy cycles for us. Fixing the m25p80 model
> > >   is not enough, and we will need to fix the SPI/QSPI models for such
> > >   controllers.
> > >
> > > This series fixes the mess in the m25p80 from the flash side first,
> >
> > Considering the problems solved by the solution in tree I find m25p80 pretty
> > clean, at least I don't see any clearly better way for accurately modeling the
> > dummy clock cycles. Counting bits instead of bytes would for example still
> > force the controllers to mark which bits to count (when transmitting one dummy
> > byte from a txfifo on four lines (Quad command) it generates 2 dummy clock
> > cycles since it takes two cycles to transfer 8 bits).
> >
> 
> SPI is a bit based protocol, not bytes. If you insist on bit modeling
> with the dummy cycles then you should also suggest we change all
> cycles (including command/addr/dummy/data phases) to be modeled with
> bits. That way we can accurately emulate everything, for example one
> potential problem like transferring 9 bit in the data phase.
> 
> However modeling everything with bit is super inefficient. My view is
> that we should avoid trying to support uncommon use cases (like not
> multiple of 8 for dummy bits) in QEMU.
> 
> Regards,
> Bin



* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-01-15  3:29     ` Havard Skinnemoen via
@ 2021-01-15 13:54       ` Bin Meng
  0 siblings, 0 replies; 36+ messages in thread
From: Bin Meng @ 2021-01-15 13:54 UTC (permalink / raw)
  To: Havard Skinnemoen
  Cc: Kevin Wolf, Peter Maydell, qemu-devel@nongnu.org Developers,
	Qemu-block, Andrew Jeffery, Francisco Iglesias, Bin Meng,
	Philippe Mathieu-Daudé,
	Max Reitz, Tyrone Ting, qemu-arm, Alistair Francis,
	Cédric Le Goater, Joe Komlodi, Joel Stanley

Hi Havard,

On Fri, Jan 15, 2021 at 11:29 AM Havard Skinnemoen
<hskinnemoen@google.com> wrote:
>
> Hi Bin,
>
> On Thu, Jan 14, 2021 at 6:08 PM Bin Meng <bmeng.cn@gmail.com> wrote:
> >
> > Hi Francisco,
> >
> > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > <frasse.iglesias@gmail.com> wrote:
> > >
> > > Hi Bin,
> > >
> > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > From: Bin Meng <bin.meng@windriver.com>
> > > >
> > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > bytes are expected to be received after it receives a command. For
> > > > example, depending on the address mode, either 3-byte address or
> > > > 4-byte address is needed.
> > > >
> > > > For fast read family commands, some dummy cycles are required after
> > > > sending the address bytes, and the dummy cycles need to be counted
> > > > in s->needed_bytes. This is where the mess began.
> > > >
> > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > It is not in bit, or cycle. However for some reason the model has
> > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > approach is to convert the number of dummy cycles to bytes based on
> > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > >
> > > While not being the original implementor I must assume that above solution was
> > > considered but not chosen by the developers due to it is inaccuracy (it
> > > wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8,
> > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > also because the detail is already in use for catching exactly above error.
> > >
> >
> > I found no clue from the commit message that my proposed solution here
> > was ever considered, otherwise all SPI controller models supporting
> > software generation should have been found out seriously broken long
> > time ago!
> >
> > The issue you pointed out that we require the total number of dummy
> > bits should be multiple of 8 is true, that's why I added the
> > unimplemented log message in this series (patch 2/3/4) to warn users
> > if this expectation is not met. However this will not cause any issue
> > when running U-Boot or Linux, because both spi-nor drivers expect the
> > same assumption as we do here.
> >
> > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> > there is a logic to calculate the dummy bytes needed for fast read
> > command:
> >
> >     /* convert the dummy cycles to the number of bytes */
> >     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> >
> > Note the default dummy cycles configuration for all flashes I have
> > looked into as of today, meets the multiple of 8 assumption. On some
> > flashes the dummy cycle number is configurable, and if it's been
> > configured to be an odd value, it would not work on U-Boot/Linux in
> > the first place.
> >
> > > >
> > > > Things get complicated when interacting with different SPI or QSPI
> > > > flash controllers. There are major two cases:
> > > >
> > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > > >   For such case, driver will calculate the correct number of dummy
> > > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > > >   fix flashes working with such controllers.
> > >
> > > Above can be fixed while still keeping the detailed dummy cycle implementation
> > > inside m25p80. Perhaps one of the following could be looked into: configurating
> > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > > some functionality handling this in the SPI controller. Or a mixture of above.
> >
> > Please send patches to explain this in detail how this is going to
> > work. I am open to all possible solutions.
> >
> > >
> > > > - Dummy bytes not prepared by drivers. Drivers just tell the hardware
> > > >   the dummy cycle configuration via some registers, and hardware will
> > > >   automatically generate dummy cycles for us. Fixing the m25p80 model
> > > >   is not enough, and we will need to fix the SPI/QSPI models for such
> > > >   controllers.
> > > >
> > > > This series fixes the mess in the m25p80 from the flash side first,
> > >
> > > Considering the problems solved by the solution in tree I find m25p80 pretty
> > > clean, at least I don't see any clearly better way for accurately modeling the
> > > dummy clock cycles. Counting bits instead of bytes would for example still
> > > force the controllers to mark which bits to count (when transmitting one dummy
> > > byte from a txfifo on four lines (Quad command) it generates 2 dummy clock
> > > cycles since it takes two cycles to transfer 8 bits).
> > >
> >
> > SPI is a bit based protocol, not bytes. If you insist on bit modeling
> > with the dummy cycles then you should also suggest we change all
> > cycles (including command/addr/dummy/data phases) to be modeled with
> > bits. That way we can accurately emulate everything, for example one
> > potential problem like transferring 9 bit in the data phase.
>
> I agree with this. There's really nothing special about dummy cycles.
> Making them special makes it super painful to implement SPI controller
> emulation because you have to anticipate when ssi_transfer changes
> semantics from byte-at-a-time to bit-at-a-time. I doubt all the SPI
> controllers in the tree gets it right all the time.
>

Yep, and it's not just painful for SPI controllers: for a case 1 SPI
controller it's impossible to snoop the data to tell where the dummy
cycles begin.

> > However modeling everything with bit is super inefficient. My view is
> > that we should avoid trying to support uncommon use cases (like not
> > multiple of 8 for dummy bits) in QEMU.
>
> Perhaps ssi_transfer could take an additional bits parameter? That
> should make it possible to transfer any number of bits up to 32, while
> keeping the common case simple on both sides. And it would work for
> any SPI transfer, not just dummy cycles.

This sounds like a good tradeoff from the emulator perspective. But I
am not sure we should do this to solve the dummy cycle mess, given that
all the default dummy cycle configurations so far meet the
multiple-of-8 assumption.

Regards,
Bin



* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-01-15 12:26     ` Francisco Iglesias
@ 2021-01-15 14:38       ` Bin Meng
  2021-01-18 10:05         ` Francisco Iglesias
  0 siblings, 1 reply; 36+ messages in thread
From: Bin Meng @ 2021-01-15 14:38 UTC (permalink / raw)
  To: Francisco Iglesias
  Cc: Kevin Wolf, Peter Maydell, qemu-devel@nongnu.org Developers,
	Qemu-block, Andrew Jeffery, Bin Meng, Philippe Mathieu-Daudé,
	Havard Skinnemoen, Tyrone Ting, qemu-arm, Alistair Francis,
	Cédric Le Goater, Joe Komlodi, Max Reitz, Joel Stanley

Hi Francisco,

On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
<frasse.iglesias@gmail.com> wrote:
>
> Hi Bin,
>
> On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > Hi Francisco,
> >
> > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > <frasse.iglesias@gmail.com> wrote:
> > >
> > > Hi Bin,
> > >
> > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > From: Bin Meng <bin.meng@windriver.com>
> > > >
> > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > bytes are expected to be received after it receives a command. For
> > > > example, depending on the address mode, either 3-byte address or
> > > > 4-byte address is needed.
> > > >
> > > > For fast read family commands, some dummy cycles are required after
> > > > sending the address bytes, and the dummy cycles need to be counted
> > > > in s->needed_bytes. This is where the mess began.
> > > >
> > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > It is not in bit, or cycle. However for some reason the model has
> > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > approach is to convert the number of dummy cycles to bytes based on
> > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > >
> > > While not being the original implementor I must assume that above solution was
> > > considered but not chosen by the developers due to it is inaccuracy (it
> > > wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8,
> > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > also because the detail is already in use for catching exactly above error.
> > >
> >
> > I found no clue from the commit message that my proposed solution here
> > was ever considered, otherwise all SPI controller models supporting
> > software generation should have been found out seriously broken long
> > time ago!
>
>
> The controllers you are referring to might lack support for commands requiring
> dummy clock cycles but I really hope they work with the other commands? If so I

I am not sure why you view dummy clock cycles as something special
that needs special support from the SPI controller. For a case 1
controller, there is nothing special from the controller's perspective:
it is just like sending out a command, address bytes, or data. The
controller just shifts data bit by bit from its tx fifo and that's it.
In the Xilinx GQSPI controller case, the dummy cycles can either be
sent as regular data in the tx fifo (the case 1 controller), or
automatically generated by the hardware (the case 2 controller).
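
To put a number on the case 1 path, here is a minimal sketch (not QEMU
code; the helper name dummy_cycles_per_byte is hypothetical) of how many
clock cycles one 8-bit tx fifo entry occupies on the bus, assuming the
entry is shifted out on 1, 2 or 4 data lines:

```c
#include <assert.h>

/*
 * Hypothetical helper: clock cycles needed to shift one 8-bit tx fifo
 * entry onto the bus when `lines` data lines are driven in parallel.
 */
static unsigned dummy_cycles_per_byte(unsigned lines)
{
    /* Only single, dual and quad transfers are considered here. */
    assert(lines == 1 || lines == 2 || lines == 4);
    return 8 / lines;
}
```

With lines = 4 this gives 2 cycles per byte, which matches the quad
example quoted further down in this thread (one dummy byte from the tx
fifo on four lines produces 2 dummy clock cycles).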

> don't think it is fair to call them 'seriously broken' (and else we should
> probably let the maintainers know about it). Most likely the lack of support

I called it "seriously broken" because the current implementation only
considers one type of SPI controller while completely ignoring the
other type.

> for the commands is because no request has been made for them. Also there is
> one controller that has support.

It is definitely not "no request". Nearly all SPI flashes support the
Fast Read (0Bh) command today, and 0Bh requires dummy cycles. This is
"seriously broken" for those case 1 type controllers because they
cannot read anything from the m25p80 model at all, unless the guest
software being tested only uses the Read (03h) command, which is not
affected. But I can't find any software that uses Read instead of Fast
Read.
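
The mismatch can be put in numbers. The sketch below is not taken from
the model; the function names are hypothetical, and it assumes a 3-byte
address and 8 dummy cycles for Fast Read (the value used in the Winbond
case of the code quoted below). It compares how many bytes a case 1
driver writes after the opcode with how many bytes the model waits for
when every dummy cycle is counted as one byte:

```c
#include <assert.h>

/* Bytes a case 1 driver puts in the tx fifo after the opcode. */
static unsigned bytes_sent_by_driver(unsigned addr_len,
                                     unsigned dummy_cycles,
                                     unsigned lines)
{
    /* The driver can only queue whole bytes in the tx fifo. */
    assert((dummy_cycles * lines) % 8 == 0);
    return addr_len + (dummy_cycles * lines) / 8;
}

/* Bytes the model waits for when each dummy cycle counts as one byte. */
static unsigned bytes_expected_per_cycle(unsigned addr_len,
                                         unsigned dummy_cycles)
{
    return addr_len + dummy_cycles;
}
```

Under these assumptions the driver sends 3 + 1 = 4 bytes for 0Bh on a
single line, while the model waits for 3 + 8 = 11, so the model would
still be collecting bytes when the driver starts clocking in data.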

> > The issue you pointed out that we require the total number of dummy
> > bits should be multiple of 8 is true, that's why I added the
> > unimplemented log message in this series (patch 2/3/4) to warn users
> > if this expectation is not met. However this will not cause any issue
> > when running U-Boot or Linux, because both spi-nor drivers expect the
> > same assumption as we do here.
> >
> > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> > there is a logic to calculate the dummy bytes needed for fast read
> > command:
> >
> >     /* convert the dummy cycles to the number of bytes */
> >     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> >
> > Note the default dummy cycles configuration for all flashes I have
> > looked into as of today, meets the multiple of 8 assumption. On some
> > flashes the dummy cycle number is configurable, and if it's been
> > configured to be an odd value, it would not work on U-Boot/Linux in
> > the first place.
> >
> > > >
> > > > Things get complicated when interacting with different SPI or QSPI
> > > > flash controllers. There are major two cases:
> > > >
> > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > > >   For such case, driver will calculate the correct number of dummy
> > > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > > >   fix flashes working with such controllers.
> > >
> > > Above can be fixed while still keeping the detailed dummy cycle implementation
> > > inside m25p80. Perhaps one of the following could be looked into: configurating
> > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > > some functionality handling this in the SPI controller. Or a mixture of above.
> >
> > Please send patches to explain this in detail how this is going to
> > work. I am open to all possible solutions.
>
> In that case I suggest that you instead try with a device property
> 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle
> count to dummy bytes inside m25p80. Below is an example on how to modify the

No, this is wrong in my view. This is not like DMA vs. PIO handling.

> decode_fast_read_cmd function (the other commands requiring dummy clock cycles
> can follow a similar pattern). This way the fifo mode will be able to work the
> way you desire while also keeping the current functionality intact. Suddenly
> removing functionality (features) will take users by surprise.

I don't think we are removing any features. This is a fix to make the
model usable by any SPI controller.

As I pointed out, both U-Boot and Linux assume that the number of dummy
bits is a multiple of 8, which is the default configuration for all
flashes I have looked into so far. Can you please comment on what use
case you want to support? I requested U-Boot/Linux kernel testing
against Xilinx GQSPI in the previous SST thread [1] but there was no
response.

[1] http://patchwork.ozlabs.org/project/qemu-devel/patch/1606704602-59435-1-git-send-email-bmeng.cn@gmail.com/
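
To illustrate the multiple-of-8 assumption, here is a minimal sketch of
the cycles-to-bytes conversion. The helper name dummy_cycles_to_bytes is
hypothetical and not part of this series. Unlike the U-Boot/Linux
expression quoted earlier, which simply divides by 8, this version
reports when the dummy phase is not a whole number of bytes:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: convert dummy clock cycles to whole dummy bytes.
 * Returns false (and leaves *nbytes untouched) when cycles * buswidth
 * is not a multiple of 8.
 */
static bool dummy_cycles_to_bytes(unsigned cycles, unsigned buswidth,
                                  unsigned *nbytes)
{
    unsigned bits = cycles * buswidth;

    assert(buswidth == 1 || buswidth == 2 || buswidth == 4);

    if (bits % 8 != 0) {
        return false;
    }
    *nbytes = bits / 8;
    return true;
}
```

For example, 6 cycles at buswidth 4 give 3 bytes, matching the EBh
example from the cover letter, while 7 cycles at buswidth 1 are
rejected because 7 bits do not fill a byte.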

>
> static void decode_fast_read_cmd(Flash *s)
> {
>     uint8_t dummy_clk_cycles = 0;
>     uint8_t extra_bytes;
>
>     s->needed_bytes = get_addr_length(s);
>
>     /* Obtain the number of dummy clock cycles needed */
>     switch (get_man(s)) {
>     case MAN_WINBOND:
>         dummy_clk_cycles += 8;
>         break;
>     case MAN_NUMONYX:
>         dummy_clk_cycles += numonyx_extract_cfg_num_dummies(s);
>         break;
>     case MAN_MACRONIX:
>         if (extract32(s->volatile_cfg, 6, 2) == 1) {
>             dummy_clk_cycles += 6;
>         } else {
>             dummy_clk_cycles += 8;
>         }
>         break;
>     case MAN_SPANSION:
>         dummy_clk_cycles += extract32(s->spansion_cr2v,
>                                     SPANSION_DUMMY_CLK_POS,
>                                     SPANSION_DUMMY_CLK_LEN
>                                     );
>         break;
>     default:
>         break;
>     }
>
>     if (s->model_dummy_bytes) {
>         int lines = 1;
>
>         /*
>          * Expect dummy bytes from the controller so convert the dummy
>          * clock cycles to dummy_bytes.
>          */
>         extra_bytes = convert_to_dummy_bytes(dummy_clk_cycles, lines);
>     } else {
>         /* Model individual dummy clock cycles as byte writes */
>         extra_bytes = dummy_clk_cycles;
>     }
>
>     s->needed_bytes += extra_bytes;
>     s->pos = 0;
>     s->len = 0;
>     s->state = STATE_COLLECTING_DATA;
> }
>
> Best regards,
> Francisco Iglesias
>
> >
> > >
> > > > - Dummy bytes not prepared by drivers. Drivers just tell the hardware
> > > >   the dummy cycle configuration via some registers, and hardware will
> > > >   automatically generate dummy cycles for us. Fixing the m25p80 model
> > > >   is not enough, and we will need to fix the SPI/QSPI models for such
> > > >   controllers.
> > > >
> > > > This series fixes the mess in the m25p80 from the flash side first,
> > >
> > > Considering the problems solved by the solution in tree I find m25p80 pretty
> > > clean, at least I don't see any clearly better way for accurately modeling the
> > > dummy clock cycles. Counting bits instead of bytes would for example still
> > > force the controllers to mark which bits to count (when transmitting one dummy
> > > byte from a txfifo on four lines (Quad command) it generates 2 dummy clock
> > > cycles since it takes two cycles to transfer 8 bits).
> > >
> >
> > SPI is a bit based protocol, not bytes. If you insist on bit modeling
> > with the dummy cycles then you should also suggest we change all
> > cycles (including command/addr/dummy/data phases) to be modeled with
> > bits. That way we can accurately emulate everything, for example one
> > potential problem like transferring 9 bit in the data phase.
> >
> > However modeling everything with bit is super inefficient. My view is
> > that we should avoid trying to support uncommon use cases (like not
> > multiple of 8 for dummy bits) in QEMU.

Regards,
Bin


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-01-15 14:38       ` Bin Meng
@ 2021-01-18 10:05         ` Francisco Iglesias
  2021-01-18 12:32           ` Bin Meng
  0 siblings, 1 reply; 36+ messages in thread
From: Francisco Iglesias @ 2021-01-18 10:05 UTC (permalink / raw)
  To: Bin Meng
  Cc: Kevin Wolf, Peter Maydell, qemu-devel@nongnu.org Developers,
	Qemu-block, Andrew Jeffery, Bin Meng, Philippe Mathieu-Daudé,
	Havard Skinnemoen, Tyrone Ting, qemu-arm, Alistair Francis,
	Cédric Le Goater, Joe Komlodi, Max Reitz, Joel Stanley

Hi Bin,

On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> Hi Francisco,
> 
> On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> <frasse.iglesias@gmail.com> wrote:
> >
> > Hi Bin,
> >
> > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > Hi Francisco,
> > >
> > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > <frasse.iglesias@gmail.com> wrote:
> > > >
> > > > Hi Bin,
> > > >
> > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > >
> > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > bytes are expected to be received after it receives a command. For
> > > > > example, depending on the address mode, either 3-byte address or
> > > > > 4-byte address is needed.
> > > > >
> > > > > For fast read family commands, some dummy cycles are required after
> > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > in s->needed_bytes. This is where the mess began.
> > > > >
> > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > >
> > > > While not being the original implementor I must assume that above solution was
> > > > considered but not chosen by the developers due to it is inaccuracy (it
> > > > wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8,
> > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > > also because the detail is already in use for catching exactly above error.
> > > >
> > >
> > > I found no clue from the commit message that my proposed solution here
> > > was ever considered, otherwise all SPI controller models supporting
> > > software generation should have been found out seriously broken long
> > > time ago!
> >
> >
> > The controllers you are referring to might lack support for commands requiring
> > dummy clock cycles but I really hope they work with the other commands? If so I
> 
> I am not sure why you view dummy clock cycles as something special
> that needs some special support from the SPI controller. For the case
> 1 controller, it's nothing special from the controller perspective,
> just like sending out a command, or address bytes, or data. The
> controller just shifts data bit by bit from its tx fifo and that's it.
> In the Xilinx GQSPI controller case, the dummy cycles can either be
> sent via a regular data (the case 1 controller) in the tx fifo, or
> automatically generated (case 2 controller) by the hardware.

Ok, I'll try to explain my viewpoint a little differently. For that we also
need to keep in mind that QEMU models HW, and any binary that runs on a HW
board supported in QEMU should ideally run on that board inside QEMU as well
(this can equally well be a bare metal application or a modified u-boot/Linux
using SPI commands with a non-multiple of 8 number of dummy clock cycles).

Once functionality has been introduced into QEMU it is not easy to know which
intentional or unintentional features provided by the functionality are being
used by users. One of the (perhaps not well known) features I'm aware of that
is in use and is provided by the accurate dummy clock cycle modeling inside
m25p80 is the ability to test drivers accurately regarding the dummy clock
cycles (even when using commands with a non-multiple of 8 number of dummy clock
cycles), but there might be others as well. So removing this functionality
will break the above use case, since those tests will no longer be reliable.
Furthermore, since users tend to be creative it is not possible to know if
there are other use cases that will be affected. This means that if [1]
needs to be followed, the safe path is to add functionality instead of removing
it. Luckily it is also easier in this case, see below.


> 
> > don't think it is fair to call them 'seriously broken' (and else we should
> > probably let the maintainers know about it). Most likely the lack of support
> 
> I called it "seriously broken" because current implementation only
> considered one type of SPI controllers while completely ignoring the
> other type.

If we change view and see this from the perspective of m25p80, it models the
commands a certain way and provides an API that the SPI controllers need to
implement for interacting with it. It is true that there are SPI controllers
referred to above that do not support the portion of that API that corresponds
to commands with dummy clock cycles, but I don't think it is true that this is
broken since there is also one SPI controller that has a working implementation
of m25p80's full API, also when transferring through a tx fifo (use case 1). But
as mentioned above, by making a minor extension and improvement to m25p80's API
that allows toggling the accuracy from dummy clock cycles to dummy bytes, [1]
will still be honored while at the same time making it possible to have full
support for the API in the SPI controllers that currently do not (please reread
the proposal in my previous reply that attempts to do this). I myself see this
as a win/win situation, also because no controller should need modifications.


> 
> > for the commands is because no request has been made for them. Also there is
> > one controller that has support.
> 
> Definitely it's not "no request". Nearly all SPI flashes support the
> Fast Read (0Bh) command today, and 0Bh requires a dummy cycle. This is
> "seriously broken" for those case 1 type controllers because they
> cannot read anything from the m25p80 model at all. Unless the guest
> software being tested only uses Read (03h) command which is not
> affected. But I can't find a software that uses Read instead of Fast
> Read.
> 
> > > The issue you pointed out that we require the total number of dummy
> > > bits should be multiple of 8 is true, that's why I added the
> > > unimplemented log message in this series (patch 2/3/4) to warn users
> > > if this expectation is not met. However this will not cause any issue
> > > when running U-Boot or Linux, because both spi-nor drivers expect the
> > > same assumption as we do here.
> > >
> > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> > > there is a logic to calculate the dummy bytes needed for fast read
> > > command:
> > >
> > >     /* convert the dummy cycles to the number of bytes */
> > >     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> > >
> > > Note the default dummy cycles configuration for all flashes I have
> > > looked into as of today, meets the multiple of 8 assumption. On some
> > > flashes the dummy cycle number is configurable, and if it's been
> > > configured to be an odd value, it would not work on U-Boot/Linux in
> > > the first place.
> > >
> > > > >
> > > > > Things get complicated when interacting with different SPI or QSPI
> > > > > flash controllers. There are major two cases:
> > > > >
> > > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > > > >   For such case, driver will calculate the correct number of dummy
> > > > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > > > >   fix flashes working with such controllers.
> > > >
> > > > Above can be fixed while still keeping the detailed dummy cycle implementation
> > > > inside m25p80. Perhaps one of the following could be looked into: configurating
> > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > > > some functionality handling this in the SPI controller. Or a mixture of above.
> > >
> > > Please send patches to explain this in detail how this is going to
> > > work. I am open to all possible solutions.
> >
> > In that case I suggest that you instead try with a device property
> > 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle
> > count to dummy bytes inside m25p80. Below is an example on how to modify the
> 
> No this is wrong in my view. This is not like a DMA vs. PIO handling.
> 
> > decode_fast_read_cmd function (the other commands requiring dummy clock cycles
> > can follow a similar pattern). This way the fifo mode will be able to work the
> > way you desire while also keeping the current functionality intact. Suddenly
> > removing functionality (features) will take users by surprise.
> 
> I don't think we are removing any features. This is a fix to make the
> model to be used by any SPI controllers.
> 
> As I pointed out, both U-Boot and Linux have the multiple of 8
> assumption for the dummy bit, which is the default configuration for
> all flashes I have looked into so far. Can you please comment what use
> case you want to support? I requested a U-Boot/Linux kernel testing in
> the previous SST thread [1] against Xilinx GQSPI but there was no
> response.

Instructions on how to boot u-boot/Linux can be found in [2]. For building the
various software components I followed the official doc in [3].

Best regards,
Francisco

[1] qemu/docs/system/deprecated.rst
[2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md
[3] https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/460653138/Xilinx+Open+Source+Linux


> 
> [1] http://patchwork.ozlabs.org/project/qemu-devel/patch/1606704602-59435-1-git-send-email-bmeng.cn@gmail.com/
> 
> >
> > static void decode_fast_read_cmd(Flash *s)
> > {
> >     uint8_t dummy_clk_cycles = 0;
> >     uint8_t extra_bytes;
> >
> >     s->needed_bytes = get_addr_length(s);
> >
> >     /* Obtain the number of dummy clock cycles needed */
> >     switch (get_man(s)) {
> >     case MAN_WINBOND:
> >         dummy_clk_cycles += 8;
> >         break;
> >     case MAN_NUMONYX:
> >         dummy_clk_cycles += numonyx_extract_cfg_num_dummies(s);
> >         break;
> >     case MAN_MACRONIX:
> >         if (extract32(s->volatile_cfg, 6, 2) == 1) {
> >             dummy_clk_cycles += 6;
> >         } else {
> >             dummy_clk_cycles += 8;
> >         }
> >         break;
> >     case MAN_SPANSION:
> >         dummy_clk_cycles += extract32(s->spansion_cr2v,
> >                                     SPANSION_DUMMY_CLK_POS,
> >                                     SPANSION_DUMMY_CLK_LEN
> >                                     );
> >         break;
> >     default:
> >         break;
> >     }
> >
> >     if (s->model_dummy_bytes) {
> >         int lines = 1;
> >
> >         /*
> >          * Expect dummy bytes from the controller so convert the dummy
> >          * clock cycles to dummy_bytes.
> >          */
> >         extra_bytes = convert_to_dummy_bytes(dummy_clk_cycles, lines);
> >     } else {
> >         /* Model individual dummy clock cycles as byte writes */
> >         extra_bytes = dummy_clk_cycles;
> >     }
> >
> >     s->needed_bytes += extra_bytes;
> >     s->pos = 0;
> >     s->len = 0;
> >     s->state = STATE_COLLECTING_DATA;
> > }
> >
> > Best regards,
> > Francisco Iglesias
> >
> > >
> > > >
> > > > > - Dummy bytes not prepared by drivers. Drivers just tell the hardware
> > > > >   the dummy cycle configuration via some registers, and hardware will
> > > > >   automatically generate dummy cycles for us. Fixing the m25p80 model
> > > > >   is not enough, and we will need to fix the SPI/QSPI models for such
> > > > >   controllers.
> > > > >
> > > > > This series fixes the mess in the m25p80 from the flash side first,
> > > >
> > > > Considering the problems solved by the solution in tree I find m25p80 pretty
> > > > clean, at least I don't see any clearly better way for accurately modeling the
> > > > dummy clock cycles. Counting bits instead of bytes would for example still
> > > > force the controllers to mark which bits to count (when transmitting one dummy
> > > > byte from a txfifo on four lines (Quad command) it generates 2 dummy clock
> > > > cycles since it takes two cycles to transfer 8 bits).
> > > >
> > >
> > > SPI is a bit based protocol, not bytes. If you insist on bit modeling
> > > with the dummy cycles then you should also suggest we change all
> > > cycles (including command/addr/dummy/data phases) to be modeled with
> > > bits. That way we can accurately emulate everything, for example one
> > > potential problem like transferring 9 bit in the data phase.
> > >
> > > However modeling everything with bit is super inefficient. My view is
> > > that we should avoid trying to support uncommon use cases (like not
> > > multiple of 8 for dummy bits) in QEMU.
> 
> Regards,
> Bin


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-01-18 10:05         ` Francisco Iglesias
@ 2021-01-18 12:32           ` Bin Meng
  2021-01-19 13:01             ` Francisco Iglesias
  0 siblings, 1 reply; 36+ messages in thread
From: Bin Meng @ 2021-01-18 12:32 UTC (permalink / raw)
  To: Francisco Iglesias
  Cc: Kevin Wolf, Peter Maydell, qemu-devel@nongnu.org Developers,
	Qemu-block, Andrew Jeffery, Bin Meng, Philippe Mathieu-Daudé,
	Havard Skinnemoen, Tyrone Ting, qemu-arm, Alistair Francis,
	Cédric Le Goater, Joe Komlodi, Max Reitz, Joel Stanley

Hi Francisco,

On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
<frasse.iglesias@gmail.com> wrote:
>
> Hi Bin,
>
> On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > Hi Francisco,
> >
> > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > <frasse.iglesias@gmail.com> wrote:
> > >
> > > Hi Bin,
> > >
> > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > Hi Francisco,
> > > >
> > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > <frasse.iglesias@gmail.com> wrote:
> > > > >
> > > > > Hi Bin,
> > > > >
> > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > >
> > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > bytes are expected to be received after it receives a command. For
> > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > 4-byte address is needed.
> > > > > >
> > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > in s->needed_bytes. This is where the mess began.
> > > > > >
> > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > >
> > > > > While not being the original implementor I must assume that above solution was
> > > > > considered but not chosen by the developers due to it is inaccuracy (it
> > > > > wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8,
> > > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > > > also because the detail is already in use for catching exactly above error.
> > > > >
> > > >
> > > > I found no clue from the commit message that my proposed solution here
> > > > was ever considered, otherwise all SPI controller models supporting
> > > > software generation should have been found out seriously broken long
> > > > time ago!
> > >
> > >
> > > The controllers you are referring to might lack support for commands requiring
> > > dummy clock cycles but I really hope they work with the other commands? If so I
> >
> > I am not sure why you view dummy clock cycles as something special
> > that needs some special support from the SPI controller. For the case
> > 1 controller, it's nothing special from the controller perspective,
> > just like sending out a command, or address bytes, or data. The
> > controller just shifts data bit by bit from its tx fifo and that's it.
> > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > sent via a regular data (the case 1 controller) in the tx fifo, or
> > automatically generated (case 2 controller) by the hardware.
>
> Ok, I'll try to explain my view point a little differently. For that we also
> need to keep in mind that QEMU models HW, and any binary that runs on a HW
> board supported in QEMU should ideally run on that board inside QEMU aswell
> (this can be a bare metal application equaly well as a modified u-boot/Linux
> using SPI commands with a non multiple of 8 number of dummy clock cycles).
>
> Once functionality has been introduced into QEMU it is not easy to know which
> intentional or untentional features provided by the functionality are being
> used by users. One of the (perhaps not well known) features I'm aware of that
> is in use and is provided by the accurate dummy clock cycle modeling inside
> m25p80 is the be ability to test drivers accurately regarding the dummy clock
> cycles (even when using commands with a non-multiple of 8 number of dummy clock
> cycles), but there might be others aswell. So by removing this functionality
> above use case will brake, this since those test will not be reliable.
> Furthermore, since users tend to be creative it is not possible to know if
> there are other use cases that will be affected. This means that in case [1]
> needs to be followed the safe path is to add functionality instead of removing.
> Luckily it also easier in this case, see below.

I understand there might be users other than U-Boot/Linux that use a
number of dummy bits that is not a multiple of 8. If your concern is
about model behavior changes, sure, I can update
qemu/docs/system/deprecated.rst to mention that some flashes in the
m25p80 model now implement dummy cycles as bytes.

> >
> > > don't think it is fair to call them 'seriously broken' (and else we should
> > > probably let the maintainers know about it). Most likely the lack of support
> >
> > I called it "seriously broken" because current implementation only
> > considered one type of SPI controllers while completely ignoring the
> > other type.
>
> If we change view and see this from the perspective of m25p80, it models the
> commands a certain way and provides an API that the SPI controllers need to
> implement for interacting with it. It is true that there are SPI controllers
> referred to above that do not support the portion of that API that corresponds
> to commands with dummy clock cycles, but I don't think it is true that this is
> broken since there is also one SPI controller that has a working implementation
> of m25p80's full API also when transfering through a tx fifo (use case 1). But
> as mentioned above, by doing a minor extension and improvement to m25p80's API
> and allow for toggling the accuracy from dummy clock cycles to dummy bytes [1]
> will still be honored as in the same time making it possible to have full
> support for the API in the SPI controllers that currently do not (please reread
> the proposal in my previous reply that attempts to do this). I myself see this
> as win/win situation, also because no controller should need modifications.
>

I am afraid your proposal does not work. Your proposed new device
property 'model_dummy_bytes', which selects whether the accurate dummy
clock cycle count is converted to dummy bytes inside m25p80, is hard to
justify as a property of the flash itself, as the behavior is tightly
coupled to how the SPI controller works.

Please take a look at the Xilinx GQSPI controller, which supports both
use cases: the dummy cycles can be transferred via the tx fifo, or
generated by the controller automatically. Please read the example
given in:

    table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
    Command (EBh)

in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf

If you set the m25p80 device property 'model_dummy_bytes' to true when
working with the Xilinx GQSPI controller, you restrict guest software
to using only the tx fifo to transfer the dummy cycles, and this is
wrong.

>
> >
> > > for the commands is because no request has been made for them. Also there is
> > > one controller that has support.
> >
> > Definitely it's not "no request". Nearly all SPI flashes support the
> > Fast Read (0Bh) command today, and 0Bh requires a dummy cycle. This is
> > "seriously broken" for those case 1 type controllers because they
> > cannot read anything from the m25p80 model at all. Unless the guest
> > software being tested only uses Read (03h) command which is not
> > affected. But I can't find a software that uses Read instead of Fast
> > Read.
> >
> > > > The issue you pointed out that we require the total number of dummy
> > > > bits should be multiple of 8 is true, that's why I added the
> > > > unimplemented log message in this series (patch 2/3/4) to warn users
> > > > if this expectation is not met. However this will not cause any issue
> > > > when running U-Boot or Linux, because both spi-nor drivers expect the
> > > > same assumption as we do here.
> > > >
> > > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> > > > there is a logic to calculate the dummy bytes needed for fast read
> > > > command:
> > > >
> > > >     /* convert the dummy cycles to the number of bytes */
> > > >     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> > > >
> > > > Note the default dummy cycles configuration for all flashes I have
> > > > looked into as of today, meets the multiple of 8 assumption. On some
> > > > flashes the dummy cycle number is configurable, and if it's been
> > > > configured to be an odd value, it would not work on U-Boot/Linux in
> > > > the first place.
> > > >
> > > > > >
> > > > > > Things get complicated when interacting with different SPI or QSPI
> > > > > > flash controllers. There are major two cases:
> > > > > >
> > > > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > > > > >   For such case, driver will calculate the correct number of dummy
> > > > > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > > > > >   fix flashes working with such controllers.
> > > > >
> > > > > Above can be fixed while still keeping the detailed dummy cycle implementation
> > > > > inside m25p80. Perhaps one of the following could be looked into: configurating
> > > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > > > > some functionality handling this in the SPI controller. Or a mixture of above.
> > > >
> > > > Please send patches to explain this in detail how this is going to
> > > > work. I am open to all possible solutions.
> > >
> > > In that case I suggest that you instead try with a device property
> > > 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle
> > > count to dummy bytes inside m25p80. Below is an example on how to modify the
> >
> > No this is wrong in my view. This is not like a DMA vs. PIO handling.
> >
> > > decode_fast_read_cmd function (the other commands requiring dummy clock cycles
> > > can follow a similar pattern). This way the fifo mode will be able to work the
> > > way you desire while also keeping the current functionality intact. Suddenly
> > > removing functionality (features) will take users by surprise.
> >
> > I don't think we are removing any features. This is a fix to make the
> > model to be used by any SPI controllers.
> >
> > As I pointed out, both U-Boot and Linux have the multiple of 8
> > assumption for the dummy bit, which is the default configuration for
> > all flashes I have looked into so far. Can you please comment what use
> > case you want to support? I requested a U-Boot/Linux kernel testing in
> > the previous SST thread [1] against Xilinx GQSPI but there was no
> > response.
>
> In [2] instructions on how to boot u-boot/Linux is found. For building the
> various software components I followed the official doc in [3].

I see the following QEMU commands are used to test booting U-Boot/Linux:

$ qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G
-serial stdio -display none -device loader,file=u-boot.elf -kernel
bl31.elf -device loader,addr=0x40000000,file=Image -device
loader,addr=0x2000000,file=system.dtb

I am not sure where system.dtb gets built from.

[3] mentions that the Xilinx QEMU is used, and the example command it
gives for launching U-Boot differs from your command above.

See https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18841606/QEMU+-+Zynq+UltraScale+MPSoC#QEMU-ZynqUltraScale+MPSoC-RunningaZynqUltraScale+U-bootImageOnXilinx'sARMQEMU

$ ./aarch64-softmmu/qemu-system-aarch64 -M arm-generic-fdt \
  -serial mon:stdio -serial /dev/null -display none \
  -device loader,addr=0xfd1a0104,data=0x8000000e,data-len=4 \ # Un-reset the A53
  -device loader,file=./pre-built/linux/images/bl31.elf,cpu-num=0 \ # ARM Trusted Firmware
  -device loader,file=./pre-built/linux/images/u-boot.elf \ # The u-boot executable
  -hw-dtb ./pre-built/linux/images/zynqmp-qemu-arm.dtb # HW Device Tree that QEMU uses to generate the model

It uses a machine called "arm-generic-fdt", but mainline QEMU has no
such machine.

>
> Best regards,
> Francisco
>
> [1] qemu/docs/system/deprecated.rst
> [2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md
> [3] https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/460653138/Xilinx+Open+Source+Linux
>

Regards,
Bin



* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-01-18 12:32           ` Bin Meng
@ 2021-01-19 13:01             ` Francisco Iglesias
  2021-01-20 14:20               ` Bin Meng
  0 siblings, 1 reply; 36+ messages in thread
From: Francisco Iglesias @ 2021-01-19 13:01 UTC (permalink / raw)
  To: Bin Meng
  Cc: Kevin Wolf, Peter Maydell, qemu-devel@nongnu.org Developers,
	Qemu-block, Andrew Jeffery, Bin Meng, Philippe Mathieu-Daudé,
	Havard Skinnemoen, Tyrone Ting, qemu-arm, Alistair Francis,
	Cédric Le Goater, Joe Komlodi, Max Reitz, Joel Stanley

Hi Bin,

On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> Hi Francisco,
> 
> On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> <frasse.iglesias@gmail.com> wrote:
> >
> > Hi Bin,
> >
> > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > Hi Francisco,
> > >
> > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > <frasse.iglesias@gmail.com> wrote:
> > > >
> > > > Hi Bin,
> > > >
> > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > Hi Francisco,
> > > > >
> > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > >
> > > > > > Hi Bin,
> > > > > >
> > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > > >
> > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > > bytes are expected to be received after it receives a command. For
> > > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > > 4-byte address is needed.
> > > > > > >
> > > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > >
> > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > > >
> > > > > > While not being the original implementor I must assume that above solution was
> > > > > > considered but not chosen by the developers due to it is inaccuracy (it
> > > > > > wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8,
> > > > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > > > > also because the detail is already in use for catching exactly above error.
> > > > > >
> > > > >
> > > > > I found no clue from the commit message that my proposed solution here
> > > > > was ever considered, otherwise all SPI controller models supporting
> > > > > software generation should have been found out seriously broken long
> > > > > time ago!
> > > >
> > > >
> > > > The controllers you are referring to might lack support for commands requiring
> > > > dummy clock cycles but I really hope they work with the other commands? If so I
> > >
> > > I am not sure why you view dummy clock cycles as something special
> > > that needs some special support from the SPI controller. For the case
> > > 1 controller, it's nothing special from the controller perspective,
> > > just like sending out a command, or address bytes, or data. The
> > > controller just shifts data bit by bit from its tx fifo and that's it.
> > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > automatically generated (case 2 controller) by the hardware.
> >
> > Ok, I'll try to explain my view point a little differently. For that we also
> > need to keep in mind that QEMU models HW, and any binary that runs on a HW
> > board supported in QEMU should ideally run on that board inside QEMU aswell
> > (this can be a bare metal application equaly well as a modified u-boot/Linux
> > using SPI commands with a non multiple of 8 number of dummy clock cycles).
> >
> > Once functionality has been introduced into QEMU it is not easy to know which
> > intentional or untentional features provided by the functionality are being
> > used by users. One of the (perhaps not well known) features I'm aware of that
> > is in use and is provided by the accurate dummy clock cycle modeling inside
> > m25p80 is the be ability to test drivers accurately regarding the dummy clock
> > cycles (even when using commands with a non-multiple of 8 number of dummy clock
> > cycles), but there might be others aswell. So by removing this functionality
> > above use case will brake, this since those test will not be reliable.
> > Furthermore, since users tend to be creative it is not possible to know if
> > there are other use cases that will be affected. This means that in case [1]
> > needs to be followed the safe path is to add functionality instead of removing.
> > Luckily it also easier in this case, see below.
> 
> I understand there might be users other than U-Boot/Linux that use an
> odd number of dummy bits (not multiple of 8). If your concern was
> about model behavior changes, sure I can update
> qemu/docs/system/deprecated.rst to mention that some flashes in the
> m25p80 model now implement dummy cycles as bytes.

Yes, something like that. My concern is that since this functionality has been
in tree for a while, users have come to rely on known or unknown features that
it introduced. By removing the functionality (and the known/unknown features)
we risk breaking our users' use cases (currently I'm aware of one feature/use
case, but it is not unlikely that there are more). [1] states that "In general
features are intended to be supported indefinitely once introduced into QEMU",
which makes a lot of sense to me, because the opposite would mean that we are
not reliable. So if [1] is to be honored, it looks safer to add functionality
instead of removing it (and risking the removal of use cases/features). Luckily
I still believe that in this case it will be easier to go forward (even if I
also agree with what you are saying below about what I proposed).
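
To make the use case mentioned above concrete, here is a minimal sketch of
what a byte-granular model cannot represent. It is an illustration only:
the struct and function names are hypothetical and are not taken from
hw/block/m25p80.c or from any patch in this thread.

```c
#include <assert.h>

/*
 * Illustration only: these names are hypothetical and do not exist in
 * hw/block/m25p80.c.
 */
struct dummy_phase {
    unsigned bytes;         /* whole bytes a byte-granular model can count */
    unsigned leftover_bits; /* bits that do not fill a whole byte */
};

/* Split a dummy phase of 'cycles' clocks on 'buswidth' data lines. */
static struct dummy_phase split_dummy(unsigned cycles, unsigned buswidth)
{
    struct dummy_phase p;
    unsigned bits;

    /* Assumption: transfers use 1, 2, 4 or 8 data lines. */
    assert(buswidth == 1 || buswidth == 2 || buswidth == 4 || buswidth == 8);

    bits = cycles * buswidth;
    p.bytes = bits / 8;
    p.leftover_bits = bits % 8;
    return p;
}
```

With one data line, split_dummy(8, 1) yields one whole byte and no leftover
bits, while split_dummy(7, 1) yields zero bytes and seven leftover bits. A
model that only counts bytes has no way to express the second case, which is
the kind of driver error the cycle-accurate modeling can expose.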


> 
> > >
> > > > don't think it is fair to call them 'seriously broken' (and else we should
> > > > probably let the maintainers know about it). Most likely the lack of support
> > >
> > > I called it "seriously broken" because current implementation only
> > > considered one type of SPI controllers while completely ignoring the
> > > other type.
> >
> > If we change view and see this from the perspective of m25p80, it models the
> > commands a certain way and provides an API that the SPI controllers need to
> > implement for interacting with it. It is true that there are SPI controllers
> > referred to above that do not support the portion of that API that corresponds
> > to commands with dummy clock cycles, but I don't think it is true that this is
> > broken since there is also one SPI controller that has a working implementation
> > of m25p80's full API also when transfering through a tx fifo (use case 1). But
> > as mentioned above, by doing a minor extension and improvement to m25p80's API
> > and allow for toggling the accuracy from dummy clock cycles to dummy bytes [1]
> > will still be honored as in the same time making it possible to have full
> > support for the API in the SPI controllers that currently do not (please reread
> > the proposal in my previous reply that attempts to do this). I myself see this
> > as win/win situation, also because no controller should need modifications.
> >
> 
> I am afraid your proposal does not work. Your proposed new device
> property 'model_dummy_bytes' to select to convert the accurate dummy
> clock cycle count to dummy bytes inside m25p80, is hard to justify as
> a property to the flash itself, as the behavior is tightly coupled to
> how the SPI controller works.

I agree with the above. Instead of posting sample code here, though, I have
decided to post an RFC with a hopefully improved proposal. I'll cc you.
Regarding the below, Xilinx ZynqMP GQSPI should not need any modification in
a first step.

> 
> Please take a look at the Xilinx GQSPI controller, which supports both
> use cases, that the dummy cycles can be transferred via tx fifo, or
> generated by the controller automatically. Please read the example
> given in:
> 
>     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
> Command (EBh)
> 
> in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> 
> If you choose to set the m25p80 device property 'model_dummy_bytes' to
> true when working with the Xilinx GQSPI controller, you are bound to
> only allow guest software to use tx fifo to transfer the dummy cycles,
> and this is wrong.
> 
> >
> > >
> > > > for the commands is because no request has been made for them. Also there is
> > > > one controller that has support.
> > >
> > > Definitely it's not "no request". Nearly all SPI flashes support the
> > > Fast Read (0Bh) command today, and 0Bh requires a dummy cycle. This is
> > > "seriously broken" for those case 1 type controllers because they
> > > cannot read anything from the m25p80 model at all. Unless the guest
> > > software being tested only uses Read (03h) command which is not
> > > affected. But I can't find a software that uses Read instead of Fast
> > > Read.
> > >
> > > > > The issue you pointed out that we require the total number of dummy
> > > > > bits should be multiple of 8 is true, that's why I added the
> > > > > unimplemented log message in this series (patch 2/3/4) to warn users
> > > > > if this expectation is not met. However this will not cause any issue
> > > > > when running U-Boot or Linux, because both spi-nor drivers expect the
> > > > > same assumption as we do here.
> > > > >
> > > > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> > > > > there is a logic to calculate the dummy bytes needed for fast read
> > > > > command:
> > > > >
> > > > >     /* convert the dummy cycles to the number of bytes */
> > > > >     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> > > > >
> > > > > Note the default dummy cycles configuration for all flashes I have
> > > > > looked into as of today, meets the multiple of 8 assumption. On some
> > > > > flashes the dummy cycle number is configurable, and if it's been
> > > > > configured to be an odd value, it would not work on U-Boot/Linux in
> > > > > the first place.
> > > > >
> > > > > > >
> > > > > > > Things get complicated when interacting with different SPI or QSPI
> > > > > > > flash controllers. There are major two cases:
> > > > > > >
> > > > > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > > > > > >   For such case, driver will calculate the correct number of dummy
> > > > > > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > > > > > >   fix flashes working with such controllers.
> > > > > >
> > > > > > Above can be fixed while still keeping the detailed dummy cycle implementation
> > > > > > inside m25p80. Perhaps one of the following could be looked into: configurating
> > > > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > > > > > some functionality handling this in the SPI controller. Or a mixture of above.
> > > > >
> > > > > Please send patches to explain this in detail how this is going to
> > > > > work. I am open to all possible solutions.
> > > >
> > > > In that case I suggest that you instead try with a device property
> > > > 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle
> > > > count to dummy bytes inside m25p80. Below is an example on how to modify the
> > >
> > > No this is wrong in my view. This is not like a DMA vs. PIO handling.
> > >
> > > > decode_fast_read_cmd function (the other commands requiring dummy clock cycles
> > > > can follow a similar pattern). This way the fifo mode will be able to work the
> > > > way you desire while also keeping the current functionality intact. Suddenly
> > > > removing functionality (features) will take users by surprise.
> > >
> > > I don't think we are removing any features. This is a fix to make the
> > > model to be used by any SPI controllers.
> > >
> > > As I pointed out, both U-Boot and Linux have the multiple of 8
> > > assumption for the dummy bit, which is the default configuration for
> > > all flashes I have looked into so far. Can you please comment what use
> > > case you want to support? I requested a U-Boot/Linux kernel testing in
> > > the previous SST thread [1] against Xilinx GQSPI but there was no
> > > response.
> >
> > In [2] instructions on how to boot u-boot/Linux is found. For building the
> > various software components I followed the official doc in [3].
> 
> I see the following QEMU commands are used to test booting U-Boot/Linux:
> 
> $ qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G
> -serial stdio -display none -device loader,file=u-boot.elf -kernel
> bl31.elf -device loader,addr=0x40000000,file=Image -device
> loader,addr=0x2000000,file=system.dtb
> 
> I am not sure where the system.dtb gets built from?

The instructions to look into are the ones in [2]. 'system.dtb' is the kernel
dtb for zcu102 ([2] has been fixed). I created [2] purely for you, so I
respectfully ask you to try a little first before asking for further guidance.

Best regards,
Francisco Iglesias

[1] qemu/docs/system/deprecated.rst
[2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md


> 
> In [3], it mentions the Xilinx QEMU is used. And a different QEMU
> command is used as the example to launch U-Boot which is different
> from your command above.
> 
> See https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18841606/QEMU+-+Zynq+UltraScale+MPSoC#QEMU-ZynqUltraScale+MPSoC-RunningaZynqUltraScale+U-bootImageOnXilinx'sARMQEMU
> 
> $ ./aarch64-softmmu/qemu-system-aarch64 -M arm-generic-fdt -serial
> mon:stdio -serial /dev/null -display none \
>   -device loader,addr=0xfd1a0104,data=0x8000000e,data-len=4 \ # Un-reset the A53
>   -device loader,file=./pre-built/linux/images/bl31.elf,cpu-num=0 \ #
> ARM Trusted Firmware
>   -device loader,file=./pre-built/linux/images/u-boot.elf\ # The
> u-boot exectuable
>   -hw-dtb ./pre-built/linux/images/zynqmp-qemu-arm.dtb # HW Device
> Tree that QEMU uses to generate the model
> 
> It is using a machine called "arm-generic-fdt", but in the mainline
> QEMU there is no such machine called "arm-generic-fdt".
> 
> >
> > Best regards,
> > Francisco
> >
> > [1] qemu/docs/system/deprecated.rst
> > [2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md
> > [3] https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/460653138/Xilinx+Open+Source+Linux
> >
> 
> Regards,
> Bin



* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-01-19 13:01             ` Francisco Iglesias
@ 2021-01-20 14:20               ` Bin Meng
  2021-01-21  8:50                 ` Francisco Iglesias
  0 siblings, 1 reply; 36+ messages in thread
From: Bin Meng @ 2021-01-20 14:20 UTC (permalink / raw)
  To: Francisco Iglesias
  Cc: Kevin Wolf, Peter Maydell, qemu-devel@nongnu.org Developers,
	Qemu-block, Andrew Jeffery, Bin Meng, Philippe Mathieu-Daudé,
	Havard Skinnemoen, Tyrone Ting, qemu-arm, Alistair Francis,
	Cédric Le Goater, Joe Komlodi, Max Reitz, Joel Stanley

Hi Francisco,

On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
<frasse.iglesias@gmail.com> wrote:
>
> Hi Bin,
>
> On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > Hi Francisco,
> >
> > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > <frasse.iglesias@gmail.com> wrote:
> > >
> > > Hi Bin,
> > >
> > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > Hi Francisco,
> > > >
> > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > <frasse.iglesias@gmail.com> wrote:
> > > > >
> > > > > Hi Bin,
> > > > >
> > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > Hi Francisco,
> > > > > >
> > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi Bin,
> > > > > > >
> > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > > > >
> > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > > > bytes are expected to be received after it receives a command. For
> > > > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > > > 4-byte address is needed.
> > > > > > > >
> > > > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > >
> > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > > > >
> > > > > > > While not being the original implementor I must assume that above solution was
> > > > > > > considered but not chosen by the developers due to it is inaccuracy (it
> > > > > > > wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8,
> > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > > > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > > > > > also because the detail is already in use for catching exactly above error.
> > > > > > >
> > > > > >
> > > > > > I found no clue from the commit message that my proposed solution here
> > > > > > was ever considered, otherwise all SPI controller models supporting
> > > > > > software generation should have been found out seriously broken long
> > > > > > time ago!
> > > > >
> > > > >
> > > > > The controllers you are referring to might lack support for commands requiring
> > > > > dummy clock cycles but I really hope they work with the other commands? If so I
> > > >
> > > > I am not sure why you view dummy clock cycles as something special
> > > > that needs some special support from the SPI controller. For the case
> > > > 1 controller, it's nothing special from the controller perspective,
> > > > just like sending out a command, or address bytes, or data. The
> > > > controller just shifts data bit by bit from its tx fifo and that's it.
> > > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > > automatically generated (case 2 controller) by the hardware.
> > >
> > > Ok, I'll try to explain my view point a little differently. For that we also
> > > need to keep in mind that QEMU models HW, and any binary that runs on a HW
> > > board supported in QEMU should ideally run on that board inside QEMU aswell
> > > (this can be a bare metal application equaly well as a modified u-boot/Linux
> > > using SPI commands with a non multiple of 8 number of dummy clock cycles).
> > >
> > > Once functionality has been introduced into QEMU it is not easy to know which
> > > intentional or untentional features provided by the functionality are being
> > > used by users. One of the (perhaps not well known) features I'm aware of that
> > > is in use and is provided by the accurate dummy clock cycle modeling inside
> > > m25p80 is the be ability to test drivers accurately regarding the dummy clock
> > > cycles (even when using commands with a non-multiple of 8 number of dummy clock
> > > cycles), but there might be others aswell. So by removing this functionality
> > > above use case will brake, this since those test will not be reliable.
> > > Furthermore, since users tend to be creative it is not possible to know if
> > > there are other use cases that will be affected. This means that in case [1]
> > > needs to be followed the safe path is to add functionality instead of removing.
> > > Luckily it also easier in this case, see below.
> >
> > I understand there might be users other than U-Boot/Linux that use an
> > odd number of dummy bits (not multiple of 8). If your concern was
> > about model behavior changes, sure I can update
> > qemu/docs/system/deprecated.rst to mention that some flashes in the
> > m25p80 model now implement dummy cycles as bytes.
>
> Yes, something like that. My concern is that since this functionality has been
> in tree for while, users have found known or unknown features that got
> introduced by it. By removing the functionality (and the known/uknown features)
> we are riscing to brake our user's use cases (currently I'm aware of one
> feature/use case but it is not unlikely that there are more). [1] states that
> "In general features are intended to be supported indefinitely once introduced
> into QEMU", to me that makes very much sense because the opposite would mean
> that we were not reliable. So in case [1] needs to be honored it looks to be
> safer to add functionality instead of removing (and riscing the removal of use
> cases/features). Luckily I still believe in this case that it will be easier to
> go forward (even if I also agree on what you are saying below about what I
> proposed).
>

Even if the implementation is buggy? Do we need to keep a buggy
implementation forever? I think that is exactly why
qemu/docs/system/deprecated.rst was created: to deprecate such
features.

> >
> > > >
> > > > > don't think it is fair to call them 'seriously broken' (and else we should
> > > > > probably let the maintainers know about it). Most likely the lack of support
> > > >
> > > > I called it "seriously broken" because current implementation only
> > > > considered one type of SPI controllers while completely ignoring the
> > > > other type.
> > >
> > > If we change view and see this from the perspective of m25p80, it models the
> > > commands a certain way and provides an API that the SPI controllers need to
> > > implement for interacting with it. It is true that there are SPI controllers
> > > referred to above that do not support the portion of that API that corresponds
> > > to commands with dummy clock cycles, but I don't think it is true that this is
> > > broken since there is also one SPI controller that has a working implementation
> > > of m25p80's full API also when transfering through a tx fifo (use case 1). But
> > > as mentioned above, by doing a minor extension and improvement to m25p80's API
> > > and allow for toggling the accuracy from dummy clock cycles to dummy bytes [1]
> > > will still be honored as in the same time making it possible to have full
> > > support for the API in the SPI controllers that currently do not (please reread
> > > the proposal in my previous reply that attempts to do this). I myself see this
> > > as win/win situation, also because no controller should need modifications.
> > >
> >
> > I am afraid your proposal does not work. Your proposed new device
> > property 'model_dummy_bytes' to select to convert the accurate dummy
> > clock cycle count to dummy bytes inside m25p80, is hard to justify as
> > a property to the flash itself, as the behavior is tightly coupled to
> > how the SPI controller works.
>
> I agree on above. I decided though that instead of posting sample code in here
> I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
> Xilinx ZynqMP GQSPI should not need any modication in a first step.
>

Wait, (see below)

> >
> > Please take a look at the Xilinx GQSPI controller, which supports both
> > use cases, that the dummy cycles can be transferred via tx fifo, or
> > generated by the controller automatically. Please read the example
> > given in:
> >
> >     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
> > Command (EBh)
> >
> > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> >
> > If you choose to set the m25p80 device property 'model_dummy_bytes' to
> > true when working with the Xilinx GQSPI controller, you are bound to
> > only allow guest software to use tx fifo to transfer the dummy cycles,
> > and this is wrong.
> >

You missed this part. I looked at your RFC, and as I mentioned above,
your proposal cannot support a complicated controller like the Xilinx
GQSPI. Please read the example in table 24-22. Your RFC mandates that
the guest software's GQSPI driver use only hardware dummy cycle
generation, which is wrong.
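
As a rough sketch of why both paths have to be accepted by the flash
model: whether the driver converts the dummy cycles to bytes and pushes
them through the tx fifo, or the hardware generates the cycles itself,
the flash sees the same number of clocks. The helper names below are
hypothetical; this is not code from this series, from the GQSPI model,
or from U-Boot/Linux. Only the conversion formula follows the spi-nor
snippet quoted further down.

```c
#include <assert.h>

/* Hypothetical helpers for illustration; not QEMU, U-Boot or Linux code. */

/* Driver side: dummy cycles -> number of bytes to push into the tx fifo. */
static unsigned dummy_cycles_to_bytes(unsigned cycles, unsigned buswidth)
{
    /* Assumption: transfers use 1, 2, 4 or 8 data lines. */
    assert(buswidth == 1 || buswidth == 2 || buswidth == 4 || buswidth == 8);
    return (cycles * buswidth) / 8;
}

/* Bus side: clock cycles needed to shift 'nbytes' out of the tx fifo. */
static unsigned fifo_bytes_to_cycles(unsigned nbytes, unsigned buswidth)
{
    assert(buswidth == 1 || buswidth == 2 || buswidth == 4 || buswidth == 8);
    return (nbytes * 8) / buswidth;
}
```

For the Quad I/O Read (EBh) example from the cover letter,
dummy_cycles_to_bytes(6, 4) gives 3 bytes, and fifo_bytes_to_cycles(3, 4)
gives back the same 6 clocks that hardware generation would produce.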

> > >
> > > >
> > > > > for the commands is because no request has been made for them. Also there is
> > > > > one controller that has support.
> > > >
> > > > Definitely it's not "no request". Nearly all SPI flashes support the
> > > > Fast Read (0Bh) command today, and 0Bh requires a dummy cycle. This is
> > > > "seriously broken" for those case 1 type controllers because they
> > > > cannot read anything from the m25p80 model at all. Unless the guest
> > > > software being tested only uses Read (03h) command which is not
> > > > affected. But I can't find a software that uses Read instead of Fast
> > > > Read.
> > > >
> > > > > > The issue you pointed out that we require the total number of dummy
> > > > > > bits should be multiple of 8 is true, that's why I added the
> > > > > > unimplemented log message in this series (patch 2/3/4) to warn users
> > > > > > if this expectation is not met. However this will not cause any issue
> > > > > > when running U-Boot or Linux, because both spi-nor drivers expect the
> > > > > > same assumption as we do here.
> > > > > >
> > > > > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> > > > > > there is a logic to calculate the dummy bytes needed for fast read
> > > > > > command:
> > > > > >
> > > > > >     /* convert the dummy cycles to the number of bytes */
> > > > > >     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> > > > > >
> > > > > > Note the default dummy cycles configuration for all flashes I have
> > > > > > looked into as of today, meets the multiple of 8 assumption. On some
> > > > > > flashes the dummy cycle number is configurable, and if it's been
> > > > > > configured to be an odd value, it would not work on U-Boot/Linux in
> > > > > > the first place.
> > > > > >
> > > > > > > >
> > > > > > > > Things get complicated when interacting with different SPI or QSPI
> > > > > > > > flash controllers. There are major two cases:
> > > > > > > >
> > > > > > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > > > > > > >   For such case, driver will calculate the correct number of dummy
> > > > > > > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > > > > > > >   fix flashes working with such controllers.
> > > > > > >
> > > > > > > Above can be fixed while still keeping the detailed dummy cycle implementation
> > > > > > > inside m25p80. Perhaps one of the following could be looked into: configurating
> > > > > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > > > > > > some functionality handling this in the SPI controller. Or a mixture of above.
> > > > > >
> > > > > > Please send patches to explain this in detail how this is going to
> > > > > > work. I am open to all possible solutions.
> > > > >
> > > > > In that case I suggest that you instead try with a device property
> > > > > 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle
> > > > > count to dummy bytes inside m25p80. Below is an example on how to modify the
> > > >
> > > > No this is wrong in my view. This is not like a DMA vs. PIO handling.
> > > >
> > > > > decode_fast_read_cmd function (the other commands requiring dummy clock cycles
> > > > > can follow a similar pattern). This way the fifo mode will be able to work the
> > > > > way you desire while also keeping the current functionality intact. Suddenly
> > > > > removing functionality (features) will take users by surprise.
> > > >
> > > > I don't think we are removing any features. This is a fix to make the
> > > > model to be used by any SPI controllers.
> > > >
> > > > As I pointed out, both U-Boot and Linux have the multiple of 8
> > > > assumption for the dummy bit, which is the default configuration for
> > > > all flashes I have looked into so far. Can you please comment what use
> > > > case you want to support? I requested a U-Boot/Linux kernel testing in
> > > > the previous SST thread [1] against Xilinx GQSPI but there was no
> > > > response.
> > >
> > > In [2] instructions on how to boot u-boot/Linux is found. For building the
> > > various software components I followed the official doc in [3].
> >
> > I see the following QEMU commands are used to test booting U-Boot/Linux:
> >
> > $ qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G
> > -serial stdio -display none -device loader,file=u-boot.elf -kernel
> > bl31.elf -device loader,addr=0x40000000,file=Image -device
> > loader,addr=0x2000000,file=system.dtb
> >
> > I am not sure where the system.dtb gets built from?
>
> It is the instructions in [2] to look into. 'system.dtb' is the kernel dtb for
> zcu102 ([2] has been fixed). I created [2] purely for you, so respectfully I
> will ask you to try a little first before asking for further guidance.
>

I tried, but without success. I removed the "-device loader" options for
loading the kernel image and the device tree, and focused only on booting
U-Boot.

The ATF bl31.elf was built from
https://github.com/ARM-software/arm-trusted-firmware, by following
build instructions at
https://trustedfirmware-a.readthedocs.io/en/latest/plat/xilinx-zynqmp.html.
U-Boot was built from the upstream U-Boot.

$ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G \
    -serial stdio -display none -device loader,file=u-boot.elf \
    -kernel bl31.elf
ERROR:   Incorrect XILINX IDCODE 0x0, maskid 0x4600093
NOTICE:  ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000
NOTICE:  BL31: v2.4(release):v2.4-228-g337e493
NOTICE:  BL31: Built : 21:18:14, Jan 20 2021
ERROR:   BL31: Platform Management API version error. Expected: v1.1 - Found: v0.0
ERROR:   Error initializing runtime service sip_svc

I also tried the Xilinx fork of ATF from
https://github.com/Xilinx/arm-trusted-firmware, by following build
instructions at
https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842305/Build+ARM+Trusted+Firmware+ATF

$ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G \
    -serial stdio -display none -device loader,file=u-boot.elf \
    -kernel bl31.elf
ERROR:   Incorrect XILINX IDCODE 0x0, maskid 0x4600093
NOTICE:  ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000
NOTICE:  BL31: v2.2(release):xilinx-v2020.2
NOTICE:  BL31: Built : 21:52:38, Jan 20 2021
ERROR:   BL31: Platform Management API version error. Expected: v1.1 - Found: v0.0
ERROR:   Error initializing runtime service sip_svc

Then I tried building U-Boot from the Xilinx fork at
https://github.com/Xilinx/u-boot-xlnx/, still without success.

> Best regards,
> Francisco Iglesias
>
> [1] qemu/docs/system/deprecated.rst
> [2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md
>
>

Regards,
Bin



* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-01-20 14:20               ` Bin Meng
@ 2021-01-21  8:50                 ` Francisco Iglesias
  2021-01-21  8:59                   ` Bin Meng
  0 siblings, 1 reply; 36+ messages in thread
From: Francisco Iglesias @ 2021-01-21  8:50 UTC (permalink / raw)
  To: Bin Meng
  Cc: Kevin Wolf, Peter Maydell, qemu-devel@nongnu.org Developers,
	Qemu-block, Andrew Jeffery, Bin Meng, Philippe Mathieu-Daudé,
	Havard Skinnemoen, Tyrone Ting, qemu-arm, Alistair Francis,
	Cédric Le Goater, Joe Komlodi, Max Reitz, Joel Stanley

Dear Bin,

On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> Hi Francisco,
> 
> On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> <frasse.iglesias@gmail.com> wrote:
> >
> > Hi Bin,
> >
> > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > Hi Francisco,
> > >
> > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > > <frasse.iglesias@gmail.com> wrote:
> > > >
> > > > Hi Bin,
> > > >
> > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > Hi Francisco,
> > > > >
> > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > >
> > > > > > Hi Bin,
> > > > > >
> > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > Hi Francisco,
> > > > > > >
> > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi Bin,
> > > > > > > >
> > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > > > > >
> > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > > > > bytes are expected to be received after it receives a command. For
> > > > > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > > > > 4-byte address is needed.
> > > > > > > > >
> > > > > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > >
> > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > > > > >
> > > > > > > > While not being the original implementor I must assume that above solution was
> > > > > > > > > considered but not chosen by the developers due to its inaccuracy (it
> > > > > > > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
> > > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > > > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > > > > > > also because the detail is already in use for catching exactly above error.
> > > > > > > >
> > > > > > >
> > > > > > > I found no clue from the commit message that my proposed solution here
> > > > > > > was ever considered, otherwise all SPI controller models supporting
> > > > > > > software generation should have been found out seriously broken long
> > > > > > > time ago!
> > > > > >
> > > > > >
> > > > > > The controllers you are referring to might lack support for commands requiring
> > > > > > dummy clock cycles but I really hope they work with the other commands? If so I
> > > > >
> > > > > I am not sure why you view dummy clock cycles as something special
> > > > > that needs some special support from the SPI controller. For the case
> > > > > 1 controller, it's nothing special from the controller perspective,
> > > > > just like sending out a command, or address bytes, or data. The
> > > > > controller just shifts data bit by bit from its tx fifo and that's it.
> > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > > > automatically generated (case 2 controller) by the hardware.
> > > >
> > > > > Ok, I'll try to explain my view point a little differently. For that we also
> > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW
> > > > > board supported in QEMU should ideally run on that board inside QEMU as well
> > > > > (this can be a bare metal application equally well as a modified u-boot/Linux
> > > > > using SPI commands with a non multiple of 8 number of dummy clock cycles).
> > > > >
> > > > > Once functionality has been introduced into QEMU it is not easy to know which
> > > > > intentional or unintentional features provided by the functionality are being
> > > > > used by users. One of the (perhaps not well known) features I'm aware of that
> > > > > is in use and is provided by the accurate dummy clock cycle modeling inside
> > > > > m25p80 is the ability to test drivers accurately regarding the dummy clock
> > > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock
> > > > > cycles), but there might be others as well. So by removing this functionality
> > > > > the above use case will break, since those tests will no longer be reliable.
> > > > > Furthermore, since users tend to be creative it is not possible to know if
> > > > > there are other use cases that will be affected. This means that in case [1]
> > > > > needs to be followed the safe path is to add functionality instead of removing.
> > > > > Luckily it is also easier in this case, see below.
> > >
> > > I understand there might be users other than U-Boot/Linux that use an
> > > odd number of dummy bits (not multiple of 8). If your concern was
> > > about model behavior changes, sure I can update
> > > qemu/docs/system/deprecated.rst to mention that some flashes in the
> > > m25p80 model now implement dummy cycles as bytes.
> >
> > > Yes, something like that. My concern is that since this functionality has been
> > > in tree for a while, users have found known or unknown features that got
> > > introduced by it. By removing the functionality (and the known/unknown features)
> > > we are risking breaking our users' use cases (currently I'm aware of one
> > > feature/use case but it is not unlikely that there are more). [1] states that
> > > "In general features are intended to be supported indefinitely once introduced
> > > into QEMU", to me that makes very much sense because the opposite would mean
> > > that we were not reliable. So in case [1] needs to be honored it looks to be
> > > safer to add functionality instead of removing (and risking the removal of use
> > > cases/features). Luckily I still believe in this case that it will be easier to
> > > go forward (even if I also agree on what you are saying below about what I
> > > proposed).
> >
> 
> Even if the implementation is buggy and we need to keep the buggy
> implementation forever? I think that's why
> qemu/docs/system/deprecated.rst was created for deprecating such
> feature.

With the RFC I posted, all commands in m25p80 work with both the case 1
controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI).
Because of this, with all due respect, I have to disagree that this is buggy.

> 
> > >
> > > > >
> > > > > > don't think it is fair to call them 'seriously broken' (and else we should
> > > > > > probably let the maintainers know about it). Most likely the lack of support
> > > > >
> > > > > I called it "seriously broken" because current implementation only
> > > > > considered one type of SPI controllers while completely ignoring the
> > > > > other type.
> > > >
> > > > If we change view and see this from the perspective of m25p80, it models the
> > > > commands a certain way and provides an API that the SPI controllers need to
> > > > implement for interacting with it. It is true that there are SPI controllers
> > > > referred to above that do not support the portion of that API that corresponds
> > > > to commands with dummy clock cycles, but I don't think it is true that this is
> > > > broken since there is also one SPI controller that has a working implementation
> > > > > of m25p80's full API also when transferring through a tx fifo (use case 1). But
> > > > > as mentioned above, by doing a minor extension and improvement to m25p80's API
> > > > > and allowing for toggling the accuracy from dummy clock cycles to dummy bytes [1]
> > > > > will still be honored while at the same time making it possible to have full
> > > > support for the API in the SPI controllers that currently do not (please reread
> > > > the proposal in my previous reply that attempts to do this). I myself see this
> > > > as win/win situation, also because no controller should need modifications.
> > > >
> > >
> > > I am afraid your proposal does not work. Your proposed new device
> > > property 'model_dummy_bytes' to select to convert the accurate dummy
> > > clock cycle count to dummy bytes inside m25p80, is hard to justify as
> > > a property to the flash itself, as the behavior is tightly coupled to
> > > how the SPI controller works.
> >
> > I agree on above. I decided though that instead of posting sample code in here
> > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
> > > Xilinx ZynqMP GQSPI should not need any modification in a first step.
> >
> 
> Wait, (see below)
> 
> > >
> > > Please take a look at the Xilinx GQSPI controller, which supports both
> > > use cases, that the dummy cycles can be transferred via tx fifo, or
> > > generated by the controller automatically. Please read the example
> > > given in:
> > >
> > >     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
> > > Command (EBh)
> > >
> > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > >
> > > If you choose to set the m25p80 device property 'model_dummy_bytes' to
> > > true when working with the Xilinx GQSPI controller, you are bound to
> > > only allow guest software to use tx fifo to transfer the dummy cycles,
> > > and this is wrong.
> > >
> 
> You missed this part. I looked at your RFC, and as I mentioned above
> your proposal cannot support the complicated controller like Xilinx
> GQSPI. Please read the example of table 24-22. With your RFC, you
> mandate guest software's GQSPI driver to only use hardware dummy cycle
> generation, which is wrong.
> 

First, thank you very much for looking into the RFC series, it is very much
appreciated. Secondly, about the above: the GQSPI model in QEMU performs
transfers from two locations in its source file. One location handles the
transfer referred to above, the other handles the transfer through the txfifo.
The location handling the transfer referred to above will not need any
modifications (and will thus work equally well as it does currently).

Now that the above has been cleared up, and since I know you are heavily loaded
with other higher prio tasks, let's wait for the maintainers to also have a look
at the RFC (understandably this can take some time since they are also heavily
loaded).

Best regards,
Francisco Iglesias


> > > >
> > > > >
> > > > > > for the commands is because no request has been made for them. Also there is
> > > > > > one controller that has support.
> > > > >
> > > > > Definitely it's not "no request". Nearly all SPI flashes support the
> > > > > Fast Read (0Bh) command today, and 0Bh requires a dummy cycle. This is
> > > > > "seriously broken" for those case 1 type controllers because they
> > > > > cannot read anything from the m25p80 model at all. Unless the guest
> > > > > software being tested only uses Read (03h) command which is not
> > > > > affected. But I can't find a software that uses Read instead of Fast
> > > > > Read.
> > > > >
> > > > > > > The issue you pointed out that we require the total number of dummy
> > > > > > > bits should be multiple of 8 is true, that's why I added the
> > > > > > > unimplemented log message in this series (patch 2/3/4) to warn users
> > > > > > > if this expectation is not met. However this will not cause any issue
> > > > > > > when running U-Boot or Linux, because both spi-nor drivers expect the
> > > > > > > same assumption as we do here.
> > > > > > >
> > > > > > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> > > > > > > there is a logic to calculate the dummy bytes needed for fast read
> > > > > > > command:
> > > > > > >
> > > > > > >     /* convert the dummy cycles to the number of bytes */
> > > > > > >     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> > > > > > >
> > > > > > > Note the default dummy cycles configuration for all flashes I have
> > > > > > > looked into as of today, meets the multiple of 8 assumption. On some
> > > > > > > flashes the dummy cycle number is configurable, and if it's been
> > > > > > > configured to be an odd value, it would not work on U-Boot/Linux in
> > > > > > > the first place.
> > > > > > >
> > > > > > > > >
> > > > > > > > > Things get complicated when interacting with different SPI or QSPI
> > > > > > > > > flash controllers. There are major two cases:
> > > > > > > > >
> > > > > > > > > - Dummy bytes prepared by drivers, and wrote to the controller fifo.
> > > > > > > > >   For such case, driver will calculate the correct number of dummy
> > > > > > > > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > > > > > > > >   fix flashes working with such controllers.
> > > > > > > >
> > > > > > > > Above can be fixed while still keeping the detailed dummy cycle implementation
> > > > > > > > inside m25p80. Perhaps one of the following could be looked into: configuring
> > > > > > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > > > > > > > some functionality handling this in the SPI controller. Or a mixture of above.
> > > > > > >
> > > > > > > Please send patches to explain this in detail how this is going to
> > > > > > > work. I am open to all possible solutions.
> > > > > >
> > > > > > In that case I suggest that you instead try with a device property
> > > > > > 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle
> > > > > > count to dummy bytes inside m25p80. Below is an example on how to modify the
> > > > >
> > > > > No this is wrong in my view. This is not like a DMA vs. PIO handling.
> > > > >
> > > > > > decode_fast_read_cmd function (the other commands requiring dummy clock cycles
> > > > > > can follow a similar pattern). This way the fifo mode will be able to work the
> > > > > > way you desire while also keeping the current functionality intact. Suddenly
> > > > > > removing functionality (features) will take users by surprise.
> > > > >
> > > > > I don't think we are removing any features. This is a fix to make the
> > > > > model to be used by any SPI controllers.
> > > > >
> > > > > As I pointed out, both U-Boot and Linux have the multiple of 8
> > > > > assumption for the dummy bit, which is the default configuration for
> > > > > all flashes I have looked into so far. Can you please comment what use
> > > > > case you want to support? I requested a U-Boot/Linux kernel testing in
> > > > > the previous SST thread [1] against Xilinx GQSPI but there was no
> > > > > response.
> > > >
> > > > In [2] instructions on how to boot u-boot/Linux is found. For building the
> > > > various software components I followed the official doc in [3].
> > >
> > > I see the following QEMU commands are used to test booting U-Boot/Linux:
> > >
> > > $ qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G
> > > -serial stdio -display none -device loader,file=u-boot.elf -kernel
> > > bl31.elf -device loader,addr=0x40000000,file=Image -device
> > > loader,addr=0x2000000,file=system.dtb
> > >
> > > I am not sure where the system.dtb gets built from?
> >
> > It is the instructions in [2] to look into. 'system.dtb' is the kernel dtb for
> > zcu102 ([2] has been fixed). I created [2] purely for you, so respectfully I
> > will ask you to try a little first before asking for further guidance.
> >
> 
> I tried, but no success. I removed the "-device loader" part for
> loading kernel image and the device tree, and only focused on booting
> U-Boot.
> 
> The ATF bl31.elf was built from
> https://github.com/ARM-software/arm-trusted-firmware, by following
> build instructions at
> https://trustedfirmware-a.readthedocs.io/en/latest/plat/xilinx-zynqmp.html.
> U-Boot was built from the upstream U-Boot.
> 
> $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m
> 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel
> bl31.elf
> ERROR:   Incorrect XILINX IDCODE 0x0, maskid 0x4600093
> NOTICE:  ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000
> NOTICE:  BL31: v2.4(release):v2.4-228-g337e493
> NOTICE:  BL31: Built : 21:18:14, Jan 20 2021
> ERROR:   BL31: Platform Management API version error. Expected: v1.1 -
> Found: v0.0
> ERROR:   Error initializing runtime service sip_svc
> 
> I also tried the Xilinx fork of ATF from
> https://github.com/Xilinx/arm-trusted-firmware, by following build
> instructions at
> https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842305/Build+ARM+Trusted+Firmware+ATF
> 
> $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m
> 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel
> bl31.elf
> ERROR:   Incorrect XILINX IDCODE 0x0, maskid 0x4600093
> NOTICE:  ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000
> NOTICE:  BL31: v2.2(release):xilinx-v2020.2
> NOTICE:  BL31: Built : 21:52:38, Jan 20 2021
> ERROR:   BL31: Platform Management API version error. Expected: v1.1 -
> Found: v0.0
> ERROR:   Error initializing runtime service sip_svc
> 
> Then I tried to build a U-Boot from the Xilinx fork at
> https://github.com/Xilinx/u-boot-xlnx/, still no success.
> 
> > Best regards,
> > Francisco Iglesias
> >
> > [1] qemu/docs/system/deprecated.rst
> > [2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md
> >
> >
> 
> Regards,
> Bin



* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-01-21  8:50                 ` Francisco Iglesias
@ 2021-01-21  8:59                   ` Bin Meng
  2021-01-21 10:01                     ` Francisco Iglesias
  2021-01-21 14:18                     ` Francisco Iglesias
  0 siblings, 2 replies; 36+ messages in thread
From: Bin Meng @ 2021-01-21  8:59 UTC (permalink / raw)
  To: Francisco Iglesias
  Cc: Kevin Wolf, Peter Maydell, qemu-devel@nongnu.org Developers,
	Qemu-block, Andrew Jeffery, Bin Meng, Philippe Mathieu-Daudé,
	Havard Skinnemoen, Tyrone Ting, qemu-arm, Alistair Francis,
	Cédric Le Goater, Joe Komlodi, Max Reitz, Joel Stanley

Hi Francisco,

On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
<frasse.iglesias@gmail.com> wrote:
>
> Dear Bin,
>
> On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> > Hi Francisco,
> >
> > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> > <frasse.iglesias@gmail.com> wrote:
> > >
> > > Hi Bin,
> > >
> > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > > Hi Francisco,
> > > >
> > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > > > <frasse.iglesias@gmail.com> wrote:
> > > > >
> > > > > Hi Bin,
> > > > >
> > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > > Hi Francisco,
> > > > > >
> > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi Bin,
> > > > > > >
> > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > > Hi Francisco,
> > > > > > > >
> > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Bin,
> > > > > > > > >
> > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > > > > > >
> > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > > > > > bytes are expected to be received after it receives a command. For
> > > > > > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > > > > > 4-byte address is needed.
> > > > > > > > > >
> > > > > > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > > >
> > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > > > > > >
> > > > > > > > > While not being the original implementor I must assume that above solution was
> > > > > > > > > > considered but not chosen by the developers due to its inaccuracy (it
> > > > > > > > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
> > > > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > > > > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > > > > > > > also because the detail is already in use for catching exactly above error.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I found no clue from the commit message that my proposed solution here
> > > > > > > > was ever considered, otherwise all SPI controller models supporting
> > > > > > > > software generation should have been found out seriously broken long
> > > > > > > > time ago!
> > > > > > >
> > > > > > >
> > > > > > > The controllers you are referring to might lack support for commands requiring
> > > > > > > dummy clock cycles but I really hope they work with the other commands? If so I
> > > > > >
> > > > > > I am not sure why you view dummy clock cycles as something special
> > > > > > that needs some special support from the SPI controller. For the case
> > > > > > 1 controller, it's nothing special from the controller perspective,
> > > > > > just like sending out a command, or address bytes, or data. The
> > > > > > controller just shifts data bit by bit from its tx fifo and that's it.
> > > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > > > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > > > > automatically generated (case 2 controller) by the hardware.
> > > > >
> > > > > > Ok, I'll try to explain my view point a little differently. For that we also
> > > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW
> > > > > > board supported in QEMU should ideally run on that board inside QEMU as well
> > > > > > (this can be a bare metal application equally well as a modified u-boot/Linux
> > > > > > using SPI commands with a non multiple of 8 number of dummy clock cycles).
> > > > > >
> > > > > > Once functionality has been introduced into QEMU it is not easy to know which
> > > > > > intentional or unintentional features provided by the functionality are being
> > > > > > used by users. One of the (perhaps not well known) features I'm aware of that
> > > > > > is in use and is provided by the accurate dummy clock cycle modeling inside
> > > > > > m25p80 is the ability to test drivers accurately regarding the dummy clock
> > > > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock
> > > > > > cycles), but there might be others as well. So by removing this functionality
> > > > > > the above use case will break, since those tests will no longer be reliable.
> > > > > > Furthermore, since users tend to be creative it is not possible to know if
> > > > > > there are other use cases that will be affected. This means that in case [1]
> > > > > > needs to be followed the safe path is to add functionality instead of removing.
> > > > > > Luckily it is also easier in this case, see below.
> > > >
> > > > I understand there might be users other than U-Boot/Linux that use an
> > > > odd number of dummy bits (not multiple of 8). If your concern was
> > > > about model behavior changes, sure I can update
> > > > qemu/docs/system/deprecated.rst to mention that some flashes in the
> > > > m25p80 model now implement dummy cycles as bytes.
> > >
> > > > Yes, something like that. My concern is that since this functionality has been
> > > > in tree for a while, users have found known or unknown features that got
> > > > introduced by it. By removing the functionality (and the known/unknown features)
> > > > we are risking breaking our users' use cases (currently I'm aware of one
> > > > feature/use case but it is not unlikely that there are more). [1] states that
> > > > "In general features are intended to be supported indefinitely once introduced
> > > > into QEMU", to me that makes very much sense because the opposite would mean
> > > > that we were not reliable. So in case [1] needs to be honored it looks to be
> > > > safer to add functionality instead of removing (and risking the removal of use
> > > > cases/features). Luckily I still believe in this case that it will be easier to
> > > > go forward (even if I also agree on what you are saying below about what I
> > > > proposed).
> > >
> >
> > Even if the implementation is buggy and we need to keep the buggy
> > implementation forever? I think that's why
> > qemu/docs/system/deprecated.rst was created for deprecating such
> > feature.
>
> With the RFC I posted all commands in m25p80 are working for both the case 1
> controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI).
> Because of this, I, with all respect, will have to disagree that this is buggy.

Well, the existing m25p80 implementation, which models those flashes with
dummy cycle accuracy, prevents all SPI controllers that use a tx fifo from
working with those flashes. Hence it is buggy.
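
To make the mismatch concrete, here is a minimal sketch (not code from the
series or from m25p80.c). The first function applies the same cycles-to-bytes
formula quoted from U-Boot/Linux elsewhere in this thread, with an explicit
multiple-of-8 check. Both function names are hypothetical and exist only for
illustration, and the "8 dummy cycles on one data line" figure used in the
note after the code is an assumption about a common Fast Read (0Bh) setup.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Convert a number of dummy clock cycles into whole bytes for a bus with
 * `buswidth` data lines (1, 2 or 4). Returns false when the total number
 * of dummy bits is not a multiple of 8 and so cannot be sent as bytes.
 * Hypothetical helper, for illustration only.
 */
static bool dummy_cycles_to_bytes(unsigned cycles, unsigned buswidth,
                                  unsigned *nbytes)
{
    unsigned bits;

    assert(nbytes);
    assert(buswidth == 1 || buswidth == 2 || buswidth == 4);

    bits = cycles * buswidth;
    if (bits % 8 != 0) {
        return false;
    }
    *nbytes = bits / 8;
    return true;
}

/*
 * Number of bytes a flash model would still be waiting for after the
 * driver has pushed `sent` dummy bytes through the tx fifo, if the model
 * expects `expected` bytes. Hypothetical helper, for illustration only.
 */
static unsigned dummy_bytes_still_expected(unsigned expected, unsigned sent)
{
    return expected > sent ? expected - sent : 0;
}
```

For the Fast Read Quad I/O (EBh) example from the cover letter,
dummy_cycles_to_bytes(6, 4, &n) yields n = 3. Assuming 8 dummy cycles on a
single data line, a driver writes 1 dummy byte into the tx fifo, while a model
that loads the cycle count (8) into a byte counter would still be waiting for
dummy_bytes_still_expected(8, 1) = 7 more bytes before returning data.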

>
> >
> > > >
> > > > > >
> > > > > > > don't think it is fair to call them 'seriously broken' (and else we should
> > > > > > > probably let the maintainers know about it). Most likely the lack of support
> > > > > >
> > > > > > I called it "seriously broken" because current implementation only
> > > > > > considered one type of SPI controllers while completely ignoring the
> > > > > > other type.
> > > > >
> > > > > If we change view and see this from the perspective of m25p80, it models the
> > > > > commands a certain way and provides an API that the SPI controllers need to
> > > > > implement for interacting with it. It is true that there are SPI controllers
> > > > > referred to above that do not support the portion of that API that corresponds
> > > > > to commands with dummy clock cycles, but I don't think it is true that this is
> > > > > broken since there is also one SPI controller that has a working implementation
> > > > > > of m25p80's full API also when transferring through a tx fifo (use case 1). But
> > > > > > as mentioned above, by doing a minor extension and improvement to m25p80's API
> > > > > > and allowing for toggling the accuracy from dummy clock cycles to dummy bytes [1]
> > > > > > will still be honored while at the same time making it possible to have full
> > > > > support for the API in the SPI controllers that currently do not (please reread
> > > > > the proposal in my previous reply that attempts to do this). I myself see this
> > > > > as win/win situation, also because no controller should need modifications.
> > > > >
> > > >
> > > > I am afraid your proposal does not work. Your proposed new device
> > > > property 'model_dummy_bytes' to select to convert the accurate dummy
> > > > clock cycle count to dummy bytes inside m25p80, is hard to justify as
> > > > a property to the flash itself, as the behavior is tightly coupled to
> > > > how the SPI controller works.
> > >
> > > I agree on above. I decided though that instead of posting sample code in here
> > > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
> > > Xilinx ZynqMP GQSPI should not need any modification in a first step.
> > >
> >
> > Wait, (see below)
> >
> > > >
> > > > Please take a look at the Xilinx GQSPI controller, which supports both
> > > > use cases, that the dummy cycles can be transferred via tx fifo, or
> > > > generated by the controller automatically. Please read the example
> > > > given in:
> > > >
> > > >     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
> > > > Command (EBh)
> > > >
> > > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > > >
> > > > If you choose to set the m25p80 device property 'model_dummy_bytes' to
> > > > true when working with the Xilinx GQSPI controller, you are bound to
> > > > only allow guest software to use tx fifo to transfer the dummy cycles,
> > > > and this is wrong.
> > > >
> >
> > You missed this part. I looked at your RFC, and as I mentioned above
> > your proposal cannot support a complicated controller like Xilinx
> > GQSPI. Please read the example of table 24-22. With your RFC, you
> > mandate guest software's GQSPI driver to only use hardware dummy cycle
> > generation, which is wrong.
> >
>
> First, thank you very much for looking into the RFC series, very much
> appreciated. Secondly, about the above: the GQSPI model in QEMU transfers from 2
> locations in the file. In 1 location the transfer referred to above is done; in
> another location the transfer through the txfifo is done. The location where the
> transfer referred to above is done will not need any modifications (and will
> thus work just as well as it does currently).

Please explain this a little bit. How does your RFC series handle
cases as described in table 24-22, where the 6 dummy cycles are split
into 2 transfers, with one transfer using tx fifo, and the other one
using hardware dummy cycle generation?
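
To make the question concrete, here is a small sketch of the cycle
accounting the flash model would need to get right. The function names
are hypothetical, and the split used in the note below (1 byte through
the tx fifo on 4 lines plus 4 hardware-generated cycles) is only an
illustration, not a copy of the entries in table 24-22.

```c
#include <stdint.h>

/* Clock cycles consumed when nbytes are shifted out over buswidth lines. */
static uint32_t bytes_to_cycles(uint32_t nbytes, uint32_t buswidth)
{
    return (nbytes * 8) / buswidth;
}

/*
 * Total dummy cycles seen by the flash when part of the dummy phase is
 * sent as data through the tx fifo and the rest is generated by hardware.
 */
static uint32_t total_dummy_cycles(uint32_t fifo_bytes, uint32_t buswidth,
                                   uint32_t hw_cycles)
{
    return bytes_to_cycles(fifo_bytes, buswidth) + hw_cycles;
}
```

For example, total_dummy_cycles(1, 4, 4) gives the 6 dummy cycles that
EBh needs, so the model has to accept both kinds of transfer within one
command.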

>
> Now that the above has been cleared up, and since I know you are heavily loaded with
> other higher prio tasks, let's wait for the maintainers to also have a look into
> the RFC (understandably this can take some time since they are also
> heavily loaded).

Yes, maintainers are pretty much silent on this topic.

However, may I ask you to provide more details on my questions below about
booting U-Boot/Linux with QEMU?

You can post patches to add documentation for zynqmp in
docs/system/arm, or, once I have working instructions, I could do that
too. Much appreciated.

>
> Best regards,
> Francisco Iglesias
>
>
> > > > >
> > > > > >
> > > > > > > for the commands is because no request has been made for them. Also there is
> > > > > > > one controller that has support.
> > > > > >
> > > > > > Definitely it's not "no request". Nearly all SPI flashes support the
> > > > > > Fast Read (0Bh) command today, and 0Bh requires dummy cycles. This is
> > > > > > "seriously broken" for those case 1 type controllers because they
> > > > > > cannot read anything from the m25p80 model at all, unless the guest
> > > > > > software being tested only uses the Read (03h) command, which is not
> > > > > > affected. But I can't find any software that uses Read instead of Fast
> > > > > > Read.
> > > > > >
> > > > > > > > The issue you pointed out, that we require the total number of dummy
> > > > > > > > bits to be a multiple of 8, is true; that's why I added the
> > > > > > > > unimplemented log message in this series (patch 2/3/4) to warn users
> > > > > > > > if this expectation is not met. However this will not cause any issue
> > > > > > > > when running U-Boot or Linux, because both spi-nor drivers make the
> > > > > > > > same assumption as we do here.
> > > > > > > >
> > > > > > > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data();
> > > > > > > > there is logic to calculate the dummy bytes needed for the fast read
> > > > > > > > command:
> > > > > > > >
> > > > > > > >     /* convert the dummy cycles to the number of bytes */
> > > > > > > >     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> > > > > > > >
> > > > > > > > Note the default dummy cycles configuration for all flashes I have
> > > > > > > > looked into as of today, meets the multiple of 8 assumption. On some
> > > > > > > > flashes the dummy cycle number is configurable, and if it's been
> > > > > > > > configured to be an odd value, it would not work on U-Boot/Linux in
> > > > > > > > the first place.
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Things get complicated when interacting with different SPI or QSPI
> > > > > > > > > > flash controllers. There are two major cases:
> > > > > > > > > >
> > > > > > > > > > - Dummy bytes prepared by drivers, and written to the controller fifo.
> > > > > > > > > >   In this case, the driver will calculate the correct number of dummy
> > > > > > > > > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > > > > > > > > >   fix flashes working with such controllers.
> > > > > > > > >
> > > > > > > > > Above can be fixed while still keeping the detailed dummy cycle implementation
> > > > > > > > > inside m25p80. Perhaps one of the following could be looked into: configuring
> > > > > > > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > > > > > > > > some functionality handling this in the SPI controller. Or a mixture of above.
> > > > > > > >
> > > > > > > > Please send patches to explain this in detail how this is going to
> > > > > > > > work. I am open to all possible solutions.
> > > > > > >
> > > > > > > In that case I suggest that you instead try with a device property
> > > > > > > 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle
> > > > > > > count to dummy bytes inside m25p80. Below is an example on how to modify the
> > > > > >
> > > > > > No this is wrong in my view. This is not like a DMA vs. PIO handling.
> > > > > >
> > > > > > > decode_fast_read_cmd function (the other commands requiring dummy clock cycles
> > > > > > > can follow a similar pattern). This way the fifo mode will be able to work the
> > > > > > > way you desire while also keeping the current functionality intact. Suddenly
> > > > > > > removing functionality (features) will take users by surprise.
> > > > > >
> > > > > > I don't think we are removing any features. This is a fix to make the
> > > > > > model usable by any SPI controller.
> > > > > >
> > > > > > As I pointed out, both U-Boot and Linux have the multiple of 8
> > > > > > assumption for the dummy bit, which is the default configuration for
> > > > > > all flashes I have looked into so far. Can you please comment what use
> > > > > > case you want to support? I requested a U-Boot/Linux kernel testing in
> > > > > > the previous SST thread [1] against Xilinx GQSPI but there was no
> > > > > > response.
> > > > >
> > > > > In [2] instructions on how to boot u-boot/Linux are found. For building the
> > > > > various software components I followed the official doc in [3].
> > > >
> > > > I see the following QEMU commands are used to test booting U-Boot/Linux:
> > > >
> > > > $ qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G
> > > > -serial stdio -display none -device loader,file=u-boot.elf -kernel
> > > > bl31.elf -device loader,addr=0x40000000,file=Image -device
> > > > loader,addr=0x2000000,file=system.dtb
> > > >
> > > > I am not sure where the system.dtb gets built from?
> > >
> > > It is the instructions in [2] to look into. 'system.dtb' is the kernel dtb for
> > > zcu102 ([2] has been fixed). I created [2] purely for you, so respectfully I
> > > will ask you to try a little first before asking for further guidance.
> > >
> >
> > I tried, but no success. I removed the "-device loader" part for
> > loading kernel image and the device tree, and only focused on booting
> > U-Boot.
> >
> > The ATF bl31.elf was built from
> > https://github.com/ARM-software/arm-trusted-firmware, by following
> > build instructions at
> > https://trustedfirmware-a.readthedocs.io/en/latest/plat/xilinx-zynqmp.html.
> > U-Boot was built from the upstream U-Boot.
> >
> > $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m
> > 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel
> > bl31.elf
> > ERROR:   Incorrect XILINX IDCODE 0x0, maskid 0x4600093
> > NOTICE:  ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000
> > NOTICE:  BL31: v2.4(release):v2.4-228-g337e493
> > NOTICE:  BL31: Built : 21:18:14, Jan 20 2021
> > ERROR:   BL31: Platform Management API version error. Expected: v1.1 -
> > Found: v0.0
> > ERROR:   Error initializing runtime service sip_svc
> >
> > I also tried the Xilinx fork of ATF from
> > https://github.com/Xilinx/arm-trusted-firmware, by following build
> > instructions at
> > https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842305/Build+ARM+Trusted+Firmware+ATF
> >
> > $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m
> > 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel
> > bl31.elf
> > ERROR:   Incorrect XILINX IDCODE 0x0, maskid 0x4600093
> > NOTICE:  ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000
> > NOTICE:  BL31: v2.2(release):xilinx-v2020.2
> > NOTICE:  BL31: Built : 21:52:38, Jan 20 2021
> > ERROR:   BL31: Platform Management API version error. Expected: v1.1 -
> > Found: v0.0
> > ERROR:   Error initializing runtime service sip_svc
> >
> > Then I tried to build a U-Boot from the Xilinx fork at
> > https://github.com/Xilinx/u-boot-xlnx/, still no success.
> >
> > > Best regards,
> > > Francisco Iglesias
> > >
> > > [1] qemu/docs/system/deprecated.rst
> > > [2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md
> > >

Regards,
Bin



* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-01-21  8:59                   ` Bin Meng
@ 2021-01-21 10:01                     ` Francisco Iglesias
  2021-01-21 14:18                     ` Francisco Iglesias
  1 sibling, 0 replies; 36+ messages in thread
From: Francisco Iglesias @ 2021-01-21 10:01 UTC (permalink / raw)
  To: Bin Meng
  Cc: Kevin Wolf, Peter Maydell, qemu-devel@nongnu.org Developers,
	Qemu-block, Andrew Jeffery, Bin Meng, Philippe Mathieu-Daudé,
	Havard Skinnemoen, Tyrone Ting, qemu-arm, Alistair Francis,
	Cédric Le Goater, Joe Komlodi, Max Reitz, Joel Stanley

Dear Bin,

On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
> Hi Francisco,
> 
> On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
> <frasse.iglesias@gmail.com> wrote:
> >
> > Dear Bin,
> >
> > On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> > > Hi Francisco,
> > >
> > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> > > <frasse.iglesias@gmail.com> wrote:
> > > >
> > > > Hi Bin,
> > > >
> > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > > > Hi Francisco,
> > > > >
> > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > >
> > > > > > Hi Bin,
> > > > > >
> > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > > > Hi Francisco,
> > > > > > >
> > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi Bin,
> > > > > > > >
> > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > > > Hi Francisco,
> > > > > > > > >
> > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Bin,
> > > > > > > > > >
> > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > > > > > > >
> > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > > > > > > bytes are expected to be received after it receives a command. For
> > > > > > > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > > > > > > 4-byte address is needed.
> > > > > > > > > > >
> > > > > > > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > > > >
> > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > > > > > > >
> > > > > > > > > > While not being the original implementor, I must assume that the above solution was
> > > > > > > > > > considered but not chosen by the developers due to its inaccuracy (it
> > > > > > > > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
> > > > > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > > > > > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, also
> > > > > > > > > > because the detail is already in use for catching exactly the above error.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I found no clue in the commit message that my proposed solution here
> > > > > > > > > was ever considered, otherwise all SPI controller models supporting
> > > > > > > > > software generation should have been found to be seriously broken a long
> > > > > > > > > time ago!
> > > > > > > >
> > > > > > > >
> > > > > > > > The controllers you are referring to might lack support for commands requiring
> > > > > > > > dummy clock cycles but I really hope they work with the other commands? If so I
> > > > > > >
> > > > > > > I am not sure why you view dummy clock cycles as something special
> > > > > > > that needs some special support from the SPI controller. For the case
> > > > > > > 1 controller, it's nothing special from the controller perspective,
> > > > > > > just like sending out a command, or address bytes, or data. The
> > > > > > > controller just shifts data bit by bit from its tx fifo and that's it.
> > > > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > > > > > sent as regular data (the case 1 controller) in the tx fifo, or
> > > > > > > automatically generated (case 2 controller) by the hardware.
> > > > > >
> > > > > > Ok, I'll try to explain my view point a little differently. For that we also
> > > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW
> > > > > > board supported in QEMU should ideally run on that board inside QEMU as well
> > > > > > (this can be a bare metal application equally well as a modified u-boot/Linux
> > > > > > using SPI commands with a non multiple of 8 number of dummy clock cycles).
> > > > > >
> > > > > > Once functionality has been introduced into QEMU it is not easy to know which
> > > > > > intentional or unintentional features provided by the functionality are being
> > > > > > used by users. One of the (perhaps not well known) features I'm aware of that
> > > > > > is in use and is provided by the accurate dummy clock cycle modeling inside
> > > > > > m25p80 is the ability to test drivers accurately regarding the dummy clock
> > > > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock
> > > > > > cycles), but there might be others as well. So by removing this functionality
> > > > > > the above use case will break, since those tests will not be reliable.
> > > > > > Furthermore, since users tend to be creative it is not possible to know if
> > > > > > there are other use cases that will be affected. This means that in case [1]
> > > > > > needs to be followed the safe path is to add functionality instead of removing.
> > > > > > Luckily it is also easier in this case, see below.
> > > > >
> > > > > I understand there might be users other than U-Boot/Linux that use an
> > > > > odd number of dummy bits (not multiple of 8). If your concern was
> > > > > about model behavior changes, sure I can update
> > > > > qemu/docs/system/deprecated.rst to mention that some flashes in the
> > > > > m25p80 model now implement dummy cycles as bytes.
> > > >
> > > > Yes, something like that. My concern is that since this functionality has been
> > > > in tree for a while, users have found known or unknown features that got
> > > > introduced by it. By removing the functionality (and the known/unknown features)
> > > > we are risking breaking our users' use cases (currently I'm aware of one
> > > > feature/use case but it is not unlikely that there are more). [1] states that
> > > > "In general features are intended to be supported indefinitely once introduced
> > > > into QEMU", to me that makes very much sense because the opposite would mean
> > > > that we were not reliable. So in case [1] needs to be honored it looks to be
> > > > safer to add functionality instead of removing (and risking the removal of use
> > > > cases/features). Luckily I still believe in this case that it will be easier to
> > > > go forward (even if I also agree on what you are saying below about what I
> > > > proposed).
> > > >
> > >
> > > Even if the implementation is buggy and we need to keep the buggy
> > > implementation forever? I think that's why
> > > qemu/docs/system/deprecated.rst was created for deprecating such
> > > features.
> >
> > With the RFC I posted all commands in m25p80 are working for both the case 1
> > controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI).
> > Because of this, I, with all respect, will have to disagree that this is buggy.
> 
> Well, the existing m25p80 implementation that uses dummy cycle
> accuracy for those flashes prevents all SPI controllers that use a tx
> fifo from working with those flashes. Hence it is buggy.
> 
> >
> > >
> > > > >
> > > > > > >
> > > > > > > > don't think it is fair to call them 'seriously broken' (and else we should
> > > > > > > > probably let the maintainers know about it). Most likely the lack of support
> > > > > > >
> > > > > > > I called it "seriously broken" because the current implementation only
> > > > > > > considered one type of SPI controller while completely ignoring the
> > > > > > > other type.
> > > > > >
> > > > > > If we change view and see this from the perspective of m25p80, it models the
> > > > > > commands a certain way and provides an API that the SPI controllers need to
> > > > > > implement for interacting with it. It is true that there are SPI controllers
> > > > > > referred to above that do not support the portion of that API that corresponds
> > > > > > to commands with dummy clock cycles, but I don't think it is true that this is
> > > > > > broken since there is also one SPI controller that has a working implementation
> > > > > > of m25p80's full API also when transferring through a tx fifo (use case 1). But
> > > > > > as mentioned above, by making a minor extension and improvement to m25p80's API
> > > > > > that allows toggling the accuracy from dummy clock cycles to dummy bytes, [1]
> > > > > > will still be honored while at the same time making it possible to have full
> > > > > > support for the API in the SPI controllers that currently do not (please reread
> > > > > > the proposal in my previous reply that attempts to do this). I myself see this
> > > > > > as a win/win situation, also because no controller should need modifications.
> > > > > >
> > > > >
> > > > > I am afraid your proposal does not work. Your proposed new device
> > > > > property 'model_dummy_bytes' to select to convert the accurate dummy
> > > > > clock cycle count to dummy bytes inside m25p80, is hard to justify as
> > > > > a property to the flash itself, as the behavior is tightly coupled to
> > > > > how the SPI controller works.
> > > >
> > > > I agree on above. I decided though that instead of posting sample code in here
> > > > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
> > > > Xilinx ZynqMP GQSPI should not need any modification in a first step.
> > > >
> > >
> > > Wait, (see below)
> > >
> > > > >
> > > > > Please take a look at the Xilinx GQSPI controller, which supports both
> > > > > use cases, that the dummy cycles can be transferred via tx fifo, or
> > > > > generated by the controller automatically. Please read the example
> > > > > given in:
> > > > >
> > > > >     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
> > > > > Command (EBh)
> > > > >
> > > > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > > > >
> > > > > If you choose to set the m25p80 device property 'model_dummy_bytes' to
> > > > > true when working with the Xilinx GQSPI controller, you are bound to
> > > > > only allow guest software to use tx fifo to transfer the dummy cycles,
> > > > > and this is wrong.
> > > > >
> > >
> > > You missed this part. I looked at your RFC, and as I mentioned above
> > > your proposal cannot support a complicated controller like Xilinx
> > > GQSPI. Please read the example of table 24-22. With your RFC, you
> > > mandate guest software's GQSPI driver to only use hardware dummy cycle
> > > generation, which is wrong.
> > >
> >
> > First, thank you very much for looking into the RFC series, very much
> > appreciated. Secondly, about the above: the GQSPI model in QEMU transfers from 2
> > locations in the file. In 1 location the transfer referred to above is done; in
> > another location the transfer through the txfifo is done. The location where the
> > transfer referred to above is done will not need any modifications (and will
> > thus work just as well as it does currently).
> 
> Please explain this a little bit. How does your RFC series handle
> cases as described in table 24-22, where the 6 dummy cycles are split
> into 2 transfers, with one transfer using tx fifo, and the other one
> using hardware dummy cycle generation?


The above transfer is already handled in the model, and since that code will not
change, it will still work afterwards.

About the below: sure, I'll provide some documentation once I have some time.

Best regards,
Francisco Iglesias





* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-01-21  8:59                   ` Bin Meng
  2021-01-21 10:01                     ` Francisco Iglesias
@ 2021-01-21 14:18                     ` Francisco Iglesias
  2021-02-08 14:41                       ` Bin Meng
  1 sibling, 1 reply; 36+ messages in thread
From: Francisco Iglesias @ 2021-01-21 14:18 UTC (permalink / raw)
  To: Bin Meng
  Cc: Kevin Wolf, Peter Maydell, qemu-devel@nongnu.org Developers,
	Qemu-block, Andrew Jeffery, Bin Meng, Philippe Mathieu-Daudé,
	Havard Skinnemoen, Tyrone Ting, qemu-arm, Alistair Francis,
	Cédric Le Goater, Joe Komlodi, Max Reitz, Joel Stanley

Hi Bin,

On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
> Hi Francisco,
> 
> On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
> <frasse.iglesias@gmail.com> wrote:
> >
> > Dear Bin,
> >
> > On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> > > Hi Francisco,
> > >
> > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> > > <frasse.iglesias@gmail.com> wrote:
> > > >
> > > > Hi Bin,
> > > >
> > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > > > Hi Francisco,
> > > > >
> > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > >
> > > > > > Hi Bin,
> > > > > >
> > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > > > Hi Francisco,
> > > > > > >
> > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi Bin,
> > > > > > > >
> > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > > > Hi Francisco,
> > > > > > > > >
> > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Bin,
> > > > > > > > > >
> > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > > > > > > >
> > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > > > > > > bytes are expected to be received after it receives a command. For
> > > > > > > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > > > > > > 4-byte address is needed.
> > > > > > > > > > >
> > > > > > > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > > > >
> > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > > > > > > >
> > > > > > > > > > While not being the original implementor, I must assume that the above solution
> > > > > > > > > > was considered but not chosen by the developers due to its inaccuracy (it
> > > > > > > > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
> > > > > > > > > > meaning that if the controller is wrongly programmed to generate 7, the error
> > > > > > > > > > wouldn't be caught and the controller would still be considered "correct"). Now
> > > > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, also
> > > > > > > > > > because the detail is already in use for catching exactly the above error.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I found no clue from the commit message that my proposed solution here
> > > > > > > > > was ever considered, otherwise all SPI controller models supporting
> > > > > > > > > software generation should have been found out seriously broken long
> > > > > > > > > time ago!
> > > > > > > >
> > > > > > > >
> > > > > > > > The controllers you are referring to might lack support for commands requiring
> > > > > > > > dummy clock cycles but I really hope they work with the other commands? If so I
> > > > > > >
> > > > > > > I am not sure why you view dummy clock cycles as something special
> > > > > > > that needs some special support from the SPI controller. For the case
> > > > > > > 1 controller, it's nothing special from the controller perspective,
> > > > > > > just like sending out a command, or address bytes, or data. The
> > > > > > > controller just shifts data bit by bit from its tx fifo and that's it.
> > > > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > > > > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > > > > > automatically generated (case 2 controller) by the hardware.
> > > > > >
> > > > > > Ok, I'll try to explain my viewpoint a little differently. For that we also
> > > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW
> > > > > > board supported in QEMU should ideally run on that board inside QEMU as well
> > > > > > (this can be a bare metal application just as well as a modified u-boot/Linux
> > > > > > using SPI commands with a non-multiple-of-8 number of dummy clock cycles).
> > > > > >
> > > > > > Once functionality has been introduced into QEMU it is not easy to know which
> > > > > > intentional or unintentional features provided by the functionality are being
> > > > > > used by users. One of the (perhaps not well known) features I'm aware of that
> > > > > > is in use and is provided by the accurate dummy clock cycle modeling inside
> > > > > > m25p80 is the ability to test drivers accurately regarding the dummy clock
> > > > > > cycles (even when using commands with a non-multiple-of-8 number of dummy clock
> > > > > > cycles), but there might be others as well. So by removing this functionality
> > > > > > the above use case will break, since those tests will no longer be reliable.
> > > > > > Furthermore, since users tend to be creative it is not possible to know if
> > > > > > there are other use cases that will be affected. This means that in case [1]
> > > > > > needs to be followed the safe path is to add functionality instead of removing it.
> > > > > > Luckily it is also easier in this case, see below.
> > > > >
> > > > > I understand there might be users other than U-Boot/Linux that use an
> > > > > odd number of dummy bits (not multiple of 8). If your concern was
> > > > > about model behavior changes, sure I can update
> > > > > qemu/docs/system/deprecated.rst to mention that some flashes in the
> > > > > m25p80 model now implement dummy cycles as bytes.
> > > >
> > > > Yes, something like that. My concern is that since this functionality has been
> > > > in tree for a while, users have found known or unknown features that got
> > > > introduced by it. By removing the functionality (and the known/unknown features)
> > > > we are risking breaking our users' use cases (currently I'm aware of one
> > > > feature/use case but it is not unlikely that there are more). [1] states that
> > > > "In general features are intended to be supported indefinitely once introduced
> > > > into QEMU", to me that makes very much sense because the opposite would mean
> > > > that we were not reliable. So in case [1] needs to be honored it looks to be
> > > > safer to add functionality instead of removing it (and risking the removal of use
> > > > cases/features). Luckily I still believe in this case that it will be easier to
> > > > go forward (even if I also agree on what you are saying below about what I
> > > > proposed).
> > > >
> > >
> > > Even if the implementation is buggy and we need to keep the buggy
> > > implementation forever? I think that's why
> > > qemu/docs/system/deprecated.rst was created for deprecating such
> > > feature.
> >
> > With the RFC I posted all commands in m25p80 are working for both the case 1
> > controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI).
> > Because of this, I, with all respect, will have to disagree that this is buggy.
> 
> Well, the existing m25p80 implementation that uses dummy cycle
> accuracy for those flashes prevents all SPI controllers that use a tx
> fifo from working with those flashes. Hence it is buggy.
> 
> >
> > >
> > > > >
> > > > > > >
> > > > > > > > don't think it is fair to call them 'seriously broken' (and else we should
> > > > > > > > probably let the maintainers know about it). Most likely the lack of support
> > > > > > >
> > > > > > > I called it "seriously broken" because current implementation only
> > > > > > > considered one type of SPI controllers while completely ignoring the
> > > > > > > other type.
> > > > > >
> > > > > > If we change view and see this from the perspective of m25p80, it models the
> > > > > > commands a certain way and provides an API that the SPI controllers need to
> > > > > > implement for interacting with it. It is true that there are SPI controllers
> > > > > > referred to above that do not support the portion of that API that corresponds
> > > > > > to commands with dummy clock cycles, but I don't think it is true that this is
> > > > > > broken since there is also one SPI controller that has a working implementation
> > > > > > of m25p80's full API also when transferring through a tx fifo (use case 1). But
> > > > > > as mentioned above, by doing a minor extension and improvement to m25p80's API
> > > > > > that allows toggling the accuracy from dummy clock cycles to dummy bytes, [1]
> > > > > > will still be honored while at the same time making it possible to have full
> > > > > > support for the API in the SPI controllers that currently do not (please reread
> > > > > > the proposal in my previous reply that attempts to do this). I myself see this
> > > > > > as a win/win situation, also because no controller should need modifications.
> > > > > >
> > > > >
> > > > > I am afraid your proposal does not work. Your proposed new device
> > > > > property 'model_dummy_bytes' to select to convert the accurate dummy
> > > > > clock cycle count to dummy bytes inside m25p80, is hard to justify as
> > > > > a property to the flash itself, as the behavior is tightly coupled to
> > > > > how the SPI controller works.
> > > >
> > > > I agree on the above. I decided though that instead of posting sample code in here
> > > > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
> > > > Xilinx ZynqMP GQSPI should not need any modification in a first step.
> > > >
> > >
> > > Wait, (see below)
> > >
> > > > >
> > > > > Please take a look at the Xilinx GQSPI controller, which supports both
> > > > > use cases, that the dummy cycles can be transferred via tx fifo, or
> > > > > generated by the controller automatically. Please read the example
> > > > > given in:
> > > > >
> > > > >     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
> > > > > Command (EBh)
> > > > >
> > > > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > > > >
> > > > > If you choose to set the m25p80 device property 'model_dummy_bytes' to
> > > > > true when working with the Xilinx GQSPI controller, you are bound to
> > > > > only allow guest software to use tx fifo to transfer the dummy cycles,
> > > > > and this is wrong.
> > > > >
> > >
> > > You missed this part. I looked at your RFC, and as I mentioned above
> > > your proposal cannot support the complicated controller like Xilinx
> > > GQSPI. Please read the example of table 24-22. With your RFC, you
> > > mandate guest software's GQSPI driver to only use hardware dummy cycle
> > > generation, which is wrong.
> > >
> >
> > First, thank you very much for looking into the RFC series, very much
> > appreciated. Secondly, about above, the GQSPI model in QEMU transfers from 2
> > locations in the file, in 1 location the transfer referred to above is done, in
> > another location the transfer through the txfifo is done. The location where
> > transfer referred to above is done will not need any modifications (and will
> > thus work equally well as it does currently).
> 
> Please explain this a little bit. How does your RFC series handle
> cases as described in table 24-22, where the 6 dummy cycles are split
> into 2 transfers, with one transfer using tx fifo, and the other one
> using hardware dummy cycle generation?

Sorry, I misunderstood. You are right, that won't work.

Best regards,
Francisco Iglesias

> 
> >
> > Now that the above has been cleared up, and since I know you are heavily loaded with
> > other higher-prio tasks, let's wait for the maintainers to also have a look into
> > the RFC (understandably this can take some time since they are also
> > heavily loaded).
> 
> Yes, maintainers are pretty much silent on this topic.
> 
> However may I ask you to provide more details on my questions below on
> booting U-Boot/Linux with the QEMU?
> 
> You can post patches to add documentation for zynqmp in
> docs/system/arm, or once I get working instructions, I could do that
> too. Much appreciated.
> 
> >
> > Best regards,
> > Francisco Iglesias
> >
> >
> > > > > >
> > > > > > >
> > > > > > > > for the commands is because no request has been made for them. Also there is
> > > > > > > > one controller that has support.
> > > > > > >
> > > > > > > Definitely it's not "no request". Nearly all SPI flashes support the
> > > > > > > Fast Read (0Bh) command today, and 0Bh requires a dummy cycle. This is
> > > > > > > "seriously broken" for those case 1 type controllers because they
> > > > > > > cannot read anything from the m25p80 model at all. Unless the guest
> > > > > > > software being tested only uses Read (03h) command which is not
> > > > > > > affected. But I can't find a software that uses Read instead of Fast
> > > > > > > Read.
> > > > > > >
> > > > > > > > > The issue you pointed out that we require the total number of dummy
> > > > > > > > > bits should be multiple of 8 is true, that's why I added the
> > > > > > > > > unimplemented log message in this series (patch 2/3/4) to warn users
> > > > > > > > > if this expectation is not met. However this will not cause any issue
> > > > > > > > > when running U-Boot or Linux, because both spi-nor drivers expect the
> > > > > > > > > same assumption as we do here.
> > > > > > > > >
> > > > > > > > > See U-Boot spi_nor_read_data() and Linux spi_nor_spimem_read_data(),
> > > > > > > > > there is a logic to calculate the dummy bytes needed for fast read
> > > > > > > > > command:
> > > > > > > > >
> > > > > > > > >     /* convert the dummy cycles to the number of bytes */
> > > > > > > > >     op.dummy.nbytes = (nor->read_dummy * op.dummy.buswidth) / 8;
> > > > > > > > >
> > > > > > > > > Note the default dummy cycles configuration for all flashes I have
> > > > > > > > > looked into as of today, meets the multiple of 8 assumption. On some
> > > > > > > > > flashes the dummy cycle number is configurable, and if it's been
> > > > > > > > > configured to be an odd value, it would not work on U-Boot/Linux in
> > > > > > > > > the first place.
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Things get complicated when interacting with different SPI or QSPI
> > > > > > > > > > > flash controllers. There are two major cases:
> > > > > > > > > > >
> > > > > > > > > > > - Dummy bytes prepared by drivers, and written to the controller fifo.
> > > > > > > > > > >   In such a case, the driver will calculate the correct number of dummy
> > > > > > > > > > >   bytes and write them into the tx fifo. Fixing the m25p80 model will
> > > > > > > > > > >   fix flashes working with such controllers.
> > > > > > > > > >
> > > > > > > > > > Above can be fixed while still keeping the detailed dummy cycle implementation
> > > > > > > > > > inside m25p80. Perhaps one of the following could be looked into: configurating
> > > > > > > > > > the amount, letting the spi ctrl fetch the amount from m25p80 or by inheriting
> > > > > > > > > > some functionality handling this in the SPI controller. Or a mixture of above.
> > > > > > > > >
> > > > > > > > > Please send patches to explain this in detail how this is going to
> > > > > > > > > work. I am open to all possible solutions.
> > > > > > > >
> > > > > > > > In that case I suggest that you instead try with a device property
> > > > > > > > 'model_dummy_bytes' used to select to convert the accurate dummy clock cycle
> > > > > > > > count to dummy bytes inside m25p80. Below is an example on how to modify the
> > > > > > >
> > > > > > > No this is wrong in my view. This is not like a DMA vs. PIO handling.
> > > > > > >
> > > > > > > > decode_fast_read_cmd function (the other commands requiring dummy clock cycles
> > > > > > > > can follow a similar pattern). This way the fifo mode will be able to work the
> > > > > > > > way you desire while also keeping the current functionality intact. Suddenly
> > > > > > > > removing functionality (features) will take users by surprise.
> > > > > > >
> > > > > > > I don't think we are removing any features. This is a fix to make the
> > > > > > > model to be used by any SPI controllers.
> > > > > > >
> > > > > > > As I pointed out, both U-Boot and Linux have the multiple of 8
> > > > > > > assumption for the dummy bit, which is the default configuration for
> > > > > > > all flashes I have looked into so far. Can you please comment what use
> > > > > > > case you want to support? I requested a U-Boot/Linux kernel testing in
> > > > > > > the previous SST thread [1] against Xilinx GQSPI but there was no
> > > > > > > response.
> > > > > >
> > > > > > In [2] instructions on how to boot u-boot/Linux is found. For building the
> > > > > > various software components I followed the official doc in [3].
> > > > >
> > > > > I see the following QEMU commands are used to test booting U-Boot/Linux:
> > > > >
> > > > > $ qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m 4G
> > > > > -serial stdio -display none -device loader,file=u-boot.elf -kernel
> > > > > bl31.elf -device loader,addr=0x40000000,file=Image -device
> > > > > loader,addr=0x2000000,file=system.dtb
> > > > >
> > > > > I am not sure where the system.dtb gets built from?
> > > >
> > > > It is the instructions in [2] to look into. 'system.dtb' is the kernel dtb for
> > > > zcu102 ([2] has been fixed). I created [2] purely for you, so respectfully I
> > > > will ask you to try a little first before asking for further guidance.
> > > >
> > >
> > > I tried, but no success. I removed the "-device loader" part for
> > > loading kernel image and the device tree, and only focused on booting
> > > U-Boot.
> > >
> > > The ATF bl31.elf was built from
> > > https://github.com/ARM-software/arm-trusted-firmware, by following
> > > build instructions at
> > > https://trustedfirmware-a.readthedocs.io/en/latest/plat/xilinx-zynqmp.html.
> > > U-Boot was built from the upstream U-Boot.
> > >
> > > $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m
> > > 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel
> > > bl31.elf
> > > ERROR:   Incorrect XILINX IDCODE 0x0, maskid 0x4600093
> > > NOTICE:  ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000
> > > NOTICE:  BL31: v2.4(release):v2.4-228-g337e493
> > > NOTICE:  BL31: Built : 21:18:14, Jan 20 2021
> > > ERROR:   BL31: Platform Management API version error. Expected: v1.1 -
> > > Found: v0.0
> > > ERROR:   Error initializing runtime service sip_svc
> > >
> > > I also tried the Xilinx fork of ATF from
> > > https://github.com/Xilinx/arm-trusted-firmware, by following build
> > > instructions at
> > > https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842305/Build+ARM+Trusted+Firmware+ATF
> > >
> > > $ ./qemu-system-aarch64 -M xlnx-zcu102,secure=on,virtualization=on -m
> > > 4G -serial stdio -display none -device loader,file=u-boot.elf -kernel
> > > bl31.elf
> > > ERROR:   Incorrect XILINX IDCODE 0x0, maskid 0x4600093
> > > NOTICE:  ATF running on XCZUUNKN/silicon v1/RTL0.0 at 0xfffea000
> > > NOTICE:  BL31: v2.2(release):xilinx-v2020.2
> > > NOTICE:  BL31: Built : 21:52:38, Jan 20 2021
> > > ERROR:   BL31: Platform Management API version error. Expected: v1.1 -
> > > Found: v0.0
> > > ERROR:   Error initializing runtime service sip_svc
> > >
> > > Then I tried to build a U-Boot from the Xilinx fork at
> > > https://github.com/Xilinx/u-boot-xlnx/, still no success.
> > >
> > > > Best regards,
> > > > Francisco Iglesias
> > > >
> > > > [1] qemu/docs/system/deprecated.rst
> > > > [2] https://github.com/franciscoIglesias/qemu-cmdline/blob/master/xlnx-zcu102-atf-u-boot-linux.md
> > > >
> 
> Regards,
> Bin


^ permalink raw reply	[flat|nested] 36+ messages in thread
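
The case conceded in the reply above is a single dummy phase delivered in two pieces: some dummy clocks shifted out as filler bytes from the tx fifo, the rest generated by the controller hardware. A minimal sketch of the bookkeeping this implies is given below. Every type and function name here is hypothetical and none of it is QEMU code; the concrete split used in the usage note (1 byte on 4 lines, then 4 generated cycles) is an assumption chosen to add up to 6, not a transcription of table 24-22. The point is only that the two pieces arrive in different units (bytes vs. cycles), so one fixed per-flash choice between the two units cannot account for both.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one controller transfer; not a QEMU type. */
enum dummy_source {
    DUMMY_FROM_TX_FIFO,   /* driver wrote filler bytes into the tx fifo   */
    DUMMY_FROM_HARDWARE,  /* controller generates the clocks on its own   */
};

struct dummy_segment {
    enum dummy_source source;
    unsigned buswidth;    /* data lines in use: 1, 2 or 4                  */
    unsigned amount;      /* bytes for TX_FIFO, clock cycles for HARDWARE  */
};

/* Clock cycles that one segment contributes on the bus. */
static unsigned segment_cycles(const struct dummy_segment *seg)
{
    assert(seg->buswidth != 0);

    if (seg->source == DUMMY_FROM_TX_FIFO) {
        /* Each byte is 8 bits shifted out over 'buswidth' lines. */
        return seg->amount * 8 / seg->buswidth;
    }
    return seg->amount;
}

/* Total dummy cycles seen by the flash across all segments. */
unsigned total_dummy_cycles(const struct dummy_segment *segs, size_t n)
{
    unsigned cycles = 0;

    for (size_t i = 0; i < n; i++) {
        cycles += segment_cycles(&segs[i]);
    }
    return cycles;
}

/* True when the segments add up to what the flash command requires. */
bool dummy_phase_complete(const struct dummy_segment *segs, size_t n,
                          unsigned required_cycles)
{
    return total_dummy_cycles(segs, n) == required_cycles;
}
```

With the assumed split, `{ DUMMY_FROM_TX_FIFO, 4, 1 }` contributes 2 cycles and `{ DUMMY_FROM_HARDWARE, 4, 4 }` contributes 4, so `dummy_phase_complete()` holds for a 6-cycle requirement only when both segments are counted.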

* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-01-21 14:18                     ` Francisco Iglesias
@ 2021-02-08 14:41                       ` Bin Meng
  2021-02-08 15:30                         ` Edgar E. Iglesias
  2021-04-23  6:45                         ` Bin Meng
  0 siblings, 2 replies; 36+ messages in thread
From: Bin Meng @ 2021-02-08 14:41 UTC (permalink / raw)
  To: Francisco Iglesias, Edgar E. Iglesias, Alistair Francis
  Cc: Kevin Wolf, Peter Maydell, qemu-devel@nongnu.org Developers,
	Qemu-block, Andrew Jeffery, Bin Meng, Philippe Mathieu-Daudé,
	Havard Skinnemoen, Tyrone Ting, qemu-arm, Cédric Le Goater,
	Joe Komlodi, Max Reitz, Joel Stanley

On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias
<frasse.iglesias@gmail.com> wrote:
>
> Hi Bin,
>
> On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
> > Hi Francisco,
> >
> > On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
> > <frasse.iglesias@gmail.com> wrote:
> > >
> > > Dear Bin,
> > >
> > > On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> > > > Hi Francisco,
> > > >
> > > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> > > > <frasse.iglesias@gmail.com> wrote:
> > > > >
> > > > > Hi Bin,
> > > > >
> > > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > > > > Hi Francisco,
> > > > > >
> > > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi Bin,
> > > > > > >
> > > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > > > > Hi Francisco,
> > > > > > > >
> > > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Bin,
> > > > > > > > >
> > > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > > > > Hi Francisco,
> > > > > > > > > >
> > > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi Bin,
> > > > > > > > > > >
> > > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > > > > > > > >
> > > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > > > > > > > bytes are expected to be received after it receives a command. For
> > > > > > > > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > > > > > > > 4-byte address is needed.
> > > > > > > > > > > >
> > > > > > > > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > > > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > > > > >
> > > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > > > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > > > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > > > > > > > >
> > > > > > > > > > > While not being the original implementor, I must assume that the above solution
> > > > > > > > > > > was considered but not chosen by the developers due to its inaccuracy (it
> > > > > > > > > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
> > > > > > > > > > > meaning that if the controller is wrongly programmed to generate 7, the error
> > > > > > > > > > > wouldn't be caught and the controller would still be considered "correct"). Now
> > > > > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, also
> > > > > > > > > > > because the detail is already in use for catching exactly the above error.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > I found no clue from the commit message that my proposed solution here
> > > > > > > > > > was ever considered, otherwise all SPI controller models supporting
> > > > > > > > > > software generation should have been found out seriously broken long
> > > > > > > > > > time ago!
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > The controllers you are referring to might lack support for commands requiring
> > > > > > > > > dummy clock cycles but I really hope they work with the other commands? If so I
> > > > > > > >
> > > > > > > > I am not sure why you view dummy clock cycles as something special
> > > > > > > > that needs some special support from the SPI controller. For the case
> > > > > > > > 1 controller, it's nothing special from the controller perspective,
> > > > > > > > just like sending out a command, or address bytes, or data. The
> > > > > > > > controller just shifts data bit by bit from its tx fifo and that's it.
> > > > > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > > > > > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > > > > > > automatically generated (case 2 controller) by the hardware.
> > > > > > >
> > > > > > > Ok, I'll try to explain my viewpoint a little differently. For that we also
> > > > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW
> > > > > > > board supported in QEMU should ideally run on that board inside QEMU as well
> > > > > > > (this can be a bare metal application just as well as a modified u-boot/Linux
> > > > > > > using SPI commands with a non-multiple-of-8 number of dummy clock cycles).
> > > > > > >
> > > > > > > Once functionality has been introduced into QEMU it is not easy to know which
> > > > > > > intentional or unintentional features provided by the functionality are being
> > > > > > > used by users. One of the (perhaps not well known) features I'm aware of that
> > > > > > > is in use and is provided by the accurate dummy clock cycle modeling inside
> > > > > > > m25p80 is the ability to test drivers accurately regarding the dummy clock
> > > > > > > cycles (even when using commands with a non-multiple-of-8 number of dummy clock
> > > > > > > cycles), but there might be others as well. So by removing this functionality
> > > > > > > the above use case will break, since those tests will no longer be reliable.
> > > > > > > Furthermore, since users tend to be creative it is not possible to know if
> > > > > > > there are other use cases that will be affected. This means that in case [1]
> > > > > > > needs to be followed the safe path is to add functionality instead of removing it.
> > > > > > > Luckily it is also easier in this case, see below.
> > > > > >
> > > > > > I understand there might be users other than U-Boot/Linux that use an
> > > > > > odd number of dummy bits (not multiple of 8). If your concern was
> > > > > > about model behavior changes, sure I can update
> > > > > > qemu/docs/system/deprecated.rst to mention that some flashes in the
> > > > > > m25p80 model now implement dummy cycles as bytes.
> > > > >
> > > > > Yes, something like that. My concern is that since this functionality has been
> > > > > in tree for a while, users have found known or unknown features that got
> > > > > introduced by it. By removing the functionality (and the known/unknown features)
> > > > > we are risking breaking our users' use cases (currently I'm aware of one
> > > > > feature/use case but it is not unlikely that there are more). [1] states that
> > > > > "In general features are intended to be supported indefinitely once introduced
> > > > > into QEMU", to me that makes very much sense because the opposite would mean
> > > > > that we were not reliable. So in case [1] needs to be honored it looks to be
> > > > > safer to add functionality instead of removing it (and risking the removal of use
> > > > > cases/features). Luckily I still believe in this case that it will be easier to
> > > > > go forward (even if I also agree on what you are saying below about what I
> > > > > proposed).
> > > > >
> > > >
> > > > Even if the implementation is buggy and we need to keep the buggy
> > > > implementation forever? I think that's why
> > > > qemu/docs/system/deprecated.rst was created for deprecating such
> > > > feature.
> > >
> > > With the RFC I posted all commands in m25p80 are working for both the case 1
> > > controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI).
> > > Because of this, I, with all respect, will have to disagree that this is buggy.
> >
> > Well, the existing m25p80 implementation that uses dummy cycle
> > accuracy for those flashes prevents all SPI controllers that use a tx
> > fifo from working with those flashes. Hence it is buggy.
> >
> > >
> > > >
> > > > > >
> > > > > > > >
> > > > > > > > > don't think it is fair to call them 'seriously broken' (and else we should
> > > > > > > > > probably let the maintainers know about it). Most likely the lack of support
> > > > > > > >
> > > > > > > > I called it "seriously broken" because current implementation only
> > > > > > > > considered one type of SPI controllers while completely ignoring the
> > > > > > > > other type.
> > > > > > >
> > > > > > > If we change view and see this from the perspective of m25p80, it models the
> > > > > > > commands a certain way and provides an API that the SPI controllers need to
> > > > > > > implement for interacting with it. It is true that there are SPI controllers
> > > > > > > referred to above that do not support the portion of that API that corresponds
> > > > > > > to commands with dummy clock cycles, but I don't think it is true that this is
> > > > > > > broken since there is also one SPI controller that has a working implementation
> > > > > > > of m25p80's full API also when transferring through a tx fifo (use case 1). But
> > > > > > > as mentioned above, by doing a minor extension and improvement to m25p80's API
> > > > > > > that allows toggling the accuracy from dummy clock cycles to dummy bytes, [1]
> > > > > > > will still be honored while at the same time making it possible to have full
> > > > > > > support for the API in the SPI controllers that currently do not (please reread
> > > > > > > the proposal in my previous reply that attempts to do this). I myself see this
> > > > > > > as a win/win situation, also because no controller should need modifications.
> > > > > > >
> > > > > >
> > > > > > I am afraid your proposal does not work. Your proposed new device
> > > > > > property 'model_dummy_bytes' to select to convert the accurate dummy
> > > > > > clock cycle count to dummy bytes inside m25p80, is hard to justify as
> > > > > > a property to the flash itself, as the behavior is tightly coupled to
> > > > > > how the SPI controller works.
> > > > >
> > > > > I agree on the above. I decided though that instead of posting sample code in here
> > > > > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
> > > > > Xilinx ZynqMP GQSPI should not need any modification in a first step.
> > > > >
> > > >
> > > > Wait, (see below)
> > > >
> > > > > >
> > > > > > Please take a look at the Xilinx GQSPI controller, which supports both
> > > > > > use cases, that the dummy cycles can be transferred via tx fifo, or
> > > > > > generated by the controller automatically. Please read the example
> > > > > > given in:
> > > > > >
> > > > > >     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
> > > > > > Command (EBh)
> > > > > >
> > > > > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > > > > >
> > > > > > If you choose to set the m25p80 device property 'model_dummy_bytes' to
> > > > > > true when working with the Xilinx GQSPI controller, you are bound to
> > > > > > only allow guest software to use tx fifo to transfer the dummy cycles,
> > > > > > and this is wrong.
> > > > > >
> > > >
> > > > You missed this part. I looked at your RFC, and as I mentioned above
> > > > your proposal cannot support the complicated controller like Xilinx
> > > > GQSPI. Please read the example of table 24-22. With your RFC, you
> > > > mandate guest software's GQSPI driver to only use hardware dummy cycle
> > > > generation, which is wrong.
> > > >
> > >
> > > First, thank you very much for looking into the RFC series, very much
> > > appreciated. Secondly, about above, the GQSPI model in QEMU transfers from 2
> > > locations in the file, in 1 location the transfer referred to above is done, in
> > > another location the transfer through the txfifo is done. The location where
> > > transfer referred to above is done will not need any modifications (and will
> > > thus work equally well as it does currently).
> >
> > Please explain this a little bit. How does your RFC series handle
> > cases as described in table 24-22, where the 6 dummy cycles are split
> > into 2 transfers, with one transfer using tx fifo, and the other one
> > using hardware dummy cycle generation?
>
> Sorry, I misunderstood. You are right, that won't work.

+Edgar E. Iglesias

So it looks like, so far, the only way to implement dummy cycles correctly,
so that they work with all SPI controller models, is what I proposed in this
patch series.
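
For reference, the cycle-to-byte conversion discussed in this thread can be
sketched roughly as below. This is only an illustration under assumptions:
dummy_cycles_to_bytes() and its parameters are hypothetical names, not code
from the m25p80 model, and it assumes the dummy phase is clocked on 1, 2 or 4
I/O lines.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of QEMU's m25p80 model.
 *
 * Converts a number of dummy clock cycles into whole bytes, given how many
 * I/O lines carry data during the dummy phase (assumed to be 1, 2 or 4).
 * Each cycle moves 'lines' bits, so the dummy phase is cycles * lines bits.
 *
 * Returns true and stores the byte count in *bytes when the dummy phase is
 * a whole number of bytes; returns false otherwise and leaves *bytes alone.
 */
static bool dummy_cycles_to_bytes(uint32_t cycles, uint32_t lines,
                                  uint32_t *bytes)
{
    uint32_t bits;

    assert(lines == 1 || lines == 2 || lines == 4);

    bits = cycles * lines;
    if (bits % 8 != 0) {
        /* Not representable as whole bytes, e.g. 7 cycles on 1 line. */
        return false;
    }

    *bytes = bits / 8;
    return true;
}
```

With this sketch, 6 dummy cycles on 4 lines (Fast Read Quad I/O, EBh) come
out as 3 bytes, matching the (6 * 4 / 8) formula from the cover letter, while
7 cycles on a single line cannot be expressed as whole bytes, which is the
accuracy concern raised earlier in the thread.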

Maintainers are quite silent, so I would like to hear your thoughts.

@Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you
please share your thoughts, since you are the ones who reviewed the
existing dummy implementation (based on the commit history)?

Regards,
Bin


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-02-08 14:41                       ` Bin Meng
@ 2021-02-08 15:30                         ` Edgar E. Iglesias
  2021-02-09  9:35                           ` Francisco Iglesias
  2021-04-23  6:45                         ` Bin Meng
  1 sibling, 1 reply; 36+ messages in thread
From: Edgar E. Iglesias @ 2021-02-08 15:30 UTC (permalink / raw)
  To: Bin Meng
  Cc: Kevin Wolf, Peter Maydell, qemu-devel@nongnu.org Developers,
	Qemu-block, Andrew Jeffery, Francisco Iglesias, Bin Meng,
	Philippe Mathieu-Daudé,
	Havard Skinnemoen, Tyrone Ting, qemu-arm, Alistair Francis,
	Cédric Le Goater, Joe Komlodi, Max Reitz, Joel Stanley

On Mon, Feb 8, 2021 at 3:42 PM Bin Meng <bmeng.cn@gmail.com> wrote:

> On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias
> <frasse.iglesias@gmail.com> wrote:
> >
> > Hi Bin,
> >
> > On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
> > > Hi Francisco,
> > >
> > > On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
> > > <frasse.iglesias@gmail.com> wrote:
> > > >
> > > > Dear Bin,
> > > >
> > > > On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> > > > > Hi Francisco,
> > > > >
> > > > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > >
> > > > > > Hi Bin,
> > > > > >
> > > > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > > > > > Hi Francisco,
> > > > > > >
> > > > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi Bin,
> > > > > > > >
> > > > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > > > > > Hi Francisco,
> > > > > > > > >
> > > > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Bin,
> > > > > > > > > >
> > > > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > > > > > Hi Francisco,
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > >
> > > > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > > > > > > > > >
> > > > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate
> how many follow-up
> > > > > > > > > > > > > bytes are expected to be received after it
> receives a command. For
> > > > > > > > > > > > > example, depending on the address mode, either
> 3-byte address or
> > > > > > > > > > > > > 4-byte address is needed.
> > > > > > > > > > > > >
> > > > > > > > > > > > > For fast read family commands, some dummy cycles
> are required after
> > > > > > > > > > > > > sending the address bytes, and the dummy cycles
> need to be counted
> > > > > > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > > > > > >
> > > > > > > > > > > > > As the variable name (needed_bytes) indicates, the
> unit is in byte.
> > > > > > > > > > > > > It is not in bit, or cycle. However for some
> reason the model has
> > > > > > > > > > > > > been using the number of dummy cycles for
> s->needed_bytes. The right
> > > > > > > > > > > > > approach is to convert the number of dummy cycles
> to bytes based on
> > > > > > > > > > > > > the SPI protocol, for example, 6 dummy cycles for
> the Fast Read Quad
> > > > > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the
> formula (6 * 4 / 8).
> > > > > > > > > > > >
> > > > > > > > > > > > While not being the original implementor I must assume that the above solution was
> > > > > > > > > > > > considered but not chosen by the developers due to its inaccuracy (it
> > > > > > > > > > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
> > > > > > > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > > > > > > > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > > > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > > > > > > > > > > also because the detail is already in use for catching exactly the above error.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I found no clue from the commit message that my
> proposed solution here
> > > > > > > > > > > was ever considered, otherwise all SPI controller
> models supporting
> > > > > > > > > > > software generation should have been found out
> seriously broken long
> > > > > > > > > > > time ago!
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > The controllers you are referring to might lack support
> for commands requiring
> > > > > > > > > > dummy clock cycles but I really hope they work with the
> other commands? If so I
> > > > > > > > >
> > > > > > > > > I am not sure why you view dummy clock cycles as something
> special
> > > > > > > > > that needs some special support from the SPI controller.
> For the case
> > > > > > > > > 1 controller, it's nothing special from the controller
> perspective,
> > > > > > > > > just like sending out a command, or address bytes, or
> data. The
> > > > > > > > > controller just shifts data bit by bit from its tx fifo
> and that's it.
> > > > > > > > > In the Xilinx GQSPI controller case, the dummy cycles can
> either be
> > > > > > > > > sent via a regular data (the case 1 controller) in the tx
> fifo, or
> > > > > > > > > automatically generated (case 2 controller) by the
> hardware.
> > > > > > > >
> > > > > > > > Ok, I'll try to explain my view point a little differently. For that we also
> > > > > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW
> > > > > > > > board supported in QEMU should ideally run on that board inside QEMU as well
> > > > > > > > (this can be a bare metal application equally well as a modified u-boot/Linux
> > > > > > > > using SPI commands with a non multiple of 8 number of dummy clock cycles).
> > > > > > > >
> > > > > > > > Once functionality has been introduced into QEMU it is not easy to know which
> > > > > > > > intentional or unintentional features provided by the functionality are being
> > > > > > > > used by users. One of the (perhaps not well known) features I'm aware of that
> > > > > > > > is in use and is provided by the accurate dummy clock cycle modeling inside
> > > > > > > > m25p80 is the ability to test drivers accurately regarding the dummy clock
> > > > > > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock
> > > > > > > > cycles), but there might be others as well. So by removing this functionality
> > > > > > > > the above use case will break, since those tests will not be reliable.
> > > > > > > > Furthermore, since users tend to be creative it is not possible to know if
> > > > > > > > there are other use cases that will be affected. This means that in case [1]
> > > > > > > > needs to be followed the safe path is to add functionality instead of removing.
> > > > > > > > Luckily it is also easier in this case, see below.
> > > > > > >
> > > > > > > I understand there might be users other than U-Boot/Linux that
> use an
> > > > > > > odd number of dummy bits (not multiple of 8). If your concern
> was
> > > > > > > about model behavior changes, sure I can update
> > > > > > > qemu/docs/system/deprecated.rst to mention that some flashes
> in the
> > > > > > > m25p80 model now implement dummy cycles as bytes.
> > > > > >
> > > > > > Yes, something like that. My concern is that since this functionality has been
> > > > > > in tree for a while, users have found known or unknown features that got
> > > > > > introduced by it. By removing the functionality (and the known/unknown features)
> > > > > > we are risking breaking our users' use cases (currently I'm aware of one
> > > > > > feature/use case but it is not unlikely that there are more). [1] states that
> > > > > > "In general features are intended to be supported indefinitely once introduced
> > > > > > into QEMU", to me that makes very much sense because the opposite would mean
> > > > > > that we were not reliable. So in case [1] needs to be honored it looks to be
> > > > > > safer to add functionality instead of removing (and risking the removal of use
> > > > > > cases/features). Luckily I still believe in this case that it will be easier to
> > > > > > go forward (even if I also agree on what you are saying below about what I
> > > > > > proposed).
> > > > > >
> > > > >
> > > > > Even if the implementation is buggy and we need to keep the buggy
> > > > > implementation forever? I think that's why
> > > > > qemu/docs/system/deprecated.rst was created for deprecating such
> > > > > feature.
> > > >
> > > > With the RFC I posted all commands in m25p80 are working for both
> the case 1
> > > > controller (using a txfifo) and the case 2 controller (no txfifo, as
> GQSPI).
> > > > Because of this, I, with all respect, will have to disagree that
> this is buggy.
> > >
> > > Well, the existing m25p80 implementation that uses dummy cycle
> > > accuracy for those flashes prevents all SPI controllers that use tx
> > > fifo to work with those flashes. Hence it is buggy.
> > >
> > > >
> > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > > > don't think it is fair to call them 'seriously broken'
> (and else we should
> > > > > > > > > > probably let the maintainers know about it). Most likely
> the lack of support
> > > > > > > > >
> > > > > > > > > I called it "seriously broken" because current
> implementation only
> > > > > > > > > considered one type of SPI controllers while completely
> ignoring the
> > > > > > > > > other type.
> > > > > > > >
> > > > > > > > If we change view and see this from the perspective of m25p80, it models the
> > > > > > > > commands a certain way and provides an API that the SPI controllers need to
> > > > > > > > implement for interacting with it. It is true that there are SPI controllers
> > > > > > > > referred to above that do not support the portion of that API that corresponds
> > > > > > > > to commands with dummy clock cycles, but I don't think it is true that this is
> > > > > > > > broken since there is also one SPI controller that has a working implementation
> > > > > > > > of m25p80's full API also when transferring through a tx fifo (use case 1). But
> > > > > > > > as mentioned above, with a minor extension and improvement to m25p80's API
> > > > > > > > that allows toggling the accuracy from dummy clock cycles to dummy bytes, [1]
> > > > > > > > will still be honored while at the same time making it possible to have full
> > > > > > > > support for the API in the SPI controllers that currently do not (please reread
> > > > > > > > the proposal in my previous reply that attempts to do this). I myself see this
> > > > > > > > as a win/win situation, also because no controller should need modifications.
> > > > > > > >
> > > > > > >
> > > > > > > I am afraid your proposal does not work. Your proposed new
> device
> > > > > > > property 'model_dummy_bytes' to select to convert the accurate
> dummy
> > > > > > > clock cycle count to dummy bytes inside m25p80, is hard to
> justify as
> > > > > > > a property to the flash itself, as the behavior is tightly
> coupled to
> > > > > > > how the SPI controller works.
> > > > > >
> > > > > > I agree with the above. I decided though that instead of posting sample code in here
> > > > > > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
> > > > > > Xilinx ZynqMP GQSPI should not need any modification in a first step.
> > > > > >
> > > > >
> > > > > Wait, (see below)
> > > > >
> > > > > > >
> > > > > > > Please take a look at the Xilinx GQSPI controller, which
> supports both
> > > > > > > use cases, that the dummy cycles can be transferred via tx
> fifo, or
> > > > > > > generated by the controller automatically. Please read the
> example
> > > > > > > given in:
> > > > > > >
> > > > > > >     table 24‐22, an example of Generic FIFO Contents for Quad
> I/O Read
> > > > > > > Command (EBh)
> > > > > > >
> > > > > > > in
> https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > > > > > >
> > > > > > > If you choose to set the m25p80 device property
> 'model_dummy_bytes' to
> > > > > > > true when working with the Xilinx GQSPI controller, you are
> bound to
> > > > > > > only allow guest software to use tx fifo to transfer the dummy
> cycles,
> > > > > > > and this is wrong.
> > > > > > >
> > > > >
> > > > > You missed this part. I looked at your RFC, and as I mentioned
> above
> > > > > your proposal cannot support the complicated controller like Xilinx
> > > > > GQSPI. Please read the example of table 24-22. With your RFC, you
> > > > > mandate guest software's GQSPI driver to only use hardware dummy
> cycle
> > > > > generation, which is wrong.
> > > > >
> > > >
> > > > First, thank you very much for looking into the RFC series, very much
> > > > appreciated. Secondly, about above, the GQSPI model in QEMU
> transfers from 2
> > > > locations in the file, in 1 location the transfer referred to above
> is done, in
> > > > another location the transfer through the txfifo is done. The
> location where
> > > > transfer referred to above is done will not need any modifications
> (and will
> > > > thus work equally well as it does currently).
> > >
> > > Please explain this a little bit. How does your RFC series handle
> > > cases as described in table 24-22, where the 6 dummy cycles are split
> > > into 2 transfers, with one transfer using tx fifo, and the other one
> > > using hardware dummy cycle generation?
> >
> > Sorry, I misunderstood. You are right, that won't work.
>
> +Edgar E. Iglesias
>
> So it looks by far the only way to implement dummy cycles correctly to
> work with all SPI controller models is what I proposed here in this
> patch series.
>
> Maintainers are quite silent, so I would like to hear your thoughts.
>
> @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you
> please share your thoughts since you are the one who reviewed the
> existing dummy implementation (based on commits history)
>
>
Francisco really knows this stuff better than me...
I would tend to agree that it's unfortunate to model things in cycles; if
we could abstract things at a higher level, without breaking existing
use-cases, that would be nice.
Francisco, is it impossible to raise the abstraction level to bytes and
keep existing use-cases?

We have a bunch of test-cases. We'll publish some of them as source code;
others we can't publish, since they use proprietary SW that we're not allowed
to publish at all, but we can run the tests and Ack if things work.

Best regards,
Edgar

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-02-08 15:30                         ` Edgar E. Iglesias
@ 2021-02-09  9:35                           ` Francisco Iglesias
  0 siblings, 0 replies; 36+ messages in thread
From: Francisco Iglesias @ 2021-02-09  9:35 UTC (permalink / raw)
  To: Edgar E. Iglesias
  Cc: Kevin Wolf, Peter Maydell, qemu-devel@nongnu.org Developers,
	Qemu-block, Andrew Jeffery, Bin Meng, Philippe Mathieu-Daudé,
	Havard Skinnemoen, Tyrone Ting, Max Reitz, Alistair Francis,
	Cédric Le Goater, Joe Komlodi, Bin Meng, qemu-arm,
	Joel Stanley

Hello Edgar,

On [2021 Feb 08] Mon 16:30:00, Edgar E. Iglesias wrote:
>    On Mon, Feb 8, 2021 at 3:42 PM Bin Meng <bmeng.cn@gmail.com> wrote:
> 
>      On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias
>      <frasse.iglesias@gmail.com> wrote:
>      >
>      > Hi Bin,
>      >
>      > On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
>      > > Hi Francisco,
>      > >
>      > > On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
>      > > <frasse.iglesias@gmail.com> wrote:
>      > > >
>      > > > Dear Bin,
>      > > >
>      > > > On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
>      > > > > Hi Francisco,
>      > > > >
>      > > > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
>      > > > > <frasse.iglesias@gmail.com> wrote:
>      > > > > >
>      > > > > > Hi Bin,
>      > > > > >
>      > > > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
>      > > > > > > Hi Francisco,
>      > > > > > >
>      > > > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
>      > > > > > > <frasse.iglesias@gmail.com> wrote:
>      > > > > > > >
>      > > > > > > > Hi Bin,
>      > > > > > > >
>      > > > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
>      > > > > > > > > Hi Francisco,
>      > > > > > > > >
>      > > > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
>      > > > > > > > > <frasse.iglesias@gmail.com> wrote:
>      > > > > > > > > >
>      > > > > > > > > > Hi Bin,
>      > > > > > > > > >
>      > > > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
>      > > > > > > > > > > Hi Francisco,
>      > > > > > > > > > >
>      > > > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
>      > > > > > > > > > > <frasse.iglesias@gmail.com> wrote:
>      > > > > > > > > > > >
>      > > > > > > > > > > > Hi Bin,
>      > > > > > > > > > > >
>      > > > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
>      > > > > > > > > > > > > From: Bin Meng <bin.meng@windriver.com>
>      > > > > > > > > > > > >
>      > > > > > > > > > > > > The m25p80 model uses s->needed_bytes to
>      indicate how many follow-up
>      > > > > > > > > > > > > bytes are expected to be received after it
>      receives a command. For
>      > > > > > > > > > > > > example, depending on the address mode, either
>      3-byte address or
>      > > > > > > > > > > > > 4-byte address is needed.
>      > > > > > > > > > > > >
>      > > > > > > > > > > > > For fast read family commands, some dummy cycles
>      are required after
>      > > > > > > > > > > > > sending the address bytes, and the dummy cycles
>      need to be counted
>      > > > > > > > > > > > > in s->needed_bytes. This is where the mess
>      began.
>      > > > > > > > > > > > >
>      > > > > > > > > > > > > As the variable name (needed_bytes) indicates,
>      the unit is in byte.
>      > > > > > > > > > > > > It is not in bit, or cycle. However for some
>      reason the model has
>      > > > > > > > > > > > > been using the number of dummy cycles for
>      s->needed_bytes. The right
>      > > > > > > > > > > > > approach is to convert the number of dummy
>      cycles to bytes based on
>      > > > > > > > > > > > > the SPI protocol, for example, 6 dummy cycles
>      for the Fast Read Quad
>      > > > > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the
>      formula (6 * 4 / 8).
>      > > > > > > > > > > >
>      > > > > > > > > > > > While not being the original implementor I must assume that the above solution was
>      > > > > > > > > > > > considered but not chosen by the developers due to its inaccuracy (it
>      > > > > > > > > > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
>      > > > > > > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error
>      > > > > > > > > > > > wouldn't be caught and the controller will still be considered "correct"). Now
>      > > > > > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, this
>      > > > > > > > > > > > also because the detail is already in use for catching exactly the above error.
>      > > > > > > > > > > >
>      > > > > > > > > > >
>      > > > > > > > > > > I found no clue from the commit message that my
>      proposed solution here
>      > > > > > > > > > > was ever considered, otherwise all SPI controller
>      models supporting
>      > > > > > > > > > > software generation should have been found out
>      seriously broken long
>      > > > > > > > > > > time ago!
>      > > > > > > > > >
>      > > > > > > > > >
>      > > > > > > > > > The controllers you are referring to might lack
>      support for commands requiring
>      > > > > > > > > > dummy clock cycles but I really hope they work with
>      the other commands? If so I
>      > > > > > > > >
>      > > > > > > > > I am not sure why you view dummy clock cycles as
>      something special
>      > > > > > > > > that needs some special support from the SPI controller.
>      For the case
>      > > > > > > > > 1 controller, it's nothing special from the controller
>      perspective,
>      > > > > > > > > just like sending out a command, or address bytes, or
>      data. The
>      > > > > > > > > controller just shifts data bit by bit from its tx fifo
>      and that's it.
>      > > > > > > > > In the Xilinx GQSPI controller case, the dummy cycles
>      can either be
>      > > > > > > > > sent via a regular data (the case 1 controller) in the
>      tx fifo, or
>      > > > > > > > > automatically generated (case 2 controller) by the
>      hardware.
>      > > > > > > >
>      > > > > > > > Ok, I'll try to explain my view point a little differently. For that we also
>      > > > > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW
>      > > > > > > > board supported in QEMU should ideally run on that board inside QEMU as well
>      > > > > > > > (this can be a bare metal application equally well as a modified u-boot/Linux
>      > > > > > > > using SPI commands with a non multiple of 8 number of dummy clock cycles).
>      > > > > > > >
>      > > > > > > > Once functionality has been introduced into QEMU it is not easy to know which
>      > > > > > > > intentional or unintentional features provided by the functionality are being
>      > > > > > > > used by users. One of the (perhaps not well known) features I'm aware of that
>      > > > > > > > is in use and is provided by the accurate dummy clock cycle modeling inside
>      > > > > > > > m25p80 is the ability to test drivers accurately regarding the dummy clock
>      > > > > > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock
>      > > > > > > > cycles), but there might be others as well. So by removing this functionality
>      > > > > > > > the above use case will break, since those tests will not be reliable.
>      > > > > > > > Furthermore, since users tend to be creative it is not possible to know if
>      > > > > > > > there are other use cases that will be affected. This means that in case [1]
>      > > > > > > > needs to be followed the safe path is to add functionality instead of removing.
>      > > > > > > > Luckily it is also easier in this case, see below.
>      > > > > > >
>      > > > > > > I understand there might be users other than U-Boot/Linux
>      that use an
>      > > > > > > odd number of dummy bits (not multiple of 8). If your
>      concern was
>      > > > > > > about model behavior changes, sure I can update
>      > > > > > > qemu/docs/system/deprecated.rst to mention that some flashes
>      in the
>      > > > > > > m25p80 model now implement dummy cycles as bytes.
>      > > > > >
>      > > > > > Yes, something like that. My concern is that since this functionality has been
>      > > > > > in tree for a while, users have found known or unknown features that got
>      > > > > > introduced by it. By removing the functionality (and the known/unknown features)
>      > > > > > we are risking breaking our users' use cases (currently I'm aware of one
>      > > > > > feature/use case but it is not unlikely that there are more). [1] states that
>      > > > > > "In general features are intended to be supported indefinitely once introduced
>      > > > > > into QEMU", to me that makes very much sense because the opposite would mean
>      > > > > > that we were not reliable. So in case [1] needs to be honored it looks to be
>      > > > > > safer to add functionality instead of removing (and risking the removal of use
>      > > > > > cases/features). Luckily I still believe in this case that it will be easier to
>      > > > > > go forward (even if I also agree on what you are saying below about what I
>      > > > > > proposed).
>      > > > > >
>      > > > >
>      > > > > Even if the implementation is buggy and we need to keep the
>      buggy
>      > > > > implementation forever? I think that's why
>      > > > > qemu/docs/system/deprecated.rst was created for deprecating such
>      > > > > feature.
>      > > >
>      > > > With the RFC I posted all commands in m25p80 are working for both
>      the case 1
>      > > > controller (using a txfifo) and the case 2 controller (no txfifo,
>      as GQSPI).
>      > > > Because of this, I, with all respect, will have to disagree that
>      this is buggy.
>      > >
>      > > Well, the existing m25p80 implementation that uses dummy cycle
>      > > accuracy for those flashes prevents all SPI controllers that use tx
>      > > fifo to work with those flashes. Hence it is buggy.
>      > >
>      > > >
>      > > > >
>      > > > > > >
>      > > > > > > > >
>      > > > > > > > > > don't think it is fair to call them 'seriously broken'
>      (and else we should
>      > > > > > > > > > probably let the maintainers know about it). Most
>      likely the lack of support
>      > > > > > > > >
>      > > > > > > > > I called it "seriously broken" because current
>      implementation only
>      > > > > > > > > considered one type of SPI controllers while completely
>      ignoring the
>      > > > > > > > > other type.
>      > > > > > > >
>      > > > > > > > If we change view and see this from the perspective of m25p80, it models the
>      > > > > > > > commands a certain way and provides an API that the SPI controllers need to
>      > > > > > > > implement for interacting with it. It is true that there are SPI controllers
>      > > > > > > > referred to above that do not support the portion of that API that corresponds
>      > > > > > > > to commands with dummy clock cycles, but I don't think it is true that this is
>      > > > > > > > broken since there is also one SPI controller that has a working implementation
>      > > > > > > > of m25p80's full API also when transferring through a tx fifo (use case 1). But
>      > > > > > > > as mentioned above, with a minor extension and improvement to m25p80's API
>      > > > > > > > that allows toggling the accuracy from dummy clock cycles to dummy bytes, [1]
>      > > > > > > > will still be honored while at the same time making it possible to have full
>      > > > > > > > support for the API in the SPI controllers that currently do not (please reread
>      > > > > > > > the proposal in my previous reply that attempts to do this). I myself see this
>      > > > > > > > as a win/win situation, also because no controller should need modifications.
>      > > > > > > >
>      > > > > > >
>      > > > > > > I am afraid your proposal does not work. Your proposed new
>      device
>      > > > > > > property 'model_dummy_bytes' to select to convert the
>      accurate dummy
>      > > > > > > clock cycle count to dummy bytes inside m25p80, is hard to
>      justify as
>      > > > > > > a property to the flash itself, as the behavior is tightly
>      coupled to
>      > > > > > > how the SPI controller works.
>      > > > > >
>      > > > > > I agree with the above. I decided though that instead of posting sample code in here
>      > > > > > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
>      > > > > > Xilinx ZynqMP GQSPI should not need any modification in a first step.
>      > > > > >
>      > > > >
>      > > > > Wait, (see below)
>      > > > >
>      > > > > > >
>      > > > > > > Please take a look at the Xilinx GQSPI controller, which
>      supports both
>      > > > > > > use cases, that the dummy cycles can be transferred via tx
>      fifo, or
>      > > > > > > generated by the controller automatically. Please read the
>      example
>      > > > > > > given in:
>      > > > > > >
>      > > > > > >     table 24‐22, an example of Generic FIFO Contents for
>      Quad I/O Read
>      > > > > > > Command (EBh)
>      > > > > > >
>      > > > > > > in
>      https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
>      > > > > > >
>      > > > > > > If you choose to set the m25p80 device property
>      'model_dummy_bytes' to
>      > > > > > > true when working with the Xilinx GQSPI controller, you are
>      bound to
>      > > > > > > only allow guest software to use tx fifo to transfer the
>      dummy cycles,
>      > > > > > > and this is wrong.
>      > > > > > >
>      > > > >
>      > > > > You missed this part. I looked at your RFC, and as I mentioned
>      above
>      > > > > your proposal cannot support the complicated controller like
>      Xilinx
>      > > > > GQSPI. Please read the example of table 24-22. With your RFC,
>      you
>      > > > > mandate guest software's GQSPI driver to only use hardware dummy
>      cycle
>      > > > > generation, which is wrong.
>      > > > >
>      > > >
>      > > > First, thank you very much for looking into the RFC series, very
>      much
>      > > > appreciated. Secondly, about above, the GQSPI model in QEMU
>      transfers from 2
>      > > > locations in the file, in 1 location the transfer referred to
>      above is done, in
>      > > > another location the transfer through the txfifo is done. The
>      location where
>      > > > transfer referred to above is done will not need any modifications
>      (and will
>      > > > thus work equally well as it does currently).
>      > >
>      > > Please explain this a little bit. How does your RFC series handle
>      > > cases as described in table 24-22, where the 6 dummy cycles are
>      split
>      > > into 2 transfers, with one transfer using tx fifo, and the other one
>      > > using hardware dummy cycle generation?
>      >
>      > Sorry, I misunderstood. You are right, that won't work.
> 
>      +Edgar E. Iglesias
> 
>      So it looks by far the only way to implement dummy cycles correctly to
>      work with all SPI controller models is what I proposed here in this
>      patch series.
> 
>      Maintainers are quite silent, so I would like to hear your thoughts.
> 
>      @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you
>      please share your thoughts since you are the one who reviewed the
>      existing dummy implementation (based on commits history)
> 
>    Francisco really knows this stuff better than me....
>    I would tend to agree that it's unfortunate to model things in cycles, if
>    we could abstract things at a higher level that would be nice. Without
>    breaking existing use-cases.
>    Francisco, is it impossible to bring up the abstraction level to bytes and
>    keep existing use-cases?

Great question. To be honest, I'm leaning towards it not being impossible
(though I haven't been able to try anything yet).

Best regards,
Francisco Iglesias


>    We have a bunch of test-cases, We'll publish some of them in source code,
>    others we can't publish since they use proprietary SW we're not allowed to
>    publish at all, but we can run tests and Ack if things work.
>    Best regards,
>    Edgar


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-02-08 14:41                       ` Bin Meng
  2021-02-08 15:30                         ` Edgar E. Iglesias
@ 2021-04-23  6:45                         ` Bin Meng
  2021-04-27  5:56                           ` Alistair Francis
  1 sibling, 1 reply; 36+ messages in thread
From: Bin Meng @ 2021-04-23  6:45 UTC (permalink / raw)
  To: Francisco Iglesias, Edgar E. Iglesias, Alistair Francis
  Cc: Kevin Wolf, Peter Maydell, qemu-devel@nongnu.org Developers,
	Qemu-block, Andrew Jeffery, Bin Meng, Philippe Mathieu-Daudé,
	Havard Skinnemoen, Tyrone Ting, qemu-arm, Cédric Le Goater,
	Joe Komlodi, Max Reitz, Joel Stanley

On Mon, Feb 8, 2021 at 10:41 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>
> On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias
> <frasse.iglesias@gmail.com> wrote:
> >
> > Hi Bin,
> >
> > On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
> > > Hi Francisco,
> > >
> > > On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
> > > <frasse.iglesias@gmail.com> wrote:
> > > >
> > > > Dear Bin,
> > > >
> > > > On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> > > > > Hi Francisco,
> > > > >
> > > > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > >
> > > > > > Hi Bin,
> > > > > >
> > > > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > > > > > Hi Francisco,
> > > > > > >
> > > > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi Bin,
> > > > > > > >
> > > > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > > > > > Hi Francisco,
> > > > > > > > >
> > > > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Bin,
> > > > > > > > > >
> > > > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > > > > > Hi Francisco,
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > >
> > > > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > > > > > > > > >
> > > > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > > > > > > > > bytes are expected to be received after it receives a command. For
> > > > > > > > > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > > > > > > > > 4-byte address is needed.
> > > > > > > > > > > > >
> > > > > > > > > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > > > > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > > > > > >
> > > > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > > > > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > > > > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > > > > > > > > >
> > > > > > > > > > > > While not being the original implementor, I must assume that the above solution was
> > > > > > > > > > > > considered but not chosen by the developers due to its inaccuracy (it
> > > > > > > > > > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
> > > > > > > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > > > > > > > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > > > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, also
> > > > > > > > > > > > because the detail is already in use for catching exactly the above error.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > I found no clue from the commit message that my proposed solution here
> > > > > > > > > > > was ever considered, otherwise all SPI controller models supporting
> > > > > > > > > > > software generation should have been found to be seriously broken a long
> > > > > > > > > > > time ago!
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > The controllers you are referring to might lack support for commands requiring
> > > > > > > > > > dummy clock cycles but I really hope they work with the other commands? If so I
> > > > > > > > >
> > > > > > > > > I am not sure why you view dummy clock cycles as something special
> > > > > > > > > that needs some special support from the SPI controller. For the case
> > > > > > > > > 1 controller, it's nothing special from the controller perspective,
> > > > > > > > > just like sending out a command, or address bytes, or data. The
> > > > > > > > > controller just shifts data bit by bit from its tx fifo and that's it.
> > > > > > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > > > > > > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > > > > > > > automatically generated (case 2 controller) by the hardware.
> > > > > > > >
> > > > > > > > Ok, I'll try to explain my viewpoint a little differently. For that we also
> > > > > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW
> > > > > > > > board supported in QEMU should ideally run on that board inside QEMU as well
> > > > > > > > (this can be a bare metal application equally well as a modified u-boot/Linux
> > > > > > > > using SPI commands with a non-multiple of 8 number of dummy clock cycles).
> > > > > > > >
> > > > > > > > Once functionality has been introduced into QEMU it is not easy to know which
> > > > > > > > intentional or unintentional features provided by the functionality are being
> > > > > > > > used by users. One of the (perhaps not well known) features I'm aware of that
> > > > > > > > is in use and is provided by the accurate dummy clock cycle modeling inside
> > > > > > > > m25p80 is the ability to test drivers accurately regarding the dummy clock
> > > > > > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock
> > > > > > > > cycles), but there might be others as well. So by removing this functionality
> > > > > > > > the above use case will break, since those tests will no longer be reliable.
> > > > > > > > Furthermore, since users tend to be creative it is not possible to know if
> > > > > > > > there are other use cases that will be affected. This means that in case [1]
> > > > > > > > needs to be followed the safe path is to add functionality instead of removing.
> > > > > > > > Luckily it is also easier in this case, see below.
> > > > > > >
> > > > > > > I understand there might be users other than U-Boot/Linux that use an
> > > > > > > odd number of dummy bits (not multiple of 8). If your concern was
> > > > > > > about model behavior changes, sure I can update
> > > > > > > qemu/docs/system/deprecated.rst to mention that some flashes in the
> > > > > > > m25p80 model now implement dummy cycles as bytes.
> > > > > >
> > > > > > Yes, something like that. My concern is that since this functionality has been
> > > > > > in tree for a while, users have found known or unknown features that got
> > > > > > introduced by it. By removing the functionality (and the known/unknown features)
> > > > > > we risk breaking our users' use cases (currently I'm aware of one
> > > > > > feature/use case but it is not unlikely that there are more). [1] states that
> > > > > > "In general features are intended to be supported indefinitely once introduced
> > > > > > into QEMU", to me that makes a lot of sense because the opposite would mean
> > > > > > that we were not reliable. So in case [1] needs to be honored it looks to be
> > > > > > safer to add functionality instead of removing (and risking the removal of use
> > > > > > cases/features). Luckily I still believe in this case that it will be easier to
> > > > > > go forward (even if I also agree on what you are saying below about what I
> > > > > > proposed).
> > > > > >
> > > > >
> > > > > Even if the implementation is buggy and we need to keep the buggy
> > > > > implementation forever? I think that's why
> > > > > qemu/docs/system/deprecated.rst was created for deprecating such
> > > > > feature.
> > > >
> > > > With the RFC I posted all commands in m25p80 are working for both the case 1
> > > > controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI).
> > > > Because of this, I, with all respect, will have to disagree that this is buggy.
> > >
> > > Well, the existing m25p80 implementation that uses dummy cycle
> > > accuracy for those flashes prevents all SPI controllers that use tx
> > > fifo to work with those flashes. Hence it is buggy.
> > >
> > > >
> > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > > > don't think it is fair to call them 'seriously broken' (and else we should
> > > > > > > > > > probably let the maintainers know about it). Most likely the lack of support
> > > > > > > > >
> > > > > > > > > I called it "seriously broken" because current implementation only
> > > > > > > > > considered one type of SPI controllers while completely ignoring the
> > > > > > > > > other type.
> > > > > > > >
> > > > > > > > If we change view and see this from the perspective of m25p80, it models the
> > > > > > > > commands a certain way and provides an API that the SPI controllers need to
> > > > > > > > implement for interacting with it. It is true that there are SPI controllers
> > > > > > > > referred to above that do not support the portion of that API that corresponds
> > > > > > > > to commands with dummy clock cycles, but I don't think it is true that this is
> > > > > > > > broken since there is also one SPI controller that has a working implementation
> > > > > > > > of m25p80's full API also when transferring through a tx fifo (use case 1). But
> > > > > > > > as mentioned above, by doing a minor extension and improvement to m25p80's API
> > > > > > > > that allows toggling the accuracy from dummy clock cycles to dummy bytes, [1]
> > > > > > > > will still be honored while at the same time making it possible to have full
> > > > > > > > support for the API in the SPI controllers that currently do not (please reread
> > > > > > > > the proposal in my previous reply that attempts to do this). I myself see this
> > > > > > > > as a win/win situation, also because no controller should need modifications.
> > > > > > > >
> > > > > > >
> > > > > > > I am afraid your proposal does not work. Your proposed new device
> > > > > > > property 'model_dummy_bytes' to select to convert the accurate dummy
> > > > > > > clock cycle count to dummy bytes inside m25p80, is hard to justify as
> > > > > > > a property to the flash itself, as the behavior is tightly coupled to
> > > > > > > how the SPI controller works.
> > > > > >
> > > > > > I agree with the above. I decided though that instead of posting sample code in here
> > > > > > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
> > > > > > Xilinx ZynqMP GQSPI should not need any modification in a first step.
> > > > > >
> > > > >
> > > > > Wait, (see below)
> > > > >
> > > > > > >
> > > > > > > Please take a look at the Xilinx GQSPI controller, which supports both
> > > > > > > use cases, that the dummy cycles can be transferred via tx fifo, or
> > > > > > > generated by the controller automatically. Please read the example
> > > > > > > given in:
> > > > > > >
> > > > > > >     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
> > > > > > > Command (EBh)
> > > > > > >
> > > > > > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > > > > > >
> > > > > > > If you choose to set the m25p80 device property 'model_dummy_bytes' to
> > > > > > > true when working with the Xilinx GQSPI controller, you are bound to
> > > > > > > only allow guest software to use tx fifo to transfer the dummy cycles,
> > > > > > > and this is wrong.
> > > > > > >
> > > > >
> > > > > You missed this part. I looked at your RFC, and as I mentioned above
> > > > > your proposal cannot support the complicated controller like Xilinx
> > > > > GQSPI. Please read the example of table 24-22. With your RFC, you
> > > > > mandate guest software's GQSPI driver to only use hardware dummy cycle
> > > > > generation, which is wrong.
> > > > >
> > > >
> > > > First, thank you very much for looking into the RFC series, very much
> > > > appreciated. Secondly, about above, the GQSPI model in QEMU transfers from 2
> > > > locations in the file, in 1 location the transfer referred to above is done, in
> > > > another location the transfer through the txfifo is done. The location where
> > > > transfer referred to above is done will not need any modifications (and will
> > > > thus work equally well as it does currently).
> > >
> > > Please explain this a little bit. How does your RFC series handle
> > > cases as described in table 24-22, where the 6 dummy cycles are split
> > > into 2 transfers, with one transfer using tx fifo, and the other one
> > > using hardware dummy cycle generation?
> >
> > Sorry, I misunderstood. You are right, that won't work.
>
> +Edgar E. Iglesias
>
> So it looks by far the only way to implement dummy cycles correctly to
> work with all SPI controller models is what I proposed here in this
> patch series.
>
> Maintainers are quite silent, so I would like to hear your thoughts.
>
> @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you
> please share your thoughts since you are the one who reviewed the
> existing dummy implementation (based on commits history)

Hello maintainers,

We apparently missed the 6.0 window to address this mess of the m25p80
model. Please provide your inputs on this before I start working on
the v2.

Regards,
Bin


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-04-23  6:45                         ` Bin Meng
@ 2021-04-27  5:56                           ` Alistair Francis
  2021-04-27  8:54                             ` Francisco Iglesias
  0 siblings, 1 reply; 36+ messages in thread
From: Alistair Francis @ 2021-04-27  5:56 UTC (permalink / raw)
  To: Bin Meng
  Cc: Kevin Wolf, Peter Maydell, Qemu-block, Andrew Jeffery,
	Francisco Iglesias, Bin Meng, qemu-devel@nongnu.org Developers,
	Philippe Mathieu-Daudé,
	Tyrone Ting, qemu-arm, Alistair Francis, Cédric Le Goater,
	Joe Komlodi, Edgar E. Iglesias, Havard Skinnemoen, Max Reitz,
	Joel Stanley

On Fri, Apr 23, 2021 at 4:46 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>
> On Mon, Feb 8, 2021 at 10:41 PM Bin Meng <bmeng.cn@gmail.com> wrote:
> >
> > On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias
> > <frasse.iglesias@gmail.com> wrote:
> > >
> > > Hi Bin,
> > >
> > > On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
> > > > Hi Francisco,
> > > >
> > > > On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
> > > > <frasse.iglesias@gmail.com> wrote:
> > > > >
> > > > > Dear Bin,
> > > > >
> > > > > On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> > > > > > Hi Francisco,
> > > > > >
> > > > > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi Bin,
> > > > > > >
> > > > > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > > > > > > Hi Francisco,
> > > > > > > >
> > > > > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi Bin,
> > > > > > > > >
> > > > > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > > > > > > Hi Francisco,
> > > > > > > > > >
> > > > > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi Bin,
> > > > > > > > > > >
> > > > > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > > > > > > Hi Francisco,
> > > > > > > > > > > >
> > > > > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > >
> > > > > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > > > > > > > > > bytes are expected to be received after it receives a command. For
> > > > > > > > > > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > > > > > > > > > 4-byte address is needed.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > > > > > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > > > > > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > > > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > > > > > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > > > > > > > > > >
> > > > > > > > > > > > > While not being the original implementor, I must assume that the above solution was
> > > > > > > > > > > > > considered but not chosen by the developers due to its inaccuracy (it
> > > > > > > > > > > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
> > > > > > > > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > > > > > > > > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > > > > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, also
> > > > > > > > > > > > > because the detail is already in use for catching exactly the above error.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > I found no clue from the commit message that my proposed solution here
> > > > > > > > > > > > was ever considered, otherwise all SPI controller models supporting
> > > > > > > > > > > > software generation should have been found to be seriously broken a long
> > > > > > > > > > > > time ago!
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > The controllers you are referring to might lack support for commands requiring
> > > > > > > > > > > dummy clock cycles but I really hope they work with the other commands? If so I
> > > > > > > > > >
> > > > > > > > > > I am not sure why you view dummy clock cycles as something special
> > > > > > > > > > that needs some special support from the SPI controller. For the case
> > > > > > > > > > 1 controller, it's nothing special from the controller perspective,
> > > > > > > > > > just like sending out a command, or address bytes, or data. The
> > > > > > > > > > controller just shifts data bit by bit from its tx fifo and that's it.
> > > > > > > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > > > > > > > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > > > > > > > > automatically generated (case 2 controller) by the hardware.
> > > > > > > > >
> > > > > > > > > Ok, I'll try to explain my viewpoint a little differently. For that we also
> > > > > > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW
> > > > > > > > > board supported in QEMU should ideally run on that board inside QEMU as well
> > > > > > > > > (this can be a bare metal application equally well as a modified u-boot/Linux
> > > > > > > > > using SPI commands with a non-multiple of 8 number of dummy clock cycles).
> > > > > > > > >
> > > > > > > > > Once functionality has been introduced into QEMU it is not easy to know which
> > > > > > > > > intentional or unintentional features provided by the functionality are being
> > > > > > > > > used by users. One of the (perhaps not well known) features I'm aware of that
> > > > > > > > > is in use and is provided by the accurate dummy clock cycle modeling inside
> > > > > > > > > m25p80 is the ability to test drivers accurately regarding the dummy clock
> > > > > > > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock
> > > > > > > > > cycles), but there might be others as well. So by removing this functionality
> > > > > > > > > the above use case will break, since those tests will no longer be reliable.
> > > > > > > > > Furthermore, since users tend to be creative it is not possible to know if
> > > > > > > > > there are other use cases that will be affected. This means that in case [1]
> > > > > > > > > needs to be followed the safe path is to add functionality instead of removing.
> > > > > > > > > Luckily it is also easier in this case, see below.
> > > > > > > >
> > > > > > > > I understand there might be users other than U-Boot/Linux that use an
> > > > > > > > odd number of dummy bits (not multiple of 8). If your concern was
> > > > > > > > about model behavior changes, sure I can update
> > > > > > > > qemu/docs/system/deprecated.rst to mention that some flashes in the
> > > > > > > > m25p80 model now implement dummy cycles as bytes.
> > > > > > >
> > > > > > > Yes, something like that. My concern is that since this functionality has been
> > > > > > > in tree for a while, users have found known or unknown features that got
> > > > > > > introduced by it. By removing the functionality (and the known/unknown features)
> > > > > > > we risk breaking our users' use cases (currently I'm aware of one
> > > > > > > feature/use case but it is not unlikely that there are more). [1] states that
> > > > > > > "In general features are intended to be supported indefinitely once introduced
> > > > > > > into QEMU", to me that makes a lot of sense because the opposite would mean
> > > > > > > that we were not reliable. So in case [1] needs to be honored it looks to be
> > > > > > > safer to add functionality instead of removing (and risking the removal of use
> > > > > > > cases/features). Luckily I still believe in this case that it will be easier to
> > > > > > > go forward (even if I also agree on what you are saying below about what I
> > > > > > > proposed).
> > > > > > >
> > > > > >
> > > > > > Even if the implementation is buggy and we need to keep the buggy
> > > > > > implementation forever? I think that's why
> > > > > > qemu/docs/system/deprecated.rst was created for deprecating such
> > > > > > feature.
> > > > >
> > > > > With the RFC I posted all commands in m25p80 are working for both the case 1
> > > > > controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI).
> > > > > Because of this, I, with all respect, will have to disagree that this is buggy.
> > > >
> > > > Well, the existing m25p80 implementation that uses dummy cycle
> > > > accuracy for those flashes prevents all SPI controllers that use tx
> > > > fifo to work with those flashes. Hence it is buggy.
> > > >
> > > > >
> > > > > >
> > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > don't think it is fair to call them 'seriously broken' (and else we should
> > > > > > > > > > > probably let the maintainers know about it). Most likely the lack of support
> > > > > > > > > >
> > > > > > > > > > I called it "seriously broken" because current implementation only
> > > > > > > > > > considered one type of SPI controllers while completely ignoring the
> > > > > > > > > > other type.
> > > > > > > > >
> > > > > > > > > If we change view and see this from the perspective of m25p80, it models the
> > > > > > > > > commands a certain way and provides an API that the SPI controllers need to
> > > > > > > > > implement for interacting with it. It is true that there are SPI controllers
> > > > > > > > > referred to above that do not support the portion of that API that corresponds
> > > > > > > > > to commands with dummy clock cycles, but I don't think it is true that this is
> > > > > > > > > broken since there is also one SPI controller that has a working implementation
> > > > > > > > > of m25p80's full API also when transferring through a tx fifo (use case 1). But
> > > > > > > > > as mentioned above, by doing a minor extension and improvement to m25p80's API
> > > > > > > > > that allows toggling the accuracy from dummy clock cycles to dummy bytes, [1]
> > > > > > > > > will still be honored while at the same time making it possible to have full
> > > > > > > > > support for the API in the SPI controllers that currently do not (please reread
> > > > > > > > > the proposal in my previous reply that attempts to do this). I myself see this
> > > > > > > > > as a win/win situation, also because no controller should need modifications.
> > > > > > > > >
> > > > > > > >
> > > > > > > > I am afraid your proposal does not work. Your proposed new device
> > > > > > > > property 'model_dummy_bytes' to select to convert the accurate dummy
> > > > > > > > clock cycle count to dummy bytes inside m25p80, is hard to justify as
> > > > > > > > a property to the flash itself, as the behavior is tightly coupled to
> > > > > > > > how the SPI controller works.
> > > > > > >
> > > > > > > I agree with the above. I decided though that instead of posting sample code in here
> > > > > > > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
> > > > > > > Xilinx ZynqMP GQSPI should not need any modification in a first step.
> > > > > > >
> > > > > >
> > > > > > Wait, (see below)
> > > > > >
> > > > > > > >
> > > > > > > > Please take a look at the Xilinx GQSPI controller, which supports both
> > > > > > > > use cases, that the dummy cycles can be transferred via tx fifo, or
> > > > > > > > generated by the controller automatically. Please read the example
> > > > > > > > given in:
> > > > > > > >
> > > > > > > >     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
> > > > > > > > Command (EBh)
> > > > > > > >
> > > > > > > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > > > > > > >
> > > > > > > > If you choose to set the m25p80 device property 'model_dummy_bytes' to
> > > > > > > > true when working with the Xilinx GQSPI controller, you are bound to
> > > > > > > > only allow guest software to use tx fifo to transfer the dummy cycles,
> > > > > > > > and this is wrong.
> > > > > > > >
> > > > > >
> > > > > > You missed this part. I looked at your RFC, and as I mentioned above
> > > > > > your proposal cannot support the complicated controller like Xilinx
> > > > > > GQSPI. Please read the example of table 24-22. With your RFC, you
> > > > > > mandate guest software's GQSPI driver to only use hardware dummy cycle
> > > > > > generation, which is wrong.
> > > > > >
> > > > >
> > > > > First, thank you very much for looking into the RFC series, very much
> > > > > appreciated. Secondly, about above, the GQSPI model in QEMU transfers from 2
> > > > > locations in the file, in 1 location the transfer referred to above is done, in
> > > > > another location the transfer through the txfifo is done. The location where
> > > > > transfer referred to above is done will not need any modifications (and will
> > > > > thus work equally well as it does currently).
> > > >
> > > > Please explain this a little bit. How does your RFC series handle
> > > > cases as described in table 24-22, where the 6 dummy cycles are split
> > > > into 2 transfers, with one transfer using tx fifo, and the other one
> > > > using hardware dummy cycle generation?
> > >
> > > Sorry, I misunderstood. You are right, that won't work.
> >
> > +Edgar E. Iglesias
> >
> > So it looks by far the only way to implement dummy cycles correctly to
> > work with all SPI controller models is what I proposed here in this
> > patch series.
> >
> > Maintainers are quite silent, so I would like to hear your thoughts.
> >
> > @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you
> > please share your thoughts since you are the one who reviewed the
> > existing dummy implementation (based on commits history)

I agree with Edgar, in that Francisco and Bin know this better than me
and that modelling things in cycles is a pain.

As Bin points out it seems like currently we should be modelling bytes
(from the variable name) so it makes sense to keep it in bytes. I
would be in favour of this series in that case. Do we know what use
cases this will break? I know it's hard to answer but I don't think
there are too many SSI users in QEMU so it might not be too hard to
test most of the possible use cases.

Alistair

>
> Hello maintainers,
>
> We apparently missed the 6.0 window to address this mess of the m25p80
> model. Please provide your inputs on this before I start working on
> the v2.
>
> Regards,
> Bin
>



* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-04-27  5:56                           ` Alistair Francis
@ 2021-04-27  8:54                             ` Francisco Iglesias
  2021-04-27 14:32                               ` Cédric Le Goater
  0 siblings, 1 reply; 36+ messages in thread
From: Francisco Iglesias @ 2021-04-27  8:54 UTC (permalink / raw)
  To: Alistair Francis
  Cc: Kevin Wolf, Peter Maydell, Qemu-block, Andrew Jeffery, Bin Meng,
	qemu-devel@nongnu.org Developers, Philippe Mathieu-Daudé,
	Tyrone Ting, qemu-arm, Alistair Francis, Cédric Le Goater,
	Joe Komlodi, Edgar E. Iglesias, Havard Skinnemoen, Bin Meng,
	Max Reitz, Joel Stanley

On [2021 Apr 27] Tue 15:56:10, Alistair Francis wrote:
> On Fri, Apr 23, 2021 at 4:46 PM Bin Meng <bmeng.cn@gmail.com> wrote:
> >
> > On Mon, Feb 8, 2021 at 10:41 PM Bin Meng <bmeng.cn@gmail.com> wrote:
> > >
> > > On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias
> > > <frasse.iglesias@gmail.com> wrote:
> > > >
> > > > Hi Bin,
> > > >
> > > > On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
> > > > > Hi Francisco,
> > > > >
> > > > > On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
> > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > >
> > > > > > Dear Bin,
> > > > > >
> > > > > > On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
> > > > > > > Hi Francisco,
> > > > > > >
> > > > > > > On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
> > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi Bin,
> > > > > > > >
> > > > > > > > On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
> > > > > > > > > Hi Francisco,
> > > > > > > > >
> > > > > > > > > On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
> > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi Bin,
> > > > > > > > > >
> > > > > > > > > > On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
> > > > > > > > > > > Hi Francisco,
> > > > > > > > > > >
> > > > > > > > > > > On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
> > > > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > >
> > > > > > > > > > > > On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
> > > > > > > > > > > > > Hi Francisco,
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
> > > > > > > > > > > > > <frasse.iglesias@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi Bin,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
> > > > > > > > > > > > > > > From: Bin Meng <bin.meng@windriver.com>
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > The m25p80 model uses s->needed_bytes to indicate how many follow-up
> > > > > > > > > > > > > > > bytes are expected to be received after it receives a command. For
> > > > > > > > > > > > > > > example, depending on the address mode, either 3-byte address or
> > > > > > > > > > > > > > > 4-byte address is needed.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > For fast read family commands, some dummy cycles are required after
> > > > > > > > > > > > > > > sending the address bytes, and the dummy cycles need to be counted
> > > > > > > > > > > > > > > in s->needed_bytes. This is where the mess began.
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > As the variable name (needed_bytes) indicates, the unit is in byte.
> > > > > > > > > > > > > > > It is not in bit, or cycle. However for some reason the model has
> > > > > > > > > > > > > > > been using the number of dummy cycles for s->needed_bytes. The right
> > > > > > > > > > > > > > > approach is to convert the number of dummy cycles to bytes based on
> > > > > > > > > > > > > > > the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
> > > > > > > > > > > > > > > I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > While not being the original implementor I must assume that above solution was
> > > > > > > > > > > > > > considered but not chosen by the developers due to its inaccuracy (it
> > > > > > > > > > > > > > wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
> > > > > > > > > > > > > > meaning that if the controller is wrongly programmed to generate 7 the error
> > > > > > > > > > > > > > wouldn't be caught and the controller will still be considered "correct"). Now
> > > > > > > > > > > > > > that we have this detail in the implementation I'm in favor of keeping it, this
> > > > > > > > > > > > > > also because the detail is already in use for catching exactly above error.
> > > > > > > > > > > > > >
> > > > > > > > > > > > >
> > > > > > > > > > > > > I found no clue from the commit message that my proposed solution here
> > > > > > > > > > > > > was ever considered, otherwise all SPI controller models supporting
> > > > > > > > > > > > > software generation should have been found out seriously broken long
> > > > > > > > > > > > > time ago!
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > The controllers you are referring to might lack support for commands requiring
> > > > > > > > > > > > dummy clock cycles but I really hope they work with the other commands? If so I
> > > > > > > > > > >
> > > > > > > > > > > I am not sure why you view dummy clock cycles as something special
> > > > > > > > > > > that needs some special support from the SPI controller. For the case
> > > > > > > > > > > 1 controller, it's nothing special from the controller perspective,
> > > > > > > > > > > just like sending out a command, or address bytes, or data. The
> > > > > > > > > > > controller just shifts data bit by bit from its tx fifo and that's it.
> > > > > > > > > > > In the Xilinx GQSPI controller case, the dummy cycles can either be
> > > > > > > > > > > sent via a regular data (the case 1 controller) in the tx fifo, or
> > > > > > > > > > > automatically generated (case 2 controller) by the hardware.
> > > > > > > > > >
> > > > > > > > > > Ok, I'll try to explain my view point a little differently. For that we also
> > > > > > > > > > need to keep in mind that QEMU models HW, and any binary that runs on a HW
> > > > > > > > > > board supported in QEMU should ideally run on that board inside QEMU as well
> > > > > > > > > > (this can be a bare metal application equally well as a modified u-boot/Linux
> > > > > > > > > > using SPI commands with a non multiple of 8 number of dummy clock cycles).
> > > > > > > > > >
> > > > > > > > > > Once functionality has been introduced into QEMU it is not easy to know which
> > > > > > > > > > intentional or unintentional features provided by the functionality are being
> > > > > > > > > > used by users. One of the (perhaps not well known) features I'm aware of that
> > > > > > > > > > is in use and is provided by the accurate dummy clock cycle modeling inside
> > > > > > > > > > m25p80 is the ability to test drivers accurately regarding the dummy clock
> > > > > > > > > > cycles (even when using commands with a non-multiple of 8 number of dummy clock
> > > > > > > > > > cycles), but there might be others as well. So by removing this functionality
> > > > > > > > > > the above use case will break, since those tests will no longer be reliable.
> > > > > > > > > > Furthermore, since users tend to be creative it is not possible to know if
> > > > > > > > > > there are other use cases that will be affected. This means that in case [1]
> > > > > > > > > > needs to be followed the safe path is to add functionality instead of removing.
> > > > > > > > > > Luckily it is also easier in this case, see below.
> > > > > > > > >
> > > > > > > > > I understand there might be users other than U-Boot/Linux that use an
> > > > > > > > > odd number of dummy bits (not multiple of 8). If your concern was
> > > > > > > > > about model behavior changes, sure I can update
> > > > > > > > > qemu/docs/system/deprecated.rst to mention that some flashes in the
> > > > > > > > > m25p80 model now implement dummy cycles as bytes.
> > > > > > > >
> > > > > > > > Yes, something like that. My concern is that since this functionality has been
> > > > > > > > in tree for a while, users have found known or unknown features that got
> > > > > > > > introduced by it. By removing the functionality (and the known/unknown features)
> > > > > > > > we risk breaking our users' use cases (currently I'm aware of one
> > > > > > > > feature/use case but it is not unlikely that there are more). [1] states that
> > > > > > > > "In general features are intended to be supported indefinitely once introduced
> > > > > > > > into QEMU", to me that makes very much sense because the opposite would mean
> > > > > > > > that we were not reliable. So in case [1] needs to be honored it looks to be
> > > > > > > > safer to add functionality instead of removing (and risking the removal of use
> > > > > > > > cases/features). Luckily I still believe in this case that it will be easier to
> > > > > > > > go forward (even if I also agree on what you are saying below about what I
> > > > > > > > proposed).
> > > > > > > >
> > > > > > >
> > > > > > > Even if the implementation is buggy and we need to keep the buggy
> > > > > > > implementation forever? I think that's why
> > > > > > > qemu/docs/system/deprecated.rst was created for deprecating such
> > > > > > > feature.
> > > > > >
> > > > > > With the RFC I posted all commands in m25p80 are working for both the case 1
> > > > > > controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI).
> > > > > > Because of this, I, with all respect, will have to disagree that this is buggy.
> > > > >
> > > > > Well, the existing m25p80 implementation that uses dummy cycle
> > > > > accuracy for those flashes prevents all SPI controllers that use tx
> > > > > fifo to work with those flashes. Hence it is buggy.
> > > > >
> > > > > >
> > > > > > >
> > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > don't think it is fair to call them 'seriously broken' (and else we should
> > > > > > > > > > > > probably let the maintainers know about it). Most likely the lack of support
> > > > > > > > > > >
> > > > > > > > > > > I called it "seriously broken" because current implementation only
> > > > > > > > > > > considered one type of SPI controllers while completely ignoring the
> > > > > > > > > > > other type.
> > > > > > > > > >
> > > > > > > > > > If we change view and see this from the perspective of m25p80, it models the
> > > > > > > > > > commands a certain way and provides an API that the SPI controllers need to
> > > > > > > > > > implement for interacting with it. It is true that there are SPI controllers
> > > > > > > > > > referred to above that do not support the portion of that API that corresponds
> > > > > > > > > > to commands with dummy clock cycles, but I don't think it is true that this is
> > > > > > > > > > broken since there is also one SPI controller that has a working implementation
> > > > > > > > > > of m25p80's full API also when transferring through a tx fifo (use case 1). But
> > > > > > > > > > as mentioned above, by doing a minor extension and improvement to m25p80's API
> > > > > > > > > > and allow for toggling the accuracy from dummy clock cycles to dummy bytes [1]
> > > > > > > > > > will still be honored as in the same time making it possible to have full
> > > > > > > > > > support for the API in the SPI controllers that currently do not (please reread
> > > > > > > > > > the proposal in my previous reply that attempts to do this). I myself see this
> > > > > > > > > > as win/win situation, also because no controller should need modifications.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > I am afraid your proposal does not work. Your proposed new device
> > > > > > > > > property 'model_dummy_bytes' to select to convert the accurate dummy
> > > > > > > > > clock cycle count to dummy bytes inside m25p80, is hard to justify as
> > > > > > > > > a property to the flash itself, as the behavior is tightly coupled to
> > > > > > > > > how the SPI controller works.
> > > > > > > >
> > > > > > > > I agree on above. I decided though that instead of posting sample code in here
> > > > > > > > I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
> > > > > > > > Xilinx ZynqMP GQSPI should not need any modification in a first step.
> > > > > > > >
> > > > > > >
> > > > > > > Wait, (see below)
> > > > > > >
> > > > > > > > >
> > > > > > > > > Please take a look at the Xilinx GQSPI controller, which supports both
> > > > > > > > > use cases, that the dummy cycles can be transferred via tx fifo, or
> > > > > > > > > generated by the controller automatically. Please read the example
> > > > > > > > > given in:
> > > > > > > > >
> > > > > > > > >     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
> > > > > > > > > Command (EBh)
> > > > > > > > >
> > > > > > > > > in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
> > > > > > > > >
> > > > > > > > > If you choose to set the m25p80 device property 'model_dummy_bytes' to
> > > > > > > > > true when working with the Xilinx GQSPI controller, you are bound to
> > > > > > > > > only allow guest software to use tx fifo to transfer the dummy cycles,
> > > > > > > > > and this is wrong.
> > > > > > > > >
> > > > > > >
> > > > > > > You missed this part. I looked at your RFC, and as I mentioned above
> > > > > > > your proposal cannot support the complicated controller like Xilinx
> > > > > > > GQSPI. Please read the example of table 24-22. With your RFC, you
> > > > > > > mandate guest software's GQSPI driver to only use hardware dummy cycle
> > > > > > > generation, which is wrong.
> > > > > > >
> > > > > >
> > > > > > First, thank you very much for looking into the RFC series, very much
> > > > > > appreciated. Secondly, about above, the GQSPI model in QEMU transfers from 2
> > > > > > locations in the file, in 1 location the transfer referred to above is done, in
> > > > > > another location the transfer through the txfifo is done. The location where
> > > > > > transfer referred to above is done will not need any modifications (and will
> > > > > > thus work equally well as it does currently).
> > > > >
> > > > > Please explain this a little bit. How does your RFC series handle
> > > > > cases as described in table 24-22, where the 6 dummy cycles are split
> > > > > into 2 transfers, with one transfer using tx fifo, and the other one
> > > > > using hardware dummy cycle generation?
> > > >
> > > > Sorry, I misunderstood. You are right, that won't work.
> > >
> > > +Edgar E. Iglesias
> > >
> > > So it looks by far the only way to implement dummy cycles correctly to
> > > work with all SPI controller models is what I proposed here in this
> > > patch series.
> > >
> > > Maintainers are quite silent, so I would like to hear your thoughts.
> > >
> > > @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you
> > > please share your thoughts since you are the one who reviewed the
> > > existing dummy implementation (based on commits history)
> 
> I agree with Edgar, in that Francisco and Bin know this better than me
> and that modelling things in cycles is a pain.

Hi Alistair,

> 
> As Bin points out it seems like currently we should be modelling bytes
> (from the variable name) so it makes sense to keep it in bytes. I
> would be in favour of this series in that case. Do we know what use
> cases this will break? I know it's hard to answer but I don't think
> there are too many SSI users in QEMU so it might not be too hard to
> test most of the possible use cases.

The use case I'm aware of is regression testing of drivers. For example, if a
driver uses 10 dummy clock cycles with a command and a patch accidentally
changes it to use 11, QEMU currently catches the problem. That won't be
possible with this series. It's difficult to say, but there may well be
other use cases too.
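
To make the regression-testing case concrete, here is a minimal sketch. It is
not code from m25p80.c or from this series. It assumes the dummy cycles are
clocked on a single data line and that a byte-granular model would count only
whole bytes, dropping any partial byte. Both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: a cycle-accurate model compares the exact cycle count. */
static inline bool cycle_model_accepts(uint32_t expected_cycles,
                                       uint32_t driven_cycles)
{
    return driven_cycles == expected_cycles;
}

/*
 * Hypothetical: a byte-granular model on one data line (8 cycles per byte).
 * Assumption: a partial trailing byte is not counted.
 */
static inline bool byte_model_accepts(uint32_t expected_cycles,
                                      uint32_t driven_cycles)
{
    return driven_cycles / 8 == expected_cycles / 8;
}
```

With 10 expected cycles, a driver that drives 11 is rejected by
cycle_model_accepts() but accepted by byte_model_accepts(), because both
counts truncate to one whole byte.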

More importantly, IMO, the current use cases can be kept while still adding
support for commands with dummy clock cycles to the QEMU SPI controllers
that currently lack it.

(If I recall correctly, this series might also have another issue regarding
the GQSPI SPI mode configuration: depending on the mode, 8 dummy clock
cycles can be transmitted as 1, 2 or 4 data bytes, so I think some form of
calculation might be needed inside m25p80.)
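
The kind of calculation hinted at above could look roughly like the sketch
below. It is illustrative only and not taken from m25p80.c or this series; the
function name and the -1 error convention are hypothetical. It relies on each
dummy clock cycle shifting one bit per active data line, so the byte count
depends on how many lines the controller drives.

```c
#include <stdint.h>

/*
 * Hypothetical helper: number of data bytes needed to produce `cycles`
 * dummy clock cycles when `lines` data lines (1, 2 or 4) are driven.
 * Returns -1 for an unsupported line count or when the cycles do not
 * add up to a whole number of bytes.
 */
static inline int dummy_cycles_to_bytes(uint32_t cycles, uint32_t lines)
{
    uint32_t bits;

    if (lines != 1 && lines != 2 && lines != 4) {
        return -1;
    }

    bits = cycles * lines;      /* one bit per line per clock cycle */
    if (bits % 8 != 0) {
        return -1;
    }

    return (int)(bits / 8);
}
```

For 8 dummy cycles this gives 1, 2 or 4 bytes on 1, 2 or 4 lines, and for the
EBh example from the cover letter (6 cycles on 4 lines) it gives 3 bytes,
matching 6 * 4 / 8.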

Best regards,
Francisco


> 
> Alistair
> 
> >
> > Hello maintainers,
> >
> > We apparently missed the 6.0 window to address this mess of the m25p80
> > model. Please provide your inputs on this before I start working on
> > the v2.
> >
> > Regards,
> > Bin
> >



* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-04-27  8:54                             ` Francisco Iglesias
@ 2021-04-27 14:32                               ` Cédric Le Goater
  2021-04-28 13:12                                 ` Bin Meng
  0 siblings, 1 reply; 36+ messages in thread
From: Cédric Le Goater @ 2021-04-27 14:32 UTC (permalink / raw)
  To: Francisco Iglesias, Alistair Francis
  Cc: Kevin Wolf, Peter Maydell, Qemu-block, Andrew Jeffery, Bin Meng,
	qemu-devel@nongnu.org Developers, Philippe Mathieu-Daudé,
	Tyrone Ting, qemu-arm, Alistair Francis, Joel Stanley,
	Joe Komlodi, Edgar E. Iglesias, Havard Skinnemoen, Bin Meng,
	Max Reitz

Hello,

On 4/27/21 10:54 AM, Francisco Iglesias wrote:
> On [2021 Apr 27] Tue 15:56:10, Alistair Francis wrote:
>> On Fri, Apr 23, 2021 at 4:46 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>>>
>>> On Mon, Feb 8, 2021 at 10:41 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>>>>
>>>> On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias
>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>
>>>>> Hi Bin,
>>>>>
>>>>> On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
>>>>>> Hi Francisco,
>>>>>>
>>>>>> On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>
>>>>>>> Dear Bin,
>>>>>>>
>>>>>>> On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
>>>>>>>> Hi Francisco,
>>>>>>>>
>>>>>>>> On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Bin,
>>>>>>>>>
>>>>>>>>> On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
>>>>>>>>>> Hi Francisco,
>>>>>>>>>>
>>>>>>>>>> On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>
>>>>>>>>>>> On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
>>>>>>>>>>>> Hi Francisco,
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
>>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
>>>>>>>>>>>>>> Hi Francisco,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
>>>>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
>>>>>>>>>>>>>>>> From: Bin Meng <bin.meng@windriver.com>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> The m25p80 model uses s->needed_bytes to indicate how many follow-up
>>>>>>>>>>>>>>>> bytes are expected to be received after it receives a command. For
>>>>>>>>>>>>>>>> example, depending on the address mode, either 3-byte address or
>>>>>>>>>>>>>>>> 4-byte address is needed.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> For fast read family commands, some dummy cycles are required after
>>>>>>>>>>>>>>>> sending the address bytes, and the dummy cycles need to be counted
>>>>>>>>>>>>>>>> in s->needed_bytes. This is where the mess began.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> As the variable name (needed_bytes) indicates, the unit is in byte.
>>>>>>>>>>>>>>>> It is not in bit, or cycle. However for some reason the model has
>>>>>>>>>>>>>>>> been using the number of dummy cycles for s->needed_bytes. The right
>>>>>>>>>>>>>>>> approach is to convert the number of dummy cycles to bytes based on
>>>>>>>>>>>>>>>> the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
>>>>>>>>>>>>>>>> I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> While not being the original implementor I must assume that above solution was
>>>>>>>>>>>>>>> considered but not chosen by the developers due to its inaccuracy (it
>>>>>>>>>>>>>>> wouldn't be possible to model exactly 6 dummy cycles, only a multiple of 8,
>>>>>>>>>>>>>>> meaning that if the controller is wrongly programmed to generate 7 the error
>>>>>>>>>>>>>>> wouldn't be caught and the controller will still be considered "correct"). Now
>>>>>>>>>>>>>>> that we have this detail in the implementation I'm in favor of keeping it, this
>>>>>>>>>>>>>>> also because the detail is already in use for catching exactly above error.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I found no clue from the commit message that my proposed solution here
>>>>>>>>>>>>>> was ever considered, otherwise all SPI controller models supporting
>>>>>>>>>>>>>> software generation should have been found out seriously broken long
>>>>>>>>>>>>>> time ago!
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> The controllers you are referring to might lack support for commands requiring
>>>>>>>>>>>>> dummy clock cycles but I really hope they work with the other commands? If so I
>>>>>>>>>>>>
>>>>>>>>>>>> I am not sure why you view dummy clock cycles as something special
>>>>>>>>>>>> that needs some special support from the SPI controller. For the case
>>>>>>>>>>>> 1 controller, it's nothing special from the controller perspective,
>>>>>>>>>>>> just like sending out a command, or address bytes, or data. The
>>>>>>>>>>>> controller just shifts data bit by bit from its tx fifo and that's it.
>>>>>>>>>>>> In the Xilinx GQSPI controller case, the dummy cycles can either be
>>>>>>>>>>>> sent via a regular data (the case 1 controller) in the tx fifo, or
>>>>>>>>>>>> automatically generated (case 2 controller) by the hardware.
>>>>>>>>>>>
>>>>>>>>>>> Ok, I'll try to explain my view point a little differently. For that we also
>>>>>>>>>>> need to keep in mind that QEMU models HW, and any binary that runs on a HW
>>>>>>>>>>> board supported in QEMU should ideally run on that board inside QEMU as well
>>>>>>>>>>> (this can be a bare metal application equally well as a modified u-boot/Linux
>>>>>>>>>>> using SPI commands with a non multiple of 8 number of dummy clock cycles).
>>>>>>>>>>>
>>>>>>>>>>> Once functionality has been introduced into QEMU it is not easy to know which
>>>>>>>>>>> intentional or unintentional features provided by the functionality are being
>>>>>>>>>>> used by users. One of the (perhaps not well known) features I'm aware of that
>>>>>>>>>>> is in use and is provided by the accurate dummy clock cycle modeling inside
>>>>>>>>>>> m25p80 is the ability to test drivers accurately regarding the dummy clock
>>>>>>>>>>> cycles (even when using commands with a non-multiple of 8 number of dummy clock
>>>>>>>>>>> cycles), but there might be others as well. So by removing this functionality
>>>>>>>>>>> the above use case will break, since those tests will no longer be reliable.
>>>>>>>>>>> Furthermore, since users tend to be creative it is not possible to know if
>>>>>>>>>>> there are other use cases that will be affected. This means that in case [1]
>>>>>>>>>>> needs to be followed the safe path is to add functionality instead of removing.
>>>>>>>>>>> Luckily it is also easier in this case, see below.
>>>>>>>>>>
>>>>>>>>>> I understand there might be users other than U-Boot/Linux that use an
>>>>>>>>>> odd number of dummy bits (not multiple of 8). If your concern was
>>>>>>>>>> about model behavior changes, sure I can update
>>>>>>>>>> qemu/docs/system/deprecated.rst to mention that some flashes in the
>>>>>>>>>> m25p80 model now implement dummy cycles as bytes.
>>>>>>>>>
>>>>>>>>> Yes, something like that. My concern is that since this functionality has been
>>>>>>>>> in tree for a while, users have found known or unknown features that got
>>>>>>>>> introduced by it. By removing the functionality (and the known/unknown features)
>>>>>>>>> we risk breaking our users' use cases (currently I'm aware of one
>>>>>>>>> feature/use case but it is not unlikely that there are more). [1] states that
>>>>>>>>> "In general features are intended to be supported indefinitely once introduced
>>>>>>>>> into QEMU", to me that makes very much sense because the opposite would mean
>>>>>>>>> that we were not reliable. So in case [1] needs to be honored it looks to be
>>>>>>>>> safer to add functionality instead of removing (and risking the removal of use
>>>>>>>>> cases/features). Luckily I still believe in this case that it will be easier to
>>>>>>>>> go forward (even if I also agree on what you are saying below about what I
>>>>>>>>> proposed).
>>>>>>>>>
>>>>>>>>
>>>>>>>> Even if the implementation is buggy and we need to keep the buggy
>>>>>>>> implementation forever? I think that's why
>>>>>>>> qemu/docs/system/deprecated.rst was created for deprecating such
>>>>>>>> feature.
>>>>>>>
>>>>>>> With the RFC I posted all commands in m25p80 are working for both the case 1
>>>>>>> controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI).
>>>>>>> Because of this, I, with all respect, will have to disagree that this is buggy.
>>>>>>
>>>>>> Well, the existing m25p80 implementation that uses dummy cycle
>>>>>> accuracy for those flashes prevents all SPI controllers that use tx
>>>>>> fifo to work with those flashes. Hence it is buggy.
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> don't think it is fair to call them 'seriously broken' (and else we should
>>>>>>>>>>>>> probably let the maintainers know about it). Most likely the lack of support
>>>>>>>>>>>>
>>>>>>>>>>>> I called it "seriously broken" because current implementation only
>>>>>>>>>>>> considered one type of SPI controllers while completely ignoring the
>>>>>>>>>>>> other type.
>>>>>>>>>>>
>>>>>>>>>>> If we change view and see this from the perspective of m25p80, it models the
>>>>>>>>>>> commands a certain way and provides an API that the SPI controllers need to
>>>>>>>>>>> implement for interacting with it. It is true that there are SPI controllers
>>>>>>>>>>> referred to above that do not support the portion of that API that corresponds
>>>>>>>>>>> to commands with dummy clock cycles, but I don't think it is true that this is
>>>>>>>>>>> broken since there is also one SPI controller that has a working implementation
>>>>>>>>>>> of m25p80's full API also when transferring through a tx fifo (use case 1). But
>>>>>>>>>>> as mentioned above, by doing a minor extension and improvement to m25p80's API
>>>>>>>>>>> and allow for toggling the accuracy from dummy clock cycles to dummy bytes [1]
>>>>>>>>>>> will still be honored as in the same time making it possible to have full
>>>>>>>>>>> support for the API in the SPI controllers that currently do not (please reread
>>>>>>>>>>> the proposal in my previous reply that attempts to do this). I myself see this
>>>>>>>>>>> as win/win situation, also because no controller should need modifications.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> I am afraid your proposal does not work. Your proposed new device
>>>>>>>>>> property 'model_dummy_bytes' to select to convert the accurate dummy
>>>>>>>>>> clock cycle count to dummy bytes inside m25p80, is hard to justify as
>>>>>>>>>> a property to the flash itself, as the behavior is tightly coupled to
>>>>>>>>>> how the SPI controller works.
>>>>>>>>>
>>>>>>>>> I agree on above. I decided though that instead of posting sample code in here
>>>>>>>>> I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
>>>>>>>>> Xilinx ZynqMP GQSPI should not need any modification in a first step.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Wait, (see below)
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Please take a look at the Xilinx GQSPI controller, which supports both
>>>>>>>>>> use cases, that the dummy cycles can be transferred via tx fifo, or
>>>>>>>>>> generated by the controller automatically. Please read the example
>>>>>>>>>> given in:
>>>>>>>>>>
>>>>>>>>>>     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
>>>>>>>>>> Command (EBh)
>>>>>>>>>>
>>>>>>>>>> in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
>>>>>>>>>>
>>>>>>>>>> If you choose to set the m25p80 device property 'model_dummy_bytes' to
>>>>>>>>>> true when working with the Xilinx GQSPI controller, you are bound to
>>>>>>>>>> only allow guest software to use tx fifo to transfer the dummy cycles,
>>>>>>>>>> and this is wrong.
>>>>>>>>>>
>>>>>>>>
>>>>>>>> You missed this part. I looked at your RFC, and as I mentioned above
>>>>>>>> your proposal cannot support the complicated controller like Xilinx
>>>>>>>> GQSPI. Please read the example of table 24-22. With your RFC, you
>>>>>>>> mandate guest software's GQSPI driver to only use hardware dummy cycle
>>>>>>>> generation, which is wrong.
>>>>>>>>
>>>>>>>
>>>>>>> First, thank you very much for looking into the RFC series, very much
>>>>>>> appreciated. Secondly, about above, the GQSPI model in QEMU transfers from 2
>>>>>>> locations in the file, in 1 location the transfer referred to above is done, in
>>>>>>> another location the transfer through the txfifo is done. The location where
>>>>>>> transfer referred to above is done will not need any modifications (and will
>>>>>>> thus work equally well as it does currently).
>>>>>>
>>>>>> Please explain this a little bit. How does your RFC series handle
>>>>>> cases as described in table 24-22, where the 6 dummy cycles are split
>>>>>> into 2 transfers, with one transfer using tx fifo, and the other one
>>>>>> using hardware dummy cycle generation?
>>>>>
>>>>> Sorry, I misunderstood. You are right, that won't work.
>>>>
>>>> +Edgar E. Iglesias
>>>>
>>>> So it looks by far the only way to implement dummy cycles correctly to
>>>> work with all SPI controller models is what I proposed here in this
>>>> patch series.
>>>>
>>>> Maintainers are quite silent, so I would like to hear your thoughts.
>>>>
>>>> @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you
>>>> please share your thoughts since you are the one who reviewed the
>>>> existing dummy implementation (based on commits history)
>>
>> I agree with Edgar, in that Francisco and Bin know this better than me
>> and that modelling things in cycles is a pain.
> 
> Hi Alistair,
> 
>>
>> As Bin points out it seems like currently we should be modelling bytes
>> (from the variable name) so it makes sense to keep it in bytes. I
>> would be in favour of this series in that case. Do we know what use
>> cases this will break? I know it's hard to answer but I don't think
>> there are too many SSI users in QEMU so it might not be too hard to
>> test most of the possible use cases.
> 
> The use case I'm aware of is regression testing of drivers. For example,
> if a driver uses 10 dummy clock cycles with the commands and a patch
> accidentally changes it to use 11, QEMU currently finds the problem;
> that won't be possible with this series. It's difficult to say, but it
> is not impossible that there are other use cases as well.
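
To make the regression case above concrete, here is a minimal sketch. It is
not code from the series: it only assumes that dummy cycles are converted to
bytes with the (cycles * lines / 8) formula from the cover letter, using
integer division. The helper name dummy_cycles_to_bytes() is hypothetical and
does not exist in m25p80.c.

```c
#include <assert.h>

/*
 * Hypothetical helper (an assumption, not part of m25p80.c): convert a
 * dummy cycle count into whole bytes for a transfer that uses 'lines'
 * data lines (1 = single, 2 = dual, 4 = quad). Integer division drops
 * any partial byte.
 */
unsigned dummy_cycles_to_bytes(unsigned cycles, unsigned lines)
{
    assert(lines == 1 || lines == 2 || lines == 4);
    return cycles * lines / 8;
}
```

Under that assumption, 6 cycles in quad mode give 3 bytes, matching the EBh
example in the cover letter, while 10 and 11 cycles on a single line both
give 1 byte, so a driver that drifts from 10 to 11 cycles would look the same
to the flash model. The same arithmetic gives 1, 2 or 4 bytes for 8 cycles
depending on the number of lines.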


This series was breaking the Aspeed machines:

  https://lore.kernel.org/qemu-devel/78a12882-1303-dd6d-6619-96c5e2cbf531@kaod.org/

QEMU 6.1 should have acceptance tests that will help in detecting
regressions in this area.

Thanks,

C.
 



> 
> More importantly IMO though is that the current use cases can be kept
> while still adding support for commands with dummy clock cycles to
> the QEMU SPI controllers that lack it at the moment.
> 
> (If I recall correctly this series might also have another issue regarding
> the GQSPI SPI mode configuration: with it, it is possible to transmit 8
> dummy clock cycles as 1 data byte, 2 data bytes or 4 data bytes, so I
> think some form of calculation might be needed inside m25p80).
> 
> Best regards,
> Francisco
> 
> 
>>
>> Alistair
>>
>>>
>>> Hello maintainers,
>>>
>>> We apparently missed the 6.0 window to address this mess of the m25p80
>>> model. Please provide your input on this before I start working on
>>> v2.
>>>
>>> Regards,
>>> Bin
>>>




* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-04-27 14:32                               ` Cédric Le Goater
@ 2021-04-28 13:12                                 ` Bin Meng
  2021-04-28 13:54                                   ` Cédric Le Goater
  0 siblings, 1 reply; 36+ messages in thread
From: Bin Meng @ 2021-04-28 13:12 UTC (permalink / raw)
  To: Cédric Le Goater
  Cc: Kevin Wolf, Peter Maydell, Qemu-block, Andrew Jeffery,
	Francisco Iglesias, Bin Meng, qemu-devel@nongnu.org Developers,
	Philippe Mathieu-Daudé,
	Tyrone Ting, qemu-arm, Alistair Francis, Joel Stanley,
	Joe Komlodi, Alistair Francis, Edgar E. Iglesias,
	Havard Skinnemoen, Max Reitz

Hi Cédric,

On Tue, Apr 27, 2021 at 10:32 PM Cédric Le Goater <clg@kaod.org> wrote:
>
> Hello,
>
> This series was breaking the Aspeed machines:
>
>   https://lore.kernel.org/qemu-devel/78a12882-1303-dd6d-6619-96c5e2cbf531@kaod.org/

Yes, as I mentioned in the series, the modification was purely a guess
based on the existing QEMU code, as I could not find a datasheet for the
Aspeed SPI controller on the internet. Do you know if one is publicly
available?

>
> QEMU 6.1 should have acceptance tests that will help in detecting
> regressions in this area.
>

Regards,
Bin



* Re: [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands
  2021-04-28 13:12                                 ` Bin Meng
@ 2021-04-28 13:54                                   ` Cédric Le Goater
  0 siblings, 0 replies; 36+ messages in thread
From: Cédric Le Goater @ 2021-04-28 13:54 UTC (permalink / raw)
  To: Bin Meng
  Cc: Kevin Wolf, Peter Maydell, Qemu-block, Andrew Jeffery,
	Francisco Iglesias, Bin Meng, qemu-devel@nongnu.org Developers,
	Philippe Mathieu-Daudé,
	Tyrone Ting, qemu-arm, Alistair Francis, Joel Stanley,
	Joe Komlodi, Alistair Francis, Edgar E. Iglesias,
	Havard Skinnemoen, Max Reitz

On 4/28/21 3:12 PM, Bin Meng wrote:
> Hi Cédric,
> 
> On Tue, Apr 27, 2021 at 10:32 PM Cédric Le Goater <clg@kaod.org> wrote:
>>
>> Hello,
>>
>> On 4/27/21 10:54 AM, Francisco Iglesias wrote:
>>> On [2021 Apr 27] Tue 15:56:10, Alistair Francis wrote:
>>>> On Fri, Apr 23, 2021 at 4:46 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>>>>>
>>>>> On Mon, Feb 8, 2021 at 10:41 PM Bin Meng <bmeng.cn@gmail.com> wrote:
>>>>>>
>>>>>> On Thu, Jan 21, 2021 at 10:18 PM Francisco Iglesias
>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>
>>>>>>> Hi Bin,
>>>>>>>
>>>>>>> On [2021 Jan 21] Thu 16:59:51, Bin Meng wrote:
>>>>>>>> Hi Francisco,
>>>>>>>>
>>>>>>>> On Thu, Jan 21, 2021 at 4:50 PM Francisco Iglesias
>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Dear Bin,
>>>>>>>>>
>>>>>>>>> On [2021 Jan 20] Wed 22:20:25, Bin Meng wrote:
>>>>>>>>>> Hi Francisco,
>>>>>>>>>>
>>>>>>>>>> On Tue, Jan 19, 2021 at 9:01 PM Francisco Iglesias
>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>
>>>>>>>>>>> On [2021 Jan 18] Mon 20:32:19, Bin Meng wrote:
>>>>>>>>>>>> Hi Francisco,
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jan 18, 2021 at 6:06 PM Francisco Iglesias
>>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>>>
>>>>>>>>>>>>> On [2021 Jan 15] Fri 22:38:18, Bin Meng wrote:
>>>>>>>>>>>>>> Hi Francisco,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Fri, Jan 15, 2021 at 8:26 PM Francisco Iglesias
>>>>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On [2021 Jan 15] Fri 10:07:52, Bin Meng wrote:
>>>>>>>>>>>>>>>> Hi Francisco,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Jan 15, 2021 at 2:13 AM Francisco Iglesias
>>>>>>>>>>>>>>>> <frasse.iglesias@gmail.com> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi Bin,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On [2021 Jan 14] Thu 23:08:53, Bin Meng wrote:
>>>>>>>>>>>>>>>>>> From: Bin Meng <bin.meng@windriver.com>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The m25p80 model uses s->needed_bytes to indicate how many follow-up
>>>>>>>>>>>>>>>>>> bytes are expected to be received after it receives a command. For
>>>>>>>>>>>>>>>>>> example, depending on the address mode, either 3-byte address or
>>>>>>>>>>>>>>>>>> 4-byte address is needed.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> For fast read family commands, some dummy cycles are required after
>>>>>>>>>>>>>>>>>> sending the address bytes, and the dummy cycles need to be counted
>>>>>>>>>>>>>>>>>> in s->needed_bytes. This is where the mess began.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> As the variable name (needed_bytes) indicates, the unit is in byte.
>>>>>>>>>>>>>>>>>> It is not in bit, or cycle. However for some reason the model has
>>>>>>>>>>>>>>>>>> been using the number of dummy cycles for s->needed_bytes. The right
>>>>>>>>>>>>>>>>>> approach is to convert the number of dummy cycles to bytes based on
>>>>>>>>>>>>>>>>>> the SPI protocol, for example, 6 dummy cycles for the Fast Read Quad
>>>>>>>>>>>>>>>>>> I/O (EBh) should be converted to 3 bytes per the formula (6 * 4 / 8).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> While not being the original implementor I must assume that above solution was
>>>>>>>>>>>>>>>>> considered but not chosen by the developers due to it is inaccuracy (it
>>>>>>>>>>>>>>>>> wouldn't be possible to model exacly 6 dummy cycles, only a multiple of 8,
>>>>>>>>>>>>>>>>> meaning that if the controller is wrongly programmed to generate 7 the error
>>>>>>>>>>>>>>>>> wouldn't be caught and the controller will still be considered "correct"). Now
>>>>>>>>>>>>>>>>> that we have this detail in the implementation I'm in favor of keeping it, this
>>>>>>>>>>>>>>>>> also because the detail is already in use for catching exactly above error.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I found no clue from the commit message that my proposed solution here
>>>>>>>>>>>>>>>> was ever considered, otherwise all SPI controller models supporting
>>>>>>>>>>>>>>>> software generation should have been found out seriously broken long
>>>>>>>>>>>>>>>> time ago!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> The controllers you are referring to might lack support for commands requiring
>>>>>>>>>>>>>>> dummy clock cycles but I really hope they work with the other commands? If so I
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I am not sure why you view dummy clock cycles as something special
>>>>>>>>>>>>>> that needs some special support from the SPI controller. For the case
>>>>>>>>>>>>>> 1 controller, it's nothing special from the controller perspective,
>>>>>>>>>>>>>> just like sending out a command, or address bytes, or data. The
>>>>>>>>>>>>>> controller just shifts data bit by bit from its tx fifo and that's it.
>>>>>>>>>>>>>> In the Xilinx GQSPI controller case, the dummy cycles can either be
>>>>>>>>>>>>>> sent via a regular data (the case 1 controller) in the tx fifo, or
>>>>>>>>>>>>>> automatically generated (case 2 controller) by the hardware.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Ok, I'll try to explain my view point a little differently. For that we also
>>>>>>>>>>>>> need to keep in mind that QEMU models HW, and any binary that runs on a HW
>>>>>>>>>>>>> board supported in QEMU should ideally run on that board inside QEMU aswell
>>>>>>>>>>>>> (this can be a bare metal application equaly well as a modified u-boot/Linux
>>>>>>>>>>>>> using SPI commands with a non multiple of 8 number of dummy clock cycles).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Once functionality has been introduced into QEMU it is not easy to know which
>>>>>>>>>>>>> intentional or untentional features provided by the functionality are being
>>>>>>>>>>>>> used by users. One of the (perhaps not well known) features I'm aware of that
>>>>>>>>>>>>> is in use and is provided by the accurate dummy clock cycle modeling inside
>>>>>>>>>>>>> m25p80 is the be ability to test drivers accurately regarding the dummy clock
>>>>>>>>>>>>> cycles (even when using commands with a non-multiple of 8 number of dummy clock
>>>>>>>>>>>>> cycles), but there might be others aswell. So by removing this functionality
>>>>>>>>>>>>> above use case will brake, this since those test will not be reliable.
>>>>>>>>>>>>> Furthermore, since users tend to be creative it is not possible to know if
>>>>>>>>>>>>> there are other use cases that will be affected. This means that in case [1]
>>>>>>>>>>>>> needs to be followed the safe path is to add functionality instead of removing.
>>>>>>>>>>>>> Luckily it also easier in this case, see below.
>>>>>>>>>>>>
>>>>>>>>>>>> I understand there might be users other than U-Boot/Linux that use an
>>>>>>>>>>>> odd number of dummy bits (not multiple of 8). If your concern was
>>>>>>>>>>>> about model behavior changes, sure I can update
>>>>>>>>>>>> qemu/docs/system/deprecated.rst to mention that some flashes in the
>>>>>>>>>>>> m25p80 model now implement dummy cycles as bytes.
>>>>>>>>>>>
>>>>>>>>>>> Yes, something like that. My concern is that since this functionality has been
>>>>>>>>>>> in tree for while, users have found known or unknown features that got
>>>>>>>>>>> introduced by it. By removing the functionality (and the known/uknown features)
>>>>>>>>>>> we are riscing to brake our user's use cases (currently I'm aware of one
>>>>>>>>>>> feature/use case but it is not unlikely that there are more). [1] states that
>>>>>>>>>>> "In general features are intended to be supported indefinitely once introduced
>>>>>>>>>>> into QEMU", to me that makes very much sense because the opposite would mean
>>>>>>>>>>> that we were not reliable. So in case [1] needs to be honored it looks to be
>>>>>>>>>>> safer to add functionality instead of removing (and riscing the removal of use
>>>>>>>>>>> cases/features). Luckily I still believe in this case that it will be easier to
>>>>>>>>>>> go forward (even if I also agree on what you are saying below about what I
>>>>>>>>>>> proposed).
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Even if the implementation is buggy and we need to keep the buggy
>>>>>>>>>> implementation forever? I think that's why
>>>>>>>>>> qemu/docs/system/deprecated.rst was created for deprecating such
>>>>>>>>>> feature.
>>>>>>>>>
>>>>>>>>> With the RFC I posted all commands in m25p80 are working for both the case 1
>>>>>>>>> controller (using a txfifo) and the case 2 controller (no txfifo, as GQSPI).
>>>>>>>>> Because of this, I, with all respect, will have to disagree that this is buggy.
>>>>>>>>
>>>>>>>> Well, the existing m25p80 implementation that uses dummy cycle
>>>>>>>> accuracy for those flashes prevents all SPI controllers that use tx
>>>>>>>> fifo to work with those flashes. Hence it is buggy.
>>>>>>>>
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> don't think it is fair to call them 'seriously broken' (and else we should
>>>>>>>>>>>>>>> probably let the maintainers know about it). Most likely the lack of support
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I called it "seriously broken" because current implementation only
>>>>>>>>>>>>>> considered one type of SPI controllers while completely ignoring the
>>>>>>>>>>>>>> other type.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If we change view and see this from the perspective of m25p80, it models the
>>>>>>>>>>>>> commands a certain way and provides an API that the SPI controllers need to
>>>>>>>>>>>>> implement for interacting with it. It is true that there are SPI controllers
>>>>>>>>>>>>> referred to above that do not support the portion of that API that corresponds
>>>>>>>>>>>>> to commands with dummy clock cycles, but I don't think it is true that this is
>>>>>>>>>>>>> broken since there is also one SPI controller that has a working implementation
>>>>>>>>>>>>> of m25p80's full API also when transfering through a tx fifo (use case 1). But
>>>>>>>>>>>>> as mentioned above, by doing a minor extension and improvement to m25p80's API
>>>>>>>>>>>>> and allow for toggling the accuracy from dummy clock cycles to dummy bytes [1]
>>>>>>>>>>>>> will still be honored as in the same time making it possible to have full
>>>>>>>>>>>>> support for the API in the SPI controllers that currently do not (please reread
>>>>>>>>>>>>> the proposal in my previous reply that attempts to do this). I myself see this
>>>>>>>>>>>>> as win/win situation, also because no controller should need modifications.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I am afraid your proposal does not work. Your proposed new device
>>>>>>>>>>>> property 'model_dummy_bytes' to select to convert the accurate dummy
>>>>>>>>>>>> clock cycle count to dummy bytes inside m25p80, is hard to justify as
>>>>>>>>>>>> a property to the flash itself, as the behavior is tightly coupled to
>>>>>>>>>>>> how the SPI controller works.
>>>>>>>>>>>
>>>>>>>>>>> I agree on above. I decided though that instead of posting sample code in here
>>>>>>>>>>> I'll post an RFC with hopefully an improved proposal. I'll cc you. About below,
>>>>>>>>>>> Xilinx ZynqMP GQSPI should not need any modication in a first step.
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Wait, (see below)
>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Please take a look at the Xilinx GQSPI controller, which supports both
>>>>>>>>>>>> use cases, that the dummy cycles can be transferred via tx fifo, or
>>>>>>>>>>>> generated by the controller automatically. Please read the example
>>>>>>>>>>>> given in:
>>>>>>>>>>>>
>>>>>>>>>>>>     table 24‐22, an example of Generic FIFO Contents for Quad I/O Read
>>>>>>>>>>>> Command (EBh)
>>>>>>>>>>>>
>>>>>>>>>>>> in https://www.xilinx.com/support/documentation/user_guides/ug1085-zynq-ultrascale-trm.pdf
>>>>>>>>>>>>
>>>>>>>>>>>> If you choose to set the m25p80 device property 'model_dummy_bytes' to
>>>>>>>>>>>> true when working with the Xilinx GQSPI controller, you are bound to
>>>>>>>>>>>> only allow guest software to use tx fifo to transfer the dummy cycles,
>>>>>>>>>>>> and this is wrong.
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> You missed this part. I looked at your RFC, and as I mentioned above
>>>>>>>>>> your proposal cannot support the complicated controller like Xilinx
>>>>>>>>>> GQSPI. Please read the example of table 24-22. With your RFC, you
>>>>>>>>>> mandate guest software's GQSPI driver to only use hardware dummy cycle
>>>>>>>>>> generation, which is wrong.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> First, thank you very much for looking into the RFC series, very much
>>>>>>>>> appreciated. Secondly, about above, the GQSPI model in QEMU transfers from 2
>>>>>>>>> locations in the file, in 1 location the transfer referred to above is done, in
>>>>>>>>> another location the transfer through the txfifo is done. The location where
>>>>>>>>> transfer referred to above is done will not need any modifications (and will
>>>>>>>>> thus work equally well as it does currently).
>>>>>>>>
>>>>>>>> Please explain this a little bit. How does your RFC series handle
>>>>>>>> cases as described in table 24-22, where the 6 dummy cycles are split
>>>>>>>> into 2 transfers, with one transfer using tx fifo, and the other one
>>>>>>>> using hardware dummy cycle generation?
>>>>>>>
>>>>>>> Sorry, I missunderstod. You are right, that won't work.
>>>>>>
>>>>>> +Edgar E. Iglesias
>>>>>>
>>>>>> So it looks by far the only way to implement dummy cycles correctly to
>>>>>> work with all SPI controller models is what I proposed here in this
>>>>>> patch series.
>>>>>>
>>>>>> Maintainers are quite silent, so I would like to hear your thoughts.
>>>>>>
>>>>>> @Alistair Francis @Philippe Mathieu-Daudé @Peter Maydell would you
>>>>>> please share your thoughts since you are the one who reviewed the
>>>>>> existing dummy implementation (based on commits history)
>>>>
>>>> I agree with Edgar, in that Francisco and Bin know this better than me
>>>> and that modelling things in cycles is a pain.
>>>
>>> Hi Alistair,
>>>
>>>>
>>>> As Bin points out it seems like currently we should be modelling bytes
>>>> (from the variable name) so it makes sense to keep it in bytes. I
>>>> would be in favour of this series in that case. Do we know what use
>>>> cases this will break? I know it's hard to answer but I don't think
>>>> there are too many SSI users in QEMU so it might not be too hard to
>>>> test most of the possible use cases.
>>>
>>> The use case I'm aware of is regression testing of drivers. For example, if a
>>> driver is using 10 dummy clock cycles with the commands and a patch
>>> accidentally changes the driver to use 11 dummy clock cycles, QEMU currently
>>> finds the problem; that won't be possible with this series. It's difficult
>>> to say, but it is not impossible that there are other use cases as well.
>>
>>
>> It was breaking the Aspeed machines:
>>
>>   https://lore.kernel.org/qemu-devel/78a12882-1303-dd6d-6619-96c5e2cbf531@kaod.org/
> 
> Yes, as I mentioned in the series, the modification was based on a pure
> guess from existing QEMU code, as I could not find a datasheet of the
> Aspeed SPI controller on the internet. Do you know if this is publicly
> available?

It is not, but many of the register bitfields are described in the code.
I should be able to help you in making this work.

Thanks,

C. 


>> QEMU 6.1 should have acceptance tests that will help in detecting
>> regressions in this area.
>>
> 
> Regards,
> Bin
> 
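
The regression-testing concern quoted above (10 versus 11 dummy clock
cycles) can be illustrated with a minimal sketch. It assumes the cover
letter's conversion (cycles * lines / 8) and assumes that partial bytes
are truncated; dummy_cycles_to_bytes() is a hypothetical helper written
for this illustration, not a function from QEMU or from the series.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from QEMU): convert a dummy-cycle count
 * into whole bytes for a bus that transfers `lines` bits per clock cycle.
 * Truncating integer division is an assumption of this sketch.
 */
static inline uint32_t dummy_cycles_to_bytes(uint32_t cycles, uint32_t lines)
{
    /* Only single, dual and quad I/O are considered here. */
    assert(lines == 1 || lines == 2 || lines == 4);
    return (cycles * lines) / 8;
}
```

With 4 lines, 6 cycles map to 3 bytes, matching the (6 * 4 / 8) example
from the cover letter. Under the same assumptions, 10 and 11 cycles both
map to 5 bytes, so a model that only counts bytes would not be able to
tell the two driver behaviours apart.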




end of thread, other threads:[~2021-04-28 14:03 UTC | newest]

Thread overview: 36+ messages
2021-01-14 15:08 [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands Bin Meng
2021-01-14 15:08 ` [PATCH 1/9] hw/block: m25p80: Fix the number of dummy bytes needed for Windbond flashes Bin Meng
2021-01-14 15:08 ` [PATCH 2/9] hw/block: m25p80: Fix the number of dummy bytes needed for Numonyx/Micron flashes Bin Meng
2021-01-14 15:08 ` [PATCH 3/9] hw/block: m25p80: Fix the number of dummy bytes needed for Macronix flashes Bin Meng
2021-01-14 15:08 ` [PATCH 4/9] hw/block: m25p80: Fix the number of dummy bytes needed for Spansion flashes Bin Meng
2021-01-14 15:08 ` [PATCH 5/9] hw/block: m25p80: Support fast read for SST flashes Bin Meng
2021-01-14 15:08 ` [PATCH 6/9] hw/ssi: xilinx_spips: Fix generic fifo dummy cycle handling Bin Meng
2021-01-14 15:09 ` [PATCH 7/9] Revert "aspeed/smc: Fix number of dummy cycles for FAST_READ_4 command" Bin Meng
2021-01-14 15:09 ` [PATCH 8/9] Revert "aspeed/smc: snoop SPI transfers to fake dummy cycles" Bin Meng
2021-01-14 15:09 ` [PATCH 9/9] hw/ssi: npcm7xx_fiu: Correct the dummy cycle emulation logic Bin Meng
2021-01-14 17:12   ` Havard Skinnemoen via
2021-01-14 15:59 ` [PATCH 0/9] hw/block: m25p80: Fix the mess of dummy bytes needed for fast read commands Cédric Le Goater
2021-01-14 16:12 ` no-reply
2021-01-14 18:13 ` Francisco Iglesias
2021-01-15  2:07   ` Bin Meng
2021-01-15  3:29     ` Havard Skinnemoen via
2021-01-15 13:54       ` Bin Meng
2021-01-15 12:26     ` Francisco Iglesias
2021-01-15 14:38       ` Bin Meng
2021-01-18 10:05         ` Francisco Iglesias
2021-01-18 12:32           ` Bin Meng
2021-01-19 13:01             ` Francisco Iglesias
2021-01-20 14:20               ` Bin Meng
2021-01-21  8:50                 ` Francisco Iglesias
2021-01-21  8:59                   ` Bin Meng
2021-01-21 10:01                     ` Francisco Iglesias
2021-01-21 14:18                     ` Francisco Iglesias
2021-02-08 14:41                       ` Bin Meng
2021-02-08 15:30                         ` Edgar E. Iglesias
2021-02-09  9:35                           ` Francisco Iglesias
2021-04-23  6:45                         ` Bin Meng
2021-04-27  5:56                           ` Alistair Francis
2021-04-27  8:54                             ` Francisco Iglesias
2021-04-27 14:32                               ` Cédric Le Goater
2021-04-28 13:12                                 ` Bin Meng
2021-04-28 13:54                                   ` Cédric Le Goater
