* [U-Boot] [PATCH 0/9] CACHE: Finishing touches
@ 2012-06-25  0:17 Marek Vasut
  2012-06-25  0:17 ` [U-Boot] [PATCH 1/9] COMMON: Add __stringify() function Marek Vasut
                   ` (8 more replies)
  0 siblings, 9 replies; 36+ messages in thread
From: Marek Vasut @ 2012-06-25  0:17 UTC (permalink / raw)
  To: u-boot

Add a few finishing touches for the cache support. Then ultimately
enable cache support on m28.

BIG FAT NOTE: This is for -next. This may break something.
              Compile testing IS STILL IN PROGRESS!

Marek Vasut (9):
  COMMON: Add __stringify() function
  CACHE: Add cache_aligned() macro
  CACHE: ext2load: Test if start address is aligned
  CACHE: fatload: Test if start address is aligned
  CACHE: mmc read/write: Test if start address is aligned
  CACHE: nand read/write: Test if start address is aligned
  CACHE: net: Test if start address is aligned
  CACHE: net: asix: Fix asix driver to work with data cache on
  M28EVK: Enable instruction and data cache

 common/cmd_ext2.c        |    3 +++
 common/cmd_fat.c         |    4 ++++
 common/cmd_mmc.c         |    2 ++
 common/cmd_nand.c        |    9 +++++++++
 common/cmd_net.c         |    4 ++++
 drivers/usb/eth/asix.c   |   29 ++++++++++++++++-------------
 include/common.h         |   25 +++++++++++++++++++++++++
 include/configs/m28evk.h |    2 --
 8 files changed, 63 insertions(+), 15 deletions(-)

Cc: Wolfgang Denk <wd@denx.de>

-- 
1.7.10

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 1/9] COMMON: Add __stringify() function
  2012-06-25  0:17 [U-Boot] [PATCH 0/9] CACHE: Finishing touches Marek Vasut
@ 2012-06-25  0:17 ` Marek Vasut
  2012-06-25  0:17 ` [U-Boot] [PATCH 2/9] CACHE: Add cache_aligned() macro Marek Vasut
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 36+ messages in thread
From: Marek Vasut @ 2012-06-25  0:17 UTC (permalink / raw)
  To: u-boot

This macro converts a constant number to a string in the preprocessor.
This is useful as it allows wider use of puts() in favour of printf().

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Wolfgang Denk <wd@denx.de>
---
 include/common.h |    7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/include/common.h b/include/common.h
index 7cf15c5..322569e 100644
--- a/include/common.h
+++ b/include/common.h
@@ -268,6 +268,13 @@ typedef void (interrupt_handler_t)(void *);
 	const typeof( ((type *)0)->member ) *__mptr = (ptr);	\
 	(type *)( (char *)__mptr - offsetof(type,member) );})
 
+/**
+ * __stringify - preprocessor magic to return string from number
+ * @x:           constant number
+ */
+#define __stringify_1(x...)     #x
+#define __stringify(x...)       __stringify_1(x)
+
 /*
  * Function Prototypes
  */
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 2/9] CACHE: Add cache_aligned() macro
  2012-06-25  0:17 [U-Boot] [PATCH 0/9] CACHE: Finishing touches Marek Vasut
  2012-06-25  0:17 ` [U-Boot] [PATCH 1/9] COMMON: Add __stringify() function Marek Vasut
@ 2012-06-25  0:17 ` Marek Vasut
  2012-06-25 21:12   ` Scott Wood
  2012-06-25  0:17 ` [U-Boot] [PATCH 3/9] CACHE: ext2load: Test if start address is aligned Marek Vasut
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 36+ messages in thread
From: Marek Vasut @ 2012-06-25  0:17 UTC (permalink / raw)
  To: u-boot

This macro returns 1 if the argument (an address) is aligned and
zero otherwise. It will be used to test user-supplied addresses
passed to various commands, to prevent the user from loading data
to/from an unaligned address when caches are enabled.

This is implemented as a macro because macros are expanded where
they are used, so it can easily be instrumented to report the
position of the fault.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Wolfgang Denk <wd@denx.de>
---
 include/common.h |   18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/include/common.h b/include/common.h
index 322569e..17c64b0 100644
--- a/include/common.h
+++ b/include/common.h
@@ -730,6 +730,24 @@ void	invalidate_dcache_range(unsigned long start, unsigned long stop);
 void	invalidate_dcache_all(void);
 void	invalidate_icache_all(void);
 
+/* Test if addr is cacheline-aligned; 1 if so or dcache off, 0 otherwise. */
+#define cacheline_aligned(addr)						\
+	({								\
+	int __ret;							\
+	if (!dcache_status()) {						\
+		__ret = 1;						\
+	} else if ((addr) & (ARCH_DMA_MINALIGN - 1)) {			\
+		puts("Align load address to "				\
+			__stringify(ARCH_DMA_MINALIGN)			\
+			" bytes when using caches!\n");			\
+		__ret = 0;						\
+	} else {							\
+		__ret = 1;						\
+	}								\
+	__ret;								\
+	})
+
+
 /* arch/$(ARCH)/lib/ticks.S */
 unsigned long long get_ticks(void);
 void	wait_ticks    (unsigned long);
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 3/9] CACHE: ext2load: Test if start address is aligned
  2012-06-25  0:17 [U-Boot] [PATCH 0/9] CACHE: Finishing touches Marek Vasut
  2012-06-25  0:17 ` [U-Boot] [PATCH 1/9] COMMON: Add __stringify() function Marek Vasut
  2012-06-25  0:17 ` [U-Boot] [PATCH 2/9] CACHE: Add cache_aligned() macro Marek Vasut
@ 2012-06-25  0:17 ` Marek Vasut
  2012-06-25  0:17 ` [U-Boot] [PATCH 4/9] CACHE: fatload: " Marek Vasut
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 36+ messages in thread
From: Marek Vasut @ 2012-06-25  0:17 UTC (permalink / raw)
  To: u-boot

This prevents the scenario where data cache is on and the
device uses DMA to deploy data. In that case, it might not
be possible to flush/invalidate data to RAM properly. The
other option is to use bounce buffer, but that involves a
lot of copying and therefore degrades performance rapidly.
Therefore disallow this possibility of unaligned load address
altogether if data cache is on.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Wolfgang Denk <wd@denx.de>
---
 common/cmd_ext2.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/common/cmd_ext2.c b/common/cmd_ext2.c
index 79b1e2f..45b5a0c 100644
--- a/common/cmd_ext2.c
+++ b/common/cmd_ext2.c
@@ -166,6 +166,9 @@ int do_ext2load (cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
 		return CMD_RET_USAGE;
 	}
 
+	if (!cacheline_aligned(addr))
+		return 1;
+
 	if (!filename) {
 		puts ("** No boot file defined **\n");
 		return 1;
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 4/9] CACHE: fatload: Test if start address is aligned
  2012-06-25  0:17 [U-Boot] [PATCH 0/9] CACHE: Finishing touches Marek Vasut
                   ` (2 preceding siblings ...)
  2012-06-25  0:17 ` [U-Boot] [PATCH 3/9] CACHE: ext2load: Test if start address is aligned Marek Vasut
@ 2012-06-25  0:17 ` Marek Vasut
  2012-06-25  0:17 ` [U-Boot] [PATCH 5/9] CACHE: mmc read/write: " Marek Vasut
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 36+ messages in thread
From: Marek Vasut @ 2012-06-25  0:17 UTC (permalink / raw)
  To: u-boot

This prevents the scenario where data cache is on and the
device uses DMA to deploy data. In that case, it might not
be possible to flush/invalidate data to RAM properly. The
other option is to use bounce buffer, but that involves a
lot of copying and therefore degrades performance rapidly.
Therefore disallow this possibility of unaligned load address
altogether if data cache is on.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Wolfgang Denk <wd@denx.de>
---
 common/cmd_fat.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/common/cmd_fat.c b/common/cmd_fat.c
index 559a16d..951e712 100644
--- a/common/cmd_fat.c
+++ b/common/cmd_fat.c
@@ -68,7 +68,11 @@ int do_fat_fsload (cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
 			argv[1], dev, part);
 		return 1;
 	}
+
 	offset = simple_strtoul(argv[3], NULL, 16);
+	if (!cacheline_aligned(offset))
+		return 1;
+
 	if (argc == 6)
 		count = simple_strtoul(argv[5], NULL, 16);
 	else
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 5/9] CACHE: mmc read/write: Test if start address is aligned
  2012-06-25  0:17 [U-Boot] [PATCH 0/9] CACHE: Finishing touches Marek Vasut
                   ` (3 preceding siblings ...)
  2012-06-25  0:17 ` [U-Boot] [PATCH 4/9] CACHE: fatload: " Marek Vasut
@ 2012-06-25  0:17 ` Marek Vasut
  2012-06-25  0:17 ` [U-Boot] [PATCH 6/9] CACHE: nand " Marek Vasut
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 36+ messages in thread
From: Marek Vasut @ 2012-06-25  0:17 UTC (permalink / raw)
  To: u-boot

This prevents the scenario where data cache is on and the
device uses DMA to deploy data. In that case, it might not
be possible to flush/invalidate data to RAM properly. The
other option is to use bounce buffer, but that involves a
lot of copying and therefore degrades performance rapidly.
Therefore disallow this possibility of unaligned load address
altogether if data cache is on.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Andy Fleming <afleming@freescale.com>
---
 common/cmd_mmc.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/common/cmd_mmc.c b/common/cmd_mmc.c
index 750509d..a47baaa 100644
--- a/common/cmd_mmc.c
+++ b/common/cmd_mmc.c
@@ -268,6 +268,8 @@ int do_mmcops(cmd_tbl_t *cmdtp, int flag, int argc, char * const argv[])
 
 		if (state != MMC_ERASE) {
 			addr = (void *)simple_strtoul(argv[idx], NULL, 16);
+			if (!cacheline_aligned((ulong)addr))
+				return 1;
 			++idx;
 		} else
 			addr = 0;
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 6/9] CACHE: nand read/write: Test if start address is aligned
  2012-06-25  0:17 [U-Boot] [PATCH 0/9] CACHE: Finishing touches Marek Vasut
                   ` (4 preceding siblings ...)
  2012-06-25  0:17 ` [U-Boot] [PATCH 5/9] CACHE: mmc read/write: " Marek Vasut
@ 2012-06-25  0:17 ` Marek Vasut
  2012-06-25 16:58   ` Scott Wood
  2012-07-07  3:05   ` Aneesh V
  2012-06-25  0:17 ` [U-Boot] [PATCH 7/9] CACHE: net: " Marek Vasut
                   ` (2 subsequent siblings)
  8 siblings, 2 replies; 36+ messages in thread
From: Marek Vasut @ 2012-06-25  0:17 UTC (permalink / raw)
  To: u-boot

This prevents the scenario where data cache is on and the
device uses DMA to deploy data. In that case, it might not
be possible to flush/invalidate data to RAM properly. The
other option is to use bounce buffer, but that involves a
lot of copying and therefore degrades performance rapidly.
Therefore disallow this possibility of unaligned load address
altogether if data cache is on.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Scott Wood <scottwood@freescale.com>
---
 common/cmd_nand.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/common/cmd_nand.c b/common/cmd_nand.c
index a91ccf4..122a91c 100644
--- a/common/cmd_nand.c
+++ b/common/cmd_nand.c
@@ -609,6 +609,8 @@ int do_nand(cmd_tbl_t * cmdtp, int flag, int argc, char * const argv[])
 			goto usage;
 
 		addr = (ulong)simple_strtoul(argv[2], NULL, 16);
+		if (!cacheline_aligned(addr))
+			return 1;
 
 		read = strncmp(cmd, "read", 4) == 0; /* 1 = read, 0 = write */
 		printf("\nNAND %s: ", read ? "read" : "write");
@@ -924,6 +926,9 @@ int do_nandboot(cmd_tbl_t * cmdtp, int flag, int argc, char * const argv[])
 				addr = simple_strtoul(argv[1], NULL, 16);
 			else
 				addr = CONFIG_SYS_LOAD_ADDR;
+			if (!cacheline_aligned(addr))
+				return 1;
+
 			return nand_load_image(cmdtp, &nand_info[dev->id->num],
 					       part->offset, addr, argv[0]);
 		}
@@ -956,6 +961,10 @@ usage:
 		bootstage_error(BOOTSTAGE_ID_NAND_SUFFIX);
 		return CMD_RET_USAGE;
 	}
+
+	if (!cacheline_aligned(addr))
+		return 1;
+
 	bootstage_mark(BOOTSTAGE_ID_NAND_SUFFIX);
 
 	if (!boot_device) {
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 7/9] CACHE: net: Test if start address is aligned
  2012-06-25  0:17 [U-Boot] [PATCH 0/9] CACHE: Finishing touches Marek Vasut
                   ` (5 preceding siblings ...)
  2012-06-25  0:17 ` [U-Boot] [PATCH 6/9] CACHE: nand " Marek Vasut
@ 2012-06-25  0:17 ` Marek Vasut
  2012-06-25 18:05   ` Joe Hershberger
  2012-06-25  0:17 ` [U-Boot] [PATCH 8/9] CACHE: net: asix: Fix asix driver to work with data cache on Marek Vasut
  2012-06-25  0:17 ` [U-Boot] [PATCH 9/9] M28EVK: Enable instruction and data cache Marek Vasut
  8 siblings, 1 reply; 36+ messages in thread
From: Marek Vasut @ 2012-06-25  0:17 UTC (permalink / raw)
  To: u-boot

This prevents the scenario where data cache is on and the
device uses DMA to deploy data. In that case, it might not
be possible to flush/invalidate data to RAM properly. The
other option is to use bounce buffer, but that involves a
lot of copying and therefore degrades performance rapidly.
Therefore disallow this possibility of unaligned load address
altogether if data cache is on.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Joe Hershberger <joe.hershberger@gmail.com>
---
 common/cmd_net.c |    4 ++++
 1 file changed, 4 insertions(+)

diff --git a/common/cmd_net.c b/common/cmd_net.c
index a9ade8b..917061b 100644
--- a/common/cmd_net.c
+++ b/common/cmd_net.c
@@ -242,6 +242,10 @@ static int netboot_common(enum proto_t proto, cmd_tbl_t *cmdtp, int argc,
 		bootstage_error(BOOTSTAGE_ID_NET_START);
 		return CMD_RET_USAGE;
 	}
+
+	if (!cacheline_aligned(load_addr))
+		return 1;
+
 	bootstage_mark(BOOTSTAGE_ID_NET_START);
 
 	if ((size = NetLoop(proto)) < 0) {
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 8/9] CACHE: net: asix: Fix asix driver to work with data cache on
  2012-06-25  0:17 [U-Boot] [PATCH 0/9] CACHE: Finishing touches Marek Vasut
                   ` (6 preceding siblings ...)
  2012-06-25  0:17 ` [U-Boot] [PATCH 7/9] CACHE: net: " Marek Vasut
@ 2012-06-25  0:17 ` Marek Vasut
  2012-06-25 18:07   ` Joe Hershberger
  2012-06-25  0:17 ` [U-Boot] [PATCH 9/9] M28EVK: Enable instruction and data cache Marek Vasut
  8 siblings, 1 reply; 36+ messages in thread
From: Marek Vasut @ 2012-06-25  0:17 UTC (permalink / raw)
  To: u-boot

The asix driver did not align buffers, therefore it didn't work
with data cache enabled. Fix this.

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Joe Hershberger <joe.hershberger@gmail.com>
---
 drivers/usb/eth/asix.c |   29 ++++++++++++++++-------------
 1 file changed, 16 insertions(+), 13 deletions(-)

diff --git a/drivers/usb/eth/asix.c b/drivers/usb/eth/asix.c
index a3bf51a..c27ed51 100644
--- a/drivers/usb/eth/asix.c
+++ b/drivers/usb/eth/asix.c
@@ -168,27 +168,28 @@ static inline int asix_set_hw_mii(struct ueth_data *dev)
 
 static int asix_mdio_read(struct ueth_data *dev, int phy_id, int loc)
 {
-	__le16 res;
+	ALLOC_CACHE_ALIGN_BUFFER(__le16, res, 1);
 
 	asix_set_sw_mii(dev);
-	asix_read_cmd(dev, AX_CMD_READ_MII_REG, phy_id, (__u16)loc, 2, &res);
+	asix_read_cmd(dev, AX_CMD_READ_MII_REG, phy_id, (__u16)loc, 2, res);
 	asix_set_hw_mii(dev);
 
 	debug("asix_mdio_read() phy_id=0x%02x, loc=0x%02x, returns=0x%04x\n",
-			phy_id, loc, le16_to_cpu(res));
+			phy_id, loc, le16_to_cpu(*res));
 
-	return le16_to_cpu(res);
+	return le16_to_cpu(*res);
 }
 
 static void
 asix_mdio_write(struct ueth_data *dev, int phy_id, int loc, int val)
 {
-	__le16 res = cpu_to_le16(val);
+	ALLOC_CACHE_ALIGN_BUFFER(__le16, res, 1);
+	*res = cpu_to_le16(val);
 
 	debug("asix_mdio_write() phy_id=0x%02x, loc=0x%02x, val=0x%04x\n",
 			phy_id, loc, val);
 	asix_set_sw_mii(dev);
-	asix_write_cmd(dev, AX_CMD_WRITE_MII_REG, phy_id, (__u16)loc, 2, &res);
+	asix_write_cmd(dev, AX_CMD_WRITE_MII_REG, phy_id, (__u16)loc, 2, res);
 	asix_set_hw_mii(dev);
 }
 
@@ -210,7 +211,8 @@ static int asix_sw_reset(struct ueth_data *dev, u8 flags)
 
 static inline int asix_get_phy_addr(struct ueth_data *dev)
 {
-	u8 buf[2];
+	ALLOC_CACHE_ALIGN_BUFFER(u8, buf, 2);
+
 	int ret = asix_read_cmd(dev, AX_CMD_READ_PHY_ID, 0, 0, 2, buf);
 
 	debug("asix_get_phy_addr()\n");
@@ -242,13 +244,14 @@ static int asix_write_medium_mode(struct ueth_data *dev, u16 mode)
 
 static u16 asix_read_rx_ctl(struct ueth_data *dev)
 {
-	__le16 v;
-	int ret = asix_read_cmd(dev, AX_CMD_READ_RX_CTL, 0, 0, 2, &v);
+	ALLOC_CACHE_ALIGN_BUFFER(__le16, v, 1);
+
+	int ret = asix_read_cmd(dev, AX_CMD_READ_RX_CTL, 0, 0, 2, v);
 
 	if (ret < 0)
 		debug("Error reading RX_CTL register: %02x\n", ret);
 	else
-		ret = le16_to_cpu(v);
+		ret = le16_to_cpu(*v);
 	return ret;
 }
 
@@ -313,7 +316,7 @@ static int mii_nway_restart(struct ueth_data *dev)
 static int asix_init(struct eth_device *eth, bd_t *bd)
 {
 	int embd_phy;
-	unsigned char buf[ETH_ALEN];
+	ALLOC_CACHE_ALIGN_BUFFER(unsigned char, buf, ETH_ALEN);
 	u16 rx_ctl;
 	struct ueth_data	*dev = (struct ueth_data *)eth->priv;
 	int timeout = 0;
@@ -425,7 +428,7 @@ static int asix_send(struct eth_device *eth, void *packet, int length)
 	int err;
 	u32 packet_len;
 	int actual_len;
-	unsigned char msg[PKTSIZE + sizeof(packet_len)];
+	ALLOC_CACHE_ALIGN_BUFFER(unsigned char, msg, PKTSIZE + sizeof(packet_len));
 
 	debug("** %s(), len %d\n", __func__, length);
 
@@ -452,7 +455,7 @@ static int asix_send(struct eth_device *eth, void *packet, int length)
 static int asix_recv(struct eth_device *eth)
 {
 	struct ueth_data *dev = (struct ueth_data *)eth->priv;
-	static unsigned char  recv_buf[AX_RX_URB_SIZE];
+	ALLOC_CACHE_ALIGN_BUFFER(unsigned char, recv_buf, AX_RX_URB_SIZE);
 	unsigned char *buf_ptr;
 	int err;
 	int actual_len;
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 9/9] M28EVK: Enable instruction and data cache
  2012-06-25  0:17 [U-Boot] [PATCH 0/9] CACHE: Finishing touches Marek Vasut
                   ` (7 preceding siblings ...)
  2012-06-25  0:17 ` [U-Boot] [PATCH 8/9] CACHE: net: asix: Fix asix driver to work with data cache on Marek Vasut
@ 2012-06-25  0:17 ` Marek Vasut
  8 siblings, 0 replies; 36+ messages in thread
From: Marek Vasut @ 2012-06-25  0:17 UTC (permalink / raw)
  To: u-boot

Signed-off-by: Marek Vasut <marex@denx.de>
Cc: Wolfgang Denk <wd@denx.de>
Cc: Stefano Babic <sbabic@denx.de>
Cc: Fabio Estevam <festevam@freescale.com>
---
 include/configs/m28evk.h |    2 --
 1 file changed, 2 deletions(-)

diff --git a/include/configs/m28evk.h b/include/configs/m28evk.h
index b730b2a..347a3ea 100644
--- a/include/configs/m28evk.h
+++ b/include/configs/m28evk.h
@@ -37,8 +37,6 @@
 #define	CONFIG_MACH_TYPE	MACH_TYPE_M28EVK
 
 #define	CONFIG_SYS_NO_FLASH
-#define	CONFIG_SYS_ICACHE_OFF
-#define	CONFIG_SYS_DCACHE_OFF
 #define	CONFIG_BOARD_EARLY_INIT_F
 #define	CONFIG_ARCH_CPU_INIT
 #define	CONFIG_ARCH_MISC_INIT
-- 
1.7.10

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 6/9] CACHE: nand read/write: Test if start address is aligned
  2012-06-25  0:17 ` [U-Boot] [PATCH 6/9] CACHE: nand " Marek Vasut
@ 2012-06-25 16:58   ` Scott Wood
  2012-06-25 18:43     ` Tom Rini
  2012-06-25 23:37     ` Marek Vasut
  2012-07-07  3:05   ` Aneesh V
  1 sibling, 2 replies; 36+ messages in thread
From: Scott Wood @ 2012-06-25 16:58 UTC (permalink / raw)
  To: u-boot

On 06/24/2012 07:17 PM, Marek Vasut wrote:
> This prevents the scenario where data cache is on and the
> device uses DMA to deploy data. In that case, it might not
> be possible to flush/invalidate data to RAM properly. The
> other option is to use bounce buffer,

Or get cache coherent hardware. :-)

> but that involves a lot of copying and therefore degrades performance
> rapidly. Therefore disallow this possibility of unaligned load
> address altogether if data cache is on.

How about use the bounce buffer only if the address is misaligned?  The
corrective action a user has to take is the same as with this patch,
except for an additional option of living with the slight performance
penalty.  How often does this actually happen?  How much does it
actually slow things down compared to the speed of the NAND chip?

I'm hesitant to break something -- even if it's odd (literally in this
case) -- that currently works on most hardware, just because one or two
drivers can't handle it.  It feels kind of like changing the read() and
write() system calls to require cacheline alignment. :-P

> Signed-off-by: Marek Vasut <marex@denx.de>
> Cc: Scott Wood <scottwood@freescale.com>
> ---
>  common/cmd_nand.c |    9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/common/cmd_nand.c b/common/cmd_nand.c
> index a91ccf4..122a91c 100644
> --- a/common/cmd_nand.c
> +++ b/common/cmd_nand.c
> @@ -609,6 +609,8 @@ int do_nand(cmd_tbl_t * cmdtp, int flag, int argc, char * const argv[])
>  			goto usage;
>  
>  		addr = (ulong)simple_strtoul(argv[2], NULL, 16);
> +		if (!cacheline_aligned(addr))
> +			return 1;

There's no way you can just return like this without printing an error
that lets the user know that the operation wasn't performed, and why.

-Scott

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 7/9] CACHE: net: Test if start address is aligned
  2012-06-25  0:17 ` [U-Boot] [PATCH 7/9] CACHE: net: " Marek Vasut
@ 2012-06-25 18:05   ` Joe Hershberger
  2012-06-25 23:16     ` Marek Vasut
  0 siblings, 1 reply; 36+ messages in thread
From: Joe Hershberger @ 2012-06-25 18:05 UTC (permalink / raw)
  To: u-boot

Hi Marek,

On Sun, Jun 24, 2012 at 7:17 PM, Marek Vasut <marex@denx.de> wrote:
> This prevents the scenario where data cache is on and the
> device uses DMA to deploy data. In that case, it might not
> be possible to flush/invalidate data to RAM properly. The
> other option is to use bounce buffer, but that involves a
> lot of copying and therefore degrades performance rapidly.
> Therefore disallow this possibility of unaligned load address
> altogether if data cache is on.
>
> Signed-off-by: Marek Vasut <marex@denx.de>
> Cc: Joe Hershberger <joe.hershberger@gmail.com>
> ---
Acked-by: Joe Hershberger <joe.hershberger@gmail.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 8/9] CACHE: net: asix: Fix asix driver to work with data cache on
  2012-06-25  0:17 ` [U-Boot] [PATCH 8/9] CACHE: net: asix: Fix asix driver to work with data cache on Marek Vasut
@ 2012-06-25 18:07   ` Joe Hershberger
  2012-06-25 23:16     ` Marek Vasut
  2012-07-06 23:09     ` Marek Vasut
  0 siblings, 2 replies; 36+ messages in thread
From: Joe Hershberger @ 2012-06-25 18:07 UTC (permalink / raw)
  To: u-boot

Hi Marex,

On Sun, Jun 24, 2012 at 7:17 PM, Marek Vasut <marex@denx.de> wrote:
> The asix driver did not align buffers, therefore it didn't work
> with data cache enabled. Fix this.
>
> Signed-off-by: Marek Vasut <marex@denx.de>
> Cc: Joe Hershberger <joe.hershberger@gmail.com>
> ---
Acked-by: Joe Hershberger <joe.hershberger@gmail.com>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 6/9] CACHE: nand read/write: Test if start address is aligned
  2012-06-25 16:58   ` Scott Wood
@ 2012-06-25 18:43     ` Tom Rini
  2012-06-25 20:08       ` Scott Wood
  2012-06-25 23:38       ` Marek Vasut
  2012-06-25 23:37     ` Marek Vasut
  1 sibling, 2 replies; 36+ messages in thread
From: Tom Rini @ 2012-06-25 18:43 UTC (permalink / raw)
  To: u-boot

On Mon, Jun 25, 2012 at 11:58:10AM -0500, Scott Wood wrote:
> On 06/24/2012 07:17 PM, Marek Vasut wrote:
> > This prevents the scenario where data cache is on and the
> > device uses DMA to deploy data. In that case, it might not
> > be possible to flush/invalidate data to RAM properly. The
> > other option is to use bounce buffer,
> 
> Or get cache coherent hardware. :-)
> 
> > but that involves a lot of copying and therefore degrades performance
> > rapidly. Therefore disallow this possibility of unaligned load
> > address altogether if data cache is on.
> 
> How about use the bounce buffer only if the address is misaligned?  The
> corrective action a user has to take is the same as with this patch,
> except for an additional option of living with the slight performance
> penalty.  How often does this actually happen?  How much does it
> actually slow things down compared to the speed of the NAND chip?

We would need to architect things such that any 'load' command would be
routed through this logic.  A solvable problem for sure, but a little
bit larger than just for NAND :)

> I'm hesitant to break something -- even if it's odd (literally in this
> case) -- that currently works on most hardware, just because one or two
> drivers can't handle it.  It feels kind of like changing the read() and
> write() system calls to require cacheline alignment. :-P

I don't want to get into an ARM vs PowerPC argument.  I think the best
answer is that I'm not sure having things unaligned works totally right
today as I did a silly test of loading a uImage to 0x82000001 and bootm
hung inside of U-Boot not long ago.  Can you try that on some cache
coherent hardware and see if that works?

[snip]
> > @@ -609,6 +609,8 @@ int do_nand(cmd_tbl_t * cmdtp, int flag, int argc, char * const argv[])
> >  			goto usage;
> >  
> >  		addr = (ulong)simple_strtoul(argv[2], NULL, 16);
> > +		if (!cacheline_aligned(addr))
> > +			return 1;
> 
> There's no way you can just return like this without printing an error
> that lets the user know that the operation wasn't performed, and why.

Note that cacheline_aligned does the error print (patch 2 of 9).

-- 
Tom

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 6/9] CACHE: nand read/write: Test if start address is aligned
  2012-06-25 18:43     ` Tom Rini
@ 2012-06-25 20:08       ` Scott Wood
  2012-06-25 20:48         ` Tom Rini
  2012-06-25 23:38       ` Marek Vasut
  1 sibling, 1 reply; 36+ messages in thread
From: Scott Wood @ 2012-06-25 20:08 UTC (permalink / raw)
  To: u-boot

On 06/25/2012 01:43 PM, Tom Rini wrote:
> On Mon, Jun 25, 2012 at 11:58:10AM -0500, Scott Wood wrote:
>> On 06/24/2012 07:17 PM, Marek Vasut wrote:
>>> This prevents the scenario where data cache is on and the
>>> device uses DMA to deploy data. In that case, it might not
>>> be possible to flush/invalidate data to RAM properly. The
>>> other option is to use bounce buffer,
>>
>> Or get cache coherent hardware. :-)
>>
>>> but that involves a lot of copying and therefore degrades performance
>>> rapidly. Therefore disallow this possibility of unaligned load
>>> address altogether if data cache is on.
>>
>> How about use the bounce buffer only if the address is misaligned?  The
>> corrective action a user has to take is the same as with this patch,
>> except for an additional option of living with the slight performance
>> penalty.  How often does this actually happen?  How much does it
>> actually slow things down compared to the speed of the NAND chip?
> 
> We would need to architect things such that any 'load' command would be
> routed through this logic.

It's something the driver backend should handle (possibly via a common
helper library).  The fact that you can't do a DMA transfer to an
unaligned buffer is a hardware-specific detail, just as is the fact that
you're setting up a DMA buffer in the first place.

>> I'm hesitant to break something -- even if it's odd (literally in this
>> case) -- that currently works on most hardware, just because one or two
>> drivers can't handle it.  It feels kind of like changing the read() and
>> write() system calls to require cacheline alignment. :-P
> 
> I don't want to get into an ARM vs PowerPC argument.  I think the best
> answer is that I'm not sure having things unaligned works totally right
> today as I did a silly test of loading a uImage to 0x82000001 and bootm
> hung inside of U-Boot not long ago.  Can you try that on some cache
> coherent hardware and see if that works?

I'm not sure what bootm has to do with nand (and the fact that some ppc
is cache coherent actually doesn't matter, since we don't do DMA for
NAND), but I was able to bootm from an odd RAM address, and "nand read"
to an odd RAM address, on p5020ds.

> [snip]
>>> @@ -609,6 +609,8 @@ int do_nand(cmd_tbl_t * cmdtp, int flag, int argc, char * const argv[])
>>>  			goto usage;
>>>  
>>>  		addr = (ulong)simple_strtoul(argv[2], NULL, 16);
>>> +		if (!cacheline_aligned(addr))
>>> +			return 1;
>>
>> There's no way you can just return like this without printing an error
>> that lets the user know that the operation wasn't performed, and why.
> 
> Note that cacheline_aligned does the error print (patch 2 of 9).

Ah.  I saw this patch first since it was CCed to me. :-)

-Scott

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 6/9] CACHE: nand read/write: Test if start address is aligned
  2012-06-25 20:08       ` Scott Wood
@ 2012-06-25 20:48         ` Tom Rini
  2012-06-25 21:17           ` Scott Wood
  0 siblings, 1 reply; 36+ messages in thread
From: Tom Rini @ 2012-06-25 20:48 UTC (permalink / raw)
  To: u-boot

On 06/25/2012 01:08 PM, Scott Wood wrote:
> On 06/25/2012 01:43 PM, Tom Rini wrote:
>> On Mon, Jun 25, 2012 at 11:58:10AM -0500, Scott Wood wrote:
>>> On 06/24/2012 07:17 PM, Marek Vasut wrote:
>>>> This prevents the scenario where data cache is on and the 
>>>> device uses DMA to deploy data. In that case, it might not be
>>>> possible to flush/invalidate data to RAM properly. The other
>>>> option is to use bounce buffer,
>>> 
>>> Or get cache coherent hardware. :-)
>>> 
>>>> but that involves a lot of copying and therefore degrades
>>>> performance rapidly. Therefore disallow this possibility of
>>>> unaligned load address altogether if data cache is on.
>>> 
>>> How about use the bounce buffer only if the address is
>>> misaligned?  The corrective action a user has to take is the
>>> same as with this patch, except for an additional option of
>>> living with the slight performance penalty.  How often does
>>> this actually happen?  How much does it actually slow things
>>> down compared to the speed of the NAND chip?
>> 
>> We would need to architect things such that any 'load' command
>> would be routed through this logic.
> 
> It's something the driver backend should handle (possibly via a
> common helper library).  The fact that you can't do a DMA transfer
> to an unaligned buffer is a hardware-specific detail, just as is
> the fact that you're setting up a DMA buffer in the first place.

Right.  What I'm trying to say is it's not a NAND problem, it's an
unaligned-address problem, so the solution needs to be easily used
everywhere.

>>> I'm hesitant to break something -- even if it's odd (literally
>>> in this case) -- that currently works on most hardware, just
>>> because one or two drivers can't handle it.  It feels kind of
>>> like changing the read() and write() system calls to require
>>> cacheline alignment. :-P
>> 
>> I don't want to get into an ARM vs PowerPC argument.  I think the
>> best answer is that I'm not sure having things unaligned works
>> totally right today as I did a silly test of loading a uImage to
>> 0x82000001 and bootm hung inside of U-Boot not long ago.  Can you
>> try that on some cache coherent hardware and see if that works?
> 
> I'm not sure what bootm has to do with nand (and the fact that some
> ppc is cache coherent actually doesn't matter, since we don't do
> DMA for NAND), but I was able to bootm from an odd RAM address, and
> "nand read" to an odd RAM address, on p5020ds.

On ARM-land we have a lot of problems with unaligned addresses, even
with cache off.  I went to reproduce the original bootm problem and
ran into fatload hanging.  tftp didn't fail but bootm hangs.

-- 
Tom

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 2/9] CACHE: Add cache_aligned() macro
  2012-06-25  0:17 ` [U-Boot] [PATCH 2/9] CACHE: Add cache_aligned() macro Marek Vasut
@ 2012-06-25 21:12   ` Scott Wood
  2012-06-25 23:30     ` Marek Vasut
  0 siblings, 1 reply; 36+ messages in thread
From: Scott Wood @ 2012-06-25 21:12 UTC (permalink / raw)
  To: u-boot

On 06/24/2012 07:17 PM, Marek Vasut wrote:
> This macro returns 1 if the argument (address) is aligned, returns
> zero otherwise. This will be used to test user-supplied address to
> various commands to prevent user from loading data to/from unaligned
> address when using caches.
> 
> This is made as a macro, because macros are expanded where they are
> used. Therefore it can be easily instrumented to report position of
> the fault.
> 
> Signed-off-by: Marek Vasut <marex@denx.de>
> Cc: Wolfgang Denk <wd@denx.de>
> ---
>  include/common.h |   18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/include/common.h b/include/common.h
> index 322569e..17c64b0 100644
> --- a/include/common.h
> +++ b/include/common.h
> @@ -730,6 +730,24 @@ void	invalidate_dcache_range(unsigned long start, unsigned long stop);
>  void	invalidate_dcache_all(void);
>  void	invalidate_icache_all(void);
>  
> +/* Test if address is cache-aligned. Returns 1 if it is, 0 otherwise. */
> +#define cacheline_aligned(addr)						\
> +	({								\
> +	int __ret;							\
> +	if (!dcache_status()) {						\
> +		__ret = 1;						\
> +	} else if ((addr) & (ARCH_DMA_MINALIGN - 1)) {			\
> +		puts("Align load address to "				\
> +			__stringify(ARCH_DMA_MINALIGN)			\
> +			" bytes when using caches!\n");			\
> +		__ret = 0;						\
> +	} else {							\
> +		__ret = 1;						\
> +	}								\
> +	__ret;								\
> +	})

What if it's a store rather than a load?  If this is only supposed to be
used for loads (because on a store you can flush the cache rather than
invalidate), it's not labelled that way, the changelog says "to/from
unaligned address", and the caller might be common code that doesn't
know which direction the transfer is in.  Besides, it would be awkward
user interface to allow an address to be used in one direction but not
the other.

What if the caller wants to try a different strategy if this returns 0,
rather than print an error?
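
One way to meet both objections above (direction-neutral behaviour, and letting the caller pick its own fallback) is a bare predicate. This is a hypothetical sketch for discussion, not code from the series; the name dma_addr_aligned and the ARCH_DMA_MINALIGN value of 64 are assumptions:

```c
#include <stdint.h>

#define ARCH_DMA_MINALIGN 64  /* assumed; per-arch in real U-Boot */

/*
 * Hypothetical predicate: nonzero iff addr meets the DMA alignment.
 * It does not print, does not consult dcache_status(), and says
 * nothing about transfer direction, so a caller can fall back to a
 * bounce buffer (or print its own message) instead of failing.
 */
static inline int dma_addr_aligned(uintptr_t addr)
{
	return (addr & (ARCH_DMA_MINALIGN - 1)) == 0;
}
```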

Why should the success of a command depend on whether caches are
enabled?  If we're going to forbid unaligned addresses in certain
contexts, shouldn't it always be forbidden to ensure consistent user
experience?  Or if we're going to be picky about when we reject it, why
don't we care whether the device in question does DMA, and whether that
DMA is coherent?

-Scott

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 6/9] CACHE: nand read/write: Test if start address is aligned
  2012-06-25 20:48         ` Tom Rini
@ 2012-06-25 21:17           ` Scott Wood
  2012-06-25 21:22             ` Tom Rini
  2012-06-25 23:42             ` Marek Vasut
  0 siblings, 2 replies; 36+ messages in thread
From: Scott Wood @ 2012-06-25 21:17 UTC (permalink / raw)
  To: u-boot

On 06/25/2012 03:48 PM, Tom Rini wrote:
> On 06/25/2012 01:08 PM, Scott Wood wrote:
>> On 06/25/2012 01:43 PM, Tom Rini wrote:
>>> On Mon, Jun 25, 2012 at 11:58:10AM -0500, Scott Wood wrote:
>>>> On 06/24/2012 07:17 PM, Marek Vasut wrote:
>>>>> This prevents the scenario where data cache is on and the 
>>>>> device uses DMA to deploy data. In that case, it might not be
>>>>> possible to flush/invalidate data to RAM properly. The other
>>>>> option is to use bounce buffer,
>>>>
>>>> Or get cache coherent hardware. :-)
>>>>
>>>>> but that involves a lot of copying and therefore degrades
>>>>> performance rapidly. Therefore disallow this possibility of
>>>>> unaligned load address altogether if data cache is on.
>>>>
>>>> How about use the bounce buffer only if the address is
>>>> misaligned?  The corrective action a user has to take is the
>>>> same as with this patch, except for an additional option of
>>>> living with the slight performance penalty.  How often does
>>>> this actually happen?  How much does it actually slow things
>>>> down compared to the speed of the NAND chip?
>>>
>>> We would need to architect things such that any 'load' command
>>> would be routed through this logic.
>>
>> It's something the driver backend should handle (possibly via a
>> common helper library).  The fact that you can't do a DMA transfer
>> to an unaligned buffer is a hardware-specific detail, just as is
>> the fact that you're setting up a DMA buffer in the first place.
> 
> Right.  What I'm trying to say is it's not a NAND problem it's an
> unaligned addresses problem so the solution needs to be easily used
> everywhere.

OK, so fix it in each driver that has this issue.  A lot of drivers are
probably not so performance critical that you can't just always use a
bounce buffer.  A static buffer plus memcpy isn't that burdensome --
it's close to what the drivers for non-DMA hardware do.  For higher
performance peripherals, throw in an if-statement or two.  It doesn't
seem like something that needs a U-Boot-wide change.
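
The static-buffer-plus-memcpy approach described above can be sketched as follows. This is hypothetical illustration, not code from any actual U-Boot driver; dma_read_aligned() is a stand-in for a real DMA backend, and the 64-byte ARCH_DMA_MINALIGN is an assumption:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ARCH_DMA_MINALIGN 64  /* assumed cache line size */

/* Stand-in for a DMA transfer that requires an aligned destination. */
static void dma_read_aligned(void *dst, size_t len)
{
	assert(((uintptr_t)dst & (ARCH_DMA_MINALIGN - 1)) == 0);
	memset(dst, 0xAB, len);  /* pretend the device deposited data */
}

/* One static, always-aligned bounce buffer, as suggested above. */
static uint8_t bounce[4096] __attribute__((aligned(ARCH_DMA_MINALIGN)));

/* Read into any buffer: direct DMA when aligned, bounce otherwise. */
static void driver_read(void *dst, size_t len)
{
	if (((uintptr_t)dst & (ARCH_DMA_MINALIGN - 1)) == 0) {
		dma_read_aligned(dst, len);  /* fast path, zero copies */
		return;
	}
	for (size_t done = 0; done < len; ) {  /* chunked slow path */
		size_t chunk = len - done;
		if (chunk > sizeof(bounce))
			chunk = sizeof(bounce);
		dma_read_aligned(bounce, chunk);
		memcpy((uint8_t *)dst + done, bounce, chunk);
		done += chunk;
	}
}
```

The if-statement keeps aligned callers at full speed; only a misaligned destination pays for the extra copy.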

In the specific case of NAND, how many NAND drivers use DMA at all?

>> I'm not sure what bootm has to do with nand (and the fact that some
>> ppc is cache coherent actually doesn't matter, since we don't do
>> DMA for NAND), but I was able to bootm from an odd RAM address, and
>> "nand read" to an odd RAM address, on p5020ds.
> 
> On ARM-land we have a lot of problems with unaligned addresses, even
> with cache off.  I went to reproduce the original bootm problem and
> ran into fatload hanging.  tftp didn't fail but bootm hangs.

Maybe you can't take alignment exceptions during bootm?  PPC doesn't
normally take alignment checks, but we would have trouble with this
scenario if it did, since bootm clobbers the exception vectors.

-Scott

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 6/9] CACHE: nand read/write: Test if start address is aligned
  2012-06-25 21:17           ` Scott Wood
@ 2012-06-25 21:22             ` Tom Rini
  2012-06-25 23:42             ` Marek Vasut
  1 sibling, 0 replies; 36+ messages in thread
From: Tom Rini @ 2012-06-25 21:22 UTC (permalink / raw)
  To: u-boot

On 06/25/2012 02:17 PM, Scott Wood wrote:
> On 06/25/2012 03:48 PM, Tom Rini wrote:
>> 
>> On 06/25/2012 01:08 PM, Scott Wood wrote:
>>> On 06/25/2012 01:43 PM, Tom Rini wrote:
>>>> On Mon, Jun 25, 2012 at 11:58:10AM -0500, Scott Wood wrote:
>>>>> On 06/24/2012 07:17 PM, Marek Vasut wrote:
>>>>>> This prevents the scenario where data cache is on and the
>>>>>>  device uses DMA to deploy data. In that case, it might
>>>>>> not be possible to flush/invalidate data to RAM properly.
>>>>>> The other option is to use bounce buffer,
>>>>> 
>>>>> Or get cache coherent hardware. :-)
>>>>> 
>>>>>> but that involves a lot of copying and therefore
>>>>>> degrades performance rapidly. Therefore disallow this
>>>>>> possibility of unaligned load address altogether if data
>>>>>> cache is on.
>>>>> 
>>>>> How about use the bounce buffer only if the address is 
>>>>> misaligned?  The corrective action a user has to take is
>>>>> the same as with this patch, except for an additional
>>>>> option of living with the slight performance penalty.  How
>>>>> often does this actually happen?  How much does it actually
>>>>> slow things down compared to the speed of the NAND chip?
>>>> 
>>>> We would need to architect things such that any 'load'
>>>> command would be routed through this logic.
>>> 
>>> It's something the driver backend should handle (possibly via
>>> a common helper library).  The fact that you can't do a DMA
>>> transfer to an unaligned buffer is a hardware-specific detail,
>>> just as is the fact that you're setting up a DMA buffer in the
>>> first place.
>> 
>> Right.  What I'm trying to say is it's not a NAND problem it's
>> an unaligned addresses problem so the solution needs to be easily
>> used everywhere.
> 
> OK, so fix it in each driver that has this issue.  A lot of drivers
> are probably not so performance critical that you can't just always
> use a bounce buffer.  A static buffer plus memcpy isn't that
> burdensome -- it's close to what the drivers for non-DMA hardware
> do.  For higher performance peripherals, throw in an if-statement
> or two.  It doesn't seem like something that needs a U-Boot-wide
> change.
> 
> In the specific case of NAND, how many NAND drivers use DMA at
> all?

Off hand I'm not sure.  I know there've been patches for OMAP parts
before but they had some unaddressed feedback and aren't in mainline.

>>> I'm not sure what bootm has to do with nand (and the fact that
>>> some ppc is cache coherent actually doesn't matter, since we
>>> don't do DMA for NAND), but I was able to bootm from an odd RAM
>>> address, and "nand read" to an odd RAM address, on p5020ds.
>> 
>> On ARM-land we have a lot of problems with unaligned addresses,
>> even with cache off.  I went to reproduce the original bootm
>> problem and ran into fatload hanging.  tftp didn't fail but bootm
>> hangs.
> 
> Maybe you can't take alignment exceptions during bootm?  PPC
> doesn't normally take alignment checks, but we would have trouble
> with this scenario if it did, since bootm clobbers the exception
> vectors.

Could be it.  But again, fatload via mmc also chokes, so there are a few
quite long-standing problems.  What Marek and I are arguing for, I
believe, is that it's not worth the effort to enable this corner case.
Since it works on other arches, we certainly shouldn't let this slip in
as a silent change, but at the end of the day it's highly unlikely that
someone is using this in the wild and won't be able to work around it in
their next design or field upgrade.  I would hope so, at least.

-- 
Tom

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 7/9] CACHE: net: Test if start address is aligned
  2012-06-25 18:05   ` Joe Hershberger
@ 2012-06-25 23:16     ` Marek Vasut
  0 siblings, 0 replies; 36+ messages in thread
From: Marek Vasut @ 2012-06-25 23:16 UTC (permalink / raw)
  To: u-boot

Dear Joe Hershberger,

> Hi Marek,
> 
> On Sun, Jun 24, 2012 at 7:17 PM, Marek Vasut <marex@denx.de> wrote:
> > This prevents the scenario where data cache is on and the
> > device uses DMA to deploy data. In that case, it might not
> > be possible to flush/invalidate data to RAM properly. The
> > other option is to use bounce buffer, but that involves a
> > lot of copying and therefore degrades performance rapidly.
> > Therefore disallow this possibility of unaligned load address
> > altogether if data cache is on.
> > 
> > Signed-off-by: Marek Vasut <marex@denx.de>
> > Cc: Joe Hershberger <joe.hershberger@gmail.com>
> > ---
> 
> Acked-by: Joe Hershberger <joe.hershberger@gmail.com>

NAK, actually ... let's wait until the discussion is settled ;-)

Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 8/9] CACHE: net: asix: Fix asix driver to work with data cache on
  2012-06-25 18:07   ` Joe Hershberger
@ 2012-06-25 23:16     ` Marek Vasut
  2012-07-06 23:09     ` Marek Vasut
  1 sibling, 0 replies; 36+ messages in thread
From: Marek Vasut @ 2012-06-25 23:16 UTC (permalink / raw)
  To: u-boot

Dear Joe Hershberger,

> Hi Marex,
> 
> On Sun, Jun 24, 2012 at 7:17 PM, Marek Vasut <marex@denx.de> wrote:
> > The asix driver did not align buffers, therefore it didn't work
> > with data cache enabled. Fix this.
> > 
> > Signed-off-by: Marek Vasut <marex@denx.de>
> > Cc: Joe Hershberger <joe.hershberger@gmail.com>
> > ---
> 
> Acked-by: Joe Hershberger <joe.hershberger@gmail.com>

Yes, this one is good, can you pick it up please?

Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 2/9] CACHE: Add cache_aligned() macro
  2012-06-25 21:12   ` Scott Wood
@ 2012-06-25 23:30     ` Marek Vasut
  2012-07-07  3:00       ` Aneesh V
  0 siblings, 1 reply; 36+ messages in thread
From: Marek Vasut @ 2012-06-25 23:30 UTC (permalink / raw)
  To: u-boot

Dear Scott Wood,

> On 06/24/2012 07:17 PM, Marek Vasut wrote:
> > This macro returns 1 if the argument (address) is aligned, returns
> > zero otherwise. This will be used to test user-supplied address to
> > various commands to prevent user from loading data to/from unaligned
> > address when using caches.
> > 
> > This is made as a macro, because macros are expanded where they are
> > used. Therefore it can be easily instrumented to report position of
> > the fault.
> > 
> > Signed-off-by: Marek Vasut <marex@denx.de>
> > Cc: Wolfgang Denk <wd@denx.de>
> > ---
> > 
> >  include/common.h |   18 ++++++++++++++++++
> >  1 file changed, 18 insertions(+)
> > 
> > diff --git a/include/common.h b/include/common.h
> > index 322569e..17c64b0 100644
> > --- a/include/common.h
> > +++ b/include/common.h
> > @@ -730,6 +730,24 @@ void	invalidate_dcache_range(unsigned long start,
> > unsigned long stop);
> > 
> >  void	invalidate_dcache_all(void);
> >  void	invalidate_icache_all(void);
> > 
> > +/* Test if address is cache-aligned. Returns 1 if it is, 0 otherwise. */
> > +#define cacheline_aligned(addr)					\
> > +	({								\
> > +	int __ret;							\
> > +	if (!dcache_status()) {						\
> > +		__ret = 1;						\
> > +	} else if ((addr) & (ARCH_DMA_MINALIGN - 1)) {			\
> > +		puts("Align load address to "				\
> > +			__stringify(ARCH_DMA_MINALIGN)			\
> > +			" bytes when using caches!\n");			\
> > +		__ret = 0;						\
> > +	} else {							\
> > +		__ret = 1;						\
> > +	}								\
> > +	__ret;								\
> > +	})
> 
> What if it's a store rather than a load?

Good point #1

> If this is only supposed to be
> used for loads (because on a store you can flush the cache rather than
> invalidate), it's not labelled that way, the changelog says "to/from
> unaligned address", and the caller might be common code that doesn't
> know which direction the transfer is in.

And we're back at the flush/invalidate thing. This is what I'd really love to 
understand once and for all:

1) We must prevent user from loading to address 0x.......1 (unaligned), I'll get 
back to this later on.

2) If the user had address that starts at aligned location but ends at 
unaligned:

Terminology:
va -- unrelated variable
[****] -- aligned range

2a) LOAD from external source to local address:
The cache must be invalidated over that area. I see the following issue:

[---------buffer---------][va]
[********][********][********]

So consider the scenario where
i) va is written in u-boot
ii) data arrive into the buffer via DMA at the same time
iii) cache is invalidated over whole buffer

This will cause variable "va" to be lost forever. The "va" variable might just 
as well be user data, so in order to protect it, we should first flush the 
user-supplied area.

2b) STORE from local address to external source:
Basically the inverted problem of 2a), but if the driver is written properly, 
there shouldn't be any issue.

> Besides, it would be awkward
> user interface to allow an address to be used in one direction but not
> the other.

Correct, it should be possible to adjust the message.

> What if the caller wants to try a different strategy if this returns 0,
> rather than print an error?

Good point.

> Why should the success of a command depend on whether caches are
> enabled?

On less capable architectures, flushing caches over an unaligned area causes 
real mayhem (and DMA depends on this), and it's really tough to debug. If the 
caches are off, it's all good.

A fair question is whether this code should be enabled on a per-CPU-model 
basis, or be entirely configurable in include/configs/....

> If we're going to forbid unaligned addresses in certain
> contexts, shouldn't it always be forbidden to ensure consistent user
> experience?

This is a good argument ... I wonder, let's hear others opinions.

> Or if we're going to be picky about when we reject it, why
> don't we care whether the device in question does DMA, and whether that
> DMA is coherent?

Correct, good point. But then this is something that should be added to the 
upcoming U-Boot driver model!

> -Scott

Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 6/9] CACHE: nand read/write: Test if start address is aligned
  2012-06-25 16:58   ` Scott Wood
  2012-06-25 18:43     ` Tom Rini
@ 2012-06-25 23:37     ` Marek Vasut
  2012-06-25 23:57       ` Scott Wood
  1 sibling, 1 reply; 36+ messages in thread
From: Marek Vasut @ 2012-06-25 23:37 UTC (permalink / raw)
  To: u-boot

Dear Scott Wood,

> On 06/24/2012 07:17 PM, Marek Vasut wrote:
> > This prevents the scenario where data cache is on and the
> > device uses DMA to deploy data. In that case, it might not
> > be possible to flush/invalidate data to RAM properly. The
> > other option is to use bounce buffer,
> 
> Or get cache coherent hardware. :-)

Oh ... you mean powerpc? Or rather something like this 
http://cache.freescale.com/files/32bit/doc/fact_sheet/QORIQLS2FAMILYFS.pdf ? :-D

> > but that involves a lot of copying and therefore degrades performance
> > rapidly. Therefore disallow this possibility of unaligned load
> > address altogether if data cache is on.
> 
> How about use the bounce buffer only if the address is misaligned?

Not happening, the bounce buffer is bullshit, especially when we can prevent it 
by teaching the user not to do foolish things.

It's like driving a car in the wrong lane. Sure, you can do it, but it'll 
eventually have some consequences. And using a bounce buffer is like driving a 
tank in the wrong lane ...

> The
> corrective action a user has to take is the same as with this patch,
> except for an additional option of living with the slight performance
> penalty.

Slight is very weak word here.

> How often does this actually happen?  How much does it
> actually slow things down compared to the speed of the NAND chip?

If the user is dumb, always. But if you tell the user how to get the most out 
of the hardware, he'll be happier.

> I'm hesitant to break something -- even if it's odd (literally in this
> case) -- that currently works on most hardware, just because one or two
> drivers can't handle it.  It feels kind of like changing the read() and
> write() system calls to require cacheline alignment. :-P

That's actually almost right: we're doing a bootloader here, and it may have 
limitations. We're not writing yet another operating system with no bounds on 
its possibilities!

> > Signed-off-by: Marek Vasut <marex@denx.de>
> > Cc: Scott Wood <scottwood@freescale.com>
> > ---
> > 
> >  common/cmd_nand.c |    9 +++++++++
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/common/cmd_nand.c b/common/cmd_nand.c
> > index a91ccf4..122a91c 100644
> > --- a/common/cmd_nand.c
> > +++ b/common/cmd_nand.c
> > @@ -609,6 +609,8 @@ int do_nand(cmd_tbl_t * cmdtp, int flag, int argc,
> > char * const argv[])
> > 
> >  			goto usage;
> >  		
> >  		addr = (ulong)simple_strtoul(argv[2], NULL, 16);
> > 
> > +		if (!cacheline_aligned(addr))
> > +			return 1;
> 
> There's no way you can just return like this without printing an error
> that lets the user know that the operation wasn't performed, and why.
> 
> -Scott

Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 6/9] CACHE: nand read/write: Test if start address is aligned
  2012-06-25 18:43     ` Tom Rini
  2012-06-25 20:08       ` Scott Wood
@ 2012-06-25 23:38       ` Marek Vasut
  1 sibling, 0 replies; 36+ messages in thread
From: Marek Vasut @ 2012-06-25 23:38 UTC (permalink / raw)
  To: u-boot

Dear Tom Rini,

> On Mon, Jun 25, 2012 at 11:58:10AM -0500, Scott Wood wrote:
> > On 06/24/2012 07:17 PM, Marek Vasut wrote:
> > > This prevents the scenario where data cache is on and the
> > > device uses DMA to deploy data. In that case, it might not
> > > be possible to flush/invalidate data to RAM properly. The
> > > other option is to use bounce buffer,
> > 
> > Or get cache coherent hardware. :-)
> > 
> > > but that involves a lot of copying and therefore degrades performance
> > > rapidly. Therefore disallow this possibility of unaligned load
> > > address altogether if data cache is on.
> > 
> > How about use the bounce buffer only if the address is misaligned?  The
> > corrective action a user has to take is the same as with this patch,
> > except for an additional option of living with the slight performance
> > penalty.  How often does this actually happen?  How much does it
> > actually slow things down compared to the speed of the NAND chip?
> 
> We would need to architect things such that any 'load' command would be
> routed through this logic.  A solvable problem for sure, but a little
> bit larger than just for NAND :)

It's not only NAND, it's the whole user interface that's broken right now.

> > I'm hesitant to break something -- even if it's odd (literally in this
> > case) -- that currently works on most hardware, just because one or two
> > drivers can't handle it.  It feels kind of like changing the read() and
> > write() system calls to require cacheline alignment. :-P
> 
> I don't want to get into an ARM vs PowerPC argument.

ARM is better anyway :-D

> I think the best
> answer is that I'm not sure having things unaligned works totally right
> today as I did a silly test of loading a uImage to 0x82000001 and bootm
> hung inside of U-Boot not long ago.  Can you try that on some cache
> coherent hardware and see if that works?

+1

[...]

Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 6/9] CACHE: nand read/write: Test if start address is aligned
  2012-06-25 21:17           ` Scott Wood
  2012-06-25 21:22             ` Tom Rini
@ 2012-06-25 23:42             ` Marek Vasut
  2012-06-26  0:37               ` Scott Wood
  1 sibling, 1 reply; 36+ messages in thread
From: Marek Vasut @ 2012-06-25 23:42 UTC (permalink / raw)
  To: u-boot

Dear Scott Wood,

> On 06/25/2012 03:48 PM, Tom Rini wrote:
> > On 06/25/2012 01:08 PM, Scott Wood wrote:
> >> On 06/25/2012 01:43 PM, Tom Rini wrote:
> >>> On Mon, Jun 25, 2012 at 11:58:10AM -0500, Scott Wood wrote:
> >>>> On 06/24/2012 07:17 PM, Marek Vasut wrote:
> >>>>> This prevents the scenario where data cache is on and the
> >>>>> device uses DMA to deploy data. In that case, it might not be
> >>>>> possible to flush/invalidate data to RAM properly. The other
> >>>>> option is to use bounce buffer,
> >>>> 
> >>>> Or get cache coherent hardware. :-)
> >>>> 
> >>>>> but that involves a lot of copying and therefore degrades
> >>>>> performance rapidly. Therefore disallow this possibility of
> >>>>> unaligned load address altogether if data cache is on.
> >>>> 
> >>>> How about use the bounce buffer only if the address is
> >>>> misaligned?  The corrective action a user has to take is the
> >>>> same as with this patch, except for an additional option of
> >>>> living with the slight performance penalty.  How often does
> >>>> this actually happen?  How much does it actually slow things
> >>>> down compared to the speed of the NAND chip?
> >>> 
> >>> We would need to architect things such that any 'load' command
> >>> would be routed through this logic.
> >> 
> >> It's something the driver backend should handle (possibly via a
> >> common helper library).  The fact that you can't do a DMA transfer
> >> to an unaligned buffer is a hardware-specific detail, just as is
> >> the fact that you're setting up a DMA buffer in the first place.
> > 
> > Right.  What I'm trying to say is it's not a NAND problem it's an
> > unaligned addresses problem so the solution needs to be easily used
> > everywhere.
> 
> OK, so fix it in each driver that has this issue.  A lot of drivers are
> probably not so performance critical that you can't just always use a
> bounce buffer.  A static buffer plus memcpy isn't that burdensome --
> it's close to what the drivers for non-DMA hardware do.  For higher
> performance peripherals, throw in an if-statement or two.  It doesn't
> seem like something that needs a U-Boot-wide change.

This is flat bull. I don't want bounce buffers growing all around U-Boot; see my 
previous email. I'm 120% firm on that.

And btw, it's not only about bounce buffers; it's also about other code (like FS 
code) that does unaligned accesses, and we're fixing it.

> 
> In the specific case of NAND, how many NAND drivers use DMA at all?

Many do, and it's not only NAND; it's all over the place.

SPI, NAND, MMC, etc.

> >> I'm not sure what bootm has to do with nand (and the fact that some
> >> ppc is cache coherent actually doesn't matter, since we don't do
> >> DMA for NAND), but I was able to bootm from an odd RAM address, and
> >> "nand read" to an odd RAM address, on p5020ds.
> > 
> > On ARM-land we have a lot of problems with unaligned addresses, even
> > with cache off.  I went to reproduce the original bootm problem and
> > ran into fatload hanging.  tftp didn't fail but bootm hangs.
> 
> Maybe you can't take alignment exceptions during bootm?  PPC doesn't
> normally take alignment checks, but we would have trouble with this
> scenario if it did, since bootm clobbers the exception vectors.
> 
> -Scott

Best regards,
Marek Vasut

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 6/9] CACHE: nand read/write: Test if start address is aligned
  2012-06-25 23:37     ` Marek Vasut
@ 2012-06-25 23:57       ` Scott Wood
  2012-06-26  1:33         ` Marek Vasut
  0 siblings, 1 reply; 36+ messages in thread
From: Scott Wood @ 2012-06-25 23:57 UTC (permalink / raw)
  To: u-boot

On 06/25/2012 06:37 PM, Marek Vasut wrote:
> Dear Scott Wood,
> 
>> On 06/24/2012 07:17 PM, Marek Vasut wrote:
>>> This prevents the scenario where data cache is on and the
>>> device uses DMA to deploy data. In that case, it might not
>>> be possible to flush/invalidate data to RAM properly. The
>>> other option is to use bounce buffer,
>>
>> Or get cache coherent hardware. :-)
> 
> Oh ... you mean powerpc? Or rather something like this 
> http://cache.freescale.com/files/32bit/doc/fact_sheet/QORIQLS2FAMILYFS.pdf ? :-D

The word "coherent/coherency" appears 5 times in that document by my
count. :-)

I hope that applies to DMA, not just core-to-core.

>>> but that involves a lot of copying and therefore degrades performance
>>> rapidly. Therefore disallow this possibility of unaligned load
>>> address altogether if data cache is on.
>>
>> How about use the bounce buffer only if the address is misaligned?
> 
> Not happening, bounce buffer is bullshit,

Hacking up the common frontend with a new limitation because you can't
be bothered to fix your drivers is bullshit.

> It's like driving a car in the wrong lane. Sure, you can do it, but it'll 
> eventually have some consequences. And using a bounce buffer is like driving a 
> tank in the wrong lane ...

Using a bounce buffer is like parking your car before going into the
building, rather than insisting the building's hallways be paved.

>> The
>> corrective action a user has to take is the same as with this patch,
>> except for an additional option of living with the slight performance
>> penalty.
> 
> Slight is very weak word here.

Prove me wrong with benchmarks.

>> How often does this actually happen?  How much does it
>> actually slow things down compared to the speed of the NAND chip?
> 
> If the user is dumb, always. But if you tell the user how to milk the most of 
> the hardware, he'll be happier.

So, if you use bounce buffers conditionally (based on whether the
address is misaligned), there's no impact except to "dumb" users, and
for those users they would merely get a performance degradation rather
than breakage.  How is this "bullshit"?

>> I'm hesitant to break something -- even if it's odd (literally in this
>> case) -- that currently works on most hardware, just because one or two
>> drivers can't handle it.  It feels kind of like changing the read() and
>> write() system calls to require cacheline alignment. :-P
> 
> That's actually almost right, we're doing a bootloader here, it might have 
> limitations. We're not writing yet another operating system with no bounds on 
> possibilities!

We also don't need to bend over backwards to squeeze every last cycle
out of the boot process, at the expense of a stable user interface (not
to mention requiring the user to know the system's cache line size).

-Scott

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [U-Boot] [PATCH 6/9] CACHE: nand read/write: Test if start address is aligned
  2012-06-25 23:42             ` Marek Vasut
@ 2012-06-26  0:37               ` Scott Wood
  2012-06-26  1:16                 ` Marek Vasut
  0 siblings, 1 reply; 36+ messages in thread
From: Scott Wood @ 2012-06-26  0:37 UTC (permalink / raw)
  To: u-boot

On 06/25/2012 06:42 PM, Marek Vasut wrote:
> Dear Scott Wood,
> 
>> On 06/25/2012 03:48 PM, Tom Rini wrote:
>>> Right.  What I'm trying to say is it's not a NAND problem it's an
>>> unaligned addresses problem so the solution needs to be easily used
>>> everywhere.
>>
>> OK, so fix it in each driver that has this issue.  A lot of drivers are
>> probably not so performance critical that you can't just always use a
>> bounce buffer.  A static buffer plus memcpy isn't that burdensome --
>> it's close to what the drivers for non-DMA hardware do.  For higher
>> performance peripherals, throw in an if-statement or two.  It doesn't
>> seem like something that needs a U-Boot-wide change.
> 
> This is flat bull. I don't want bounce buffers growing all around uboot, see my 
> previous email. I'm 120% firm in that.

I'm 150% firm that you're going to need a better justification to change
the user interface for everybody.

What's next, restricting "cp" (or even "cp.b") in case someone wants to
use a DMA engine to copy the bits?

Note that in the case of "nand read.oob", depending on NAND page size
and platform, there's a good chance that you're imposing an alignment
restriction that is larger than the data being transferred even if the
user asks to read the entire OOB.

What about "nand write.yaffs2" or multi-page "nand read.raw", which deal
with arrays of interleaved main+spare?  With a small page NAND chip,
you'll need cache lines that are 16 bytes or smaller to avoid unaligned
transactions -- and those will bypass your front-end check (unless the
user is so "stupid" as to want to modify the data after a raw-read, and
do a raw-write of a particular page).

> And btw it's not about bounce buffers, it's also about other code (like FS code) 
> which does unaligned accesses and we're fixing it.

Fixing the alignment of U-Boot-generated buffers is good.  That doesn't
affect the user interface.  This is different.

>> In the specific case of NAND, how many NAND drivers use DMA at all?
> 
> Many do, 

How many?  Specifically, how many that have alignment restrictions, that
would need to be fixed?

> it's not only nand, it's all over the place.

This patch is about NAND.

-Scott

* [U-Boot] [PATCH 6/9] CACHE: nand read/write: Test if start address is aligned
  2012-06-26  0:37               ` Scott Wood
@ 2012-06-26  1:16                 ` Marek Vasut
  2012-06-26 19:38                   ` Scott Wood
  0 siblings, 1 reply; 36+ messages in thread
From: Marek Vasut @ 2012-06-26  1:16 UTC (permalink / raw)
  To: u-boot

Dear Scott Wood,

> On 06/25/2012 06:42 PM, Marek Vasut wrote:
> > Dear Scott Wood,
> > 
> >> On 06/25/2012 03:48 PM, Tom Rini wrote:
> >>> Right.  What I'm trying to say is it's not a NAND problem it's an
> >>> unaligned addresses problem so the solution needs to be easily used
> >>> everywhere.
> >> 
> >> OK, so fix it in each driver that has this issue.  A lot of drivers are
> >> probably not so performance critical that you can't just always use a
> >> bounce buffer.  A static buffer plus memcpy isn't that burdensome --
> >> it's close to what the drivers for non-DMA hardware do.  For higher
> >> performance peripherals, throw in an if-statement or two.  It doesn't
> >> seem like something that needs a U-Boot-wide change.
> > 
> > This is flat bull. I don't want bounce buffers growing all around uboot,
> > see my previous email. I'm 120% firm in that.
> 
> I'm 150% firm that you're going to need a better justification to change
> the user interface for everybody.

I'm trying to make it user-proof. Honestly, loading to some bullshit unaligned 
address is a corner case, so it won't affect many people. And that minority 
who'll be affected will adjust their stuff by a few bytes, no problem there.

> What's next, restricting "cp" (or even "cp.b") in case someone wants to
> use a DMA engine to copy the bits?

Ad absurdum indeed, but it's partly valid point. You'd need to restrict it 
somehow if that'd be the case, therefore I'll add it to the DM. Drivers should 
be able to specify their requirements.

> Note that in the case of "nand read.oob", depending on NAND page size
> and platform, there's a good chance that you're imposing an alignment
> restriction that is larger than the data being transferred even if the
> user asks to read the entire OOB.

I don't think I completely follow here.

> What about "nand write.yaffs2" or multi-page "nand read.raw", which deal
> with arrays of interleaved main+spare?  With a small page NAND chip,
> you'll need cache lines that are 16 bytes or smaller to avoid unaligned
> transactions -- and those will bypass your front-end check (unless the
> user is so "stupid" as to want to modify the data after a raw-read, and
> do a raw-write of a particular page).

Ok, I think I'm very stupid now, probably since I have high fever. I'll read 
this after I sleep. Sorry Scott, I'm sure you're rolling out a valid point, it's 
just that I'm incapable of understanding it right now.

> > And btw it's not about bounce buffers, it's also about other code (like
> > FS code) which does unaligned accesses and we're fixing it.
> 
> Fixing the alignment of U-Boot-generated buffers is good.  That doesn't
> affect the user interface.  This is different.

And it's the easy part.

> >> In the specific case of NAND, how many NAND drivers use DMA at all?
> > 
> > Many do,
> 
> How many?  Specifically, how many that have alignment restrictions, that
> would need to be fixed?

At least all on the ARM architecture. And more will come, since ARM is on the 
rise.

> > it's not only nand, it's all over the place.
> 
> This patch is about NAND.

Check the whole patchset ... and that still only covers a small part of it all.

> -Scott

Best regards,
Marek Vasut

* [U-Boot] [PATCH 6/9] CACHE: nand read/write: Test if start address is aligned
  2012-06-25 23:57       ` Scott Wood
@ 2012-06-26  1:33         ` Marek Vasut
  2012-06-26 19:25           ` Scott Wood
  0 siblings, 1 reply; 36+ messages in thread
From: Marek Vasut @ 2012-06-26  1:33 UTC (permalink / raw)
  To: u-boot

Dear Scott Wood,

> On 06/25/2012 06:37 PM, Marek Vasut wrote:
> > Dear Scott Wood,
> > 
> >> On 06/24/2012 07:17 PM, Marek Vasut wrote:
> >>> This prevents the scenario where data cache is on and the
> >>> device uses DMA to deploy data. In that case, it might not
> >>> be possible to flush/invalidate data to RAM properly. The
> >>> other option is to use bounce buffer,
> >> 
> >> Or get cache coherent hardware. :-)
> > 
> > Oh ... you mean powerpc? Or rather something like this
> > http://cache.freescale.com/files/32bit/doc/fact_sheet/QORIQLS2FAMILYFS.pd
> > f ? :-D
> 
> The word "coherent/coherency" appears 5 times in that document by my
> count. :-)
> 
> I hope that applies to DMA, not just core-to-core.
> 
> >>> but that involves a lot of copying and therefore degrades performance
> >>> rapidly. Therefore disallow this possibility of unaligned load
> >>> address altogether if data cache is on.
> >> 
> >> How about use the bounce buffer only if the address is misaligned?
> > 
> > Not happening, bounce buffer is bullshit,
> 
> Hacking up the common frontend with a new limitation because you can't
> be bothered to fix your drivers is bullshit.

The drivers are not broken, they have hardware limitations. And checking for 
those has to be done as early as possible. And it's not a new common frontend!

> > It's like driving a car in the wrong lane. Sure, you can do it, but it'll
> > eventually have some consequences. And using a bounce buffer is like
> > driving a tank in the wrong lane ...
> 
> Using a bounce buffer is like parking your car before going into the
> building, rather than insisting the building's hallways be paved.

The other is obviously faster, more comfortable and lets you carry more stuff at 
once. And if you drive a truck, you can dump a lot of payload instead of 
carrying it back and forth from the building. That's why there's a special 
garage for trucks possibly with cargo elevators etc.

> >> The
> >> corrective action a user has to take is the same as with this patch,
> >> except for an additional option of living with the slight performance
> >> penalty.
> > 
> > Slight is very weak word here.
> 
> Prove me wrong with benchmarks.

Well, copying data back and forth is tremendous overhead. You don't need a 
benchmark to calculate something like this:

133MHz SDRAM (pumped) gives you what ... 133 Mb/s throughput
(now if it's DDR, dual/quad pumped, that doesn't give you any more advantage 
since you have to: send address, read the data, send address, write the data ... 
this is expensive ... without data cache on, even more so)

Now consider you do it via a really dumb memcpy; what happens:
1) You need to read the data into register
1a) Send address
1b) Read the data into register
2) You need to write the data to a new location
2a) Send address
2b) Write the data into the memory

In the meantime, you get some refresh cycles etc. Now, if you take read and 
write in 1 time unit and "send address" in 0.5 time unit (this gives total 3 
time units per one loop) and consider you're not doing sustained read/write, you 
should see you'll be able to copy at speed of about 133/3 ~= 40Mb/s

If you want to load 3MB kernel at 40Mb/s onto an unaligned address via DMA, the 
DMA will deploy it via sustained write, that'll be at 10MB/s, therefore in 
300ms. But the subsequent copy will take another 600ms.
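Taking the message's figures at face value (a 10 MB/s sustained DMA rate, and a bounce copy whose effective rate the 300 ms / 600 ms conclusion implies is half of that), the estimate can be spelled out as a small sketch; all the numbers are illustrative, not measurements:

```c
#include <stdio.h>

/* Milliseconds needed to move `mb` megabytes at `rate` MB/s. */
static double xfer_ms(double mb, double rate)
{
	return mb / rate * 1000.0;
}

/* With the figures above: a 3 MB kernel at 10 MB/s DMA takes 300 ms;
 * the bounce copy at the implied 5 MB/s effective rate adds another
 * 600 ms on top of it. */
static void print_estimate(void)
{
	printf("DMA into RAM:      %.0f ms\n", xfer_ms(3.0, 10.0));
	printf("extra bounce copy: %.0f ms\n", xfer_ms(3.0, 5.0));
}
```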

And now, I need someone to recalculate it. Also, read up here, it is _VERY_ good 
and it is certainly more accurate than my previous delirious attempt:

http://www.akkadia.org/drepper/cpumemory.pdf

> >> How often does this actually happen?  How much does it
> >> actually slow things down compared to the speed of the NAND chip?
> > 
> > If the user is dumb, always. But if you tell the user how to milk the
> > most of the hardware, he'll be happier.
> 
> So, if you use bounce buffers conditionally (based on whether the
> address is misaligned), there's no impact except to "dumb" users, and
> for those users they would merely get a performance degradation rather
> than breakage.  How is this "bullshit"?

Correct, but users will complain if they get a subpar performance.

> >> I'm hesitant to break something -- even if it's odd (literally in this
> >> case) -- that currently works on most hardware, just because one or two
> >> drivers can't handle it.  It feels kind of like changing the read() and
> >> write() system calls to require cacheline alignment. :-P
> > 
> > That's actually almost right, we're doing a bootloader here, it might
> > have limitations. We're not writing yet another operating system with no
> > bounds on possibilities!
> 
> We also don't need to bend over backwards to squeeze every last cycle
> out of the boot process, at the expense of a stable user interface (not
> to mention requiring the user to know the system's cache line size).

But that's reported in my patch ;-)

And yes, we want to make the boot process as blazing fast as possible. Imagine 
you fire a rocket into the deep space and it gets broken and needs reboot, will 
you enjoy the waiting ? ;-)

> -Scott

Best regards,
Marek Vasut

* [U-Boot] [PATCH 6/9] CACHE: nand read/write: Test if start address is aligned
  2012-06-26  1:33         ` Marek Vasut
@ 2012-06-26 19:25           ` Scott Wood
  2012-06-26 20:39             ` Marek Vasut
  0 siblings, 1 reply; 36+ messages in thread
From: Scott Wood @ 2012-06-26 19:25 UTC (permalink / raw)
  To: u-boot

On 06/25/2012 08:33 PM, Marek Vasut wrote:
> Dear Scott Wood,
> 
>> On 06/25/2012 06:37 PM, Marek Vasut wrote:
>>> Dear Scott Wood,
>>>
>>>> On 06/24/2012 07:17 PM, Marek Vasut wrote:
>>>>> but that involves a lot of copying and therefore degrades performance
>>>>> rapidly. Therefore disallow this possibility of unaligned load
>>>>> address altogether if data cache is on.
>>>>
>>>> How about use the bounce buffer only if the address is misaligned?
>>>
>>> Not happening, bounce buffer is bullshit,
>>
>> Hacking up the common frontend with a new limitation because you can't
>> be bothered to fix your drivers is bullshit.
> 
> The drivers are not broken, they have hardware limitations. 

They're broken because they ignore those limitations.

> And checking for 
> those has to be done as early as possible.

Why?

> And it's not a new common frontend!

No, it's a compatibility-breaking change to the existing common frontend.

>>> It's like driving a car in the wrong lane. Sure, you can do it, but it'll
>>> eventually have some consequences. And using a bounce buffer is like
>>> driving a tank in the wrong lane ...
>>
>> Using a bounce buffer is like parking your car before going into the
>> building, rather than insisting the building's hallways be paved.
> 
> The other is obviously faster, more comfortable and lets you carry more stuff at 
> once.

Then you end up needing buildings to be many times as large to give
every cubicle an adjacent parking spot, maneuvering room, etc.  You'll
be breathing fumes all day, and it'll be a lot less comfortable to get
even across the hallway without using a car, etc.  Communication between
coworkers would be limited to horns and obscene gestures. :-)

> And if you drive a truck, you can dump a lot of payload instead of 
> carrying it back and forth from the building. That's why there's a special 
> garage for trucks possibly with cargo elevators etc.

Yes, it's called targeted optimization rather than premature optimization.

>>>> The
>>>> corrective action a user has to take is the same as with this patch,
>>>> except for an additional option of living with the slight performance
>>>> penalty.
>>>
>>> Slight is very weak word here.
>>
>> Prove me wrong with benchmarks.
> 
> Well, copying data back and forth is tremendous overhead. You don't need a 
> benchmark to calculate something like this:
> 
> 133MHz SDRAM (pumped) gives you what ... 133 Mb/s throughput

You're saying you get only a little more bandwidth from memory than
you'd get from a 100 Mb/s ethernet port?  Come on.  Data buses are not
one bit wide.

And how fast can you pull data out of a NAND chip, even with DMA?

> (now if it's DDR, dual/quad pumped, that doesn't give you any more advantage 

So such things were implemented for fun?

> since you have to: send address, read the data, send address, write the data ... 

What about bursts?  I'm pretty sure you don't have to send the address
separately for every single byte.

> this is expensive ... without data cache on, even more so)

Why do we care about "without data cache"?  You don't need the bounce
buffer in that case.

> Now consider you do it via a really dumb memcpy; what happens:

It looks like ARM U-Boot has an optimized memcpy.

> 1) You need to read the data into register
> 1a) Send address
> 1b) Read the data into register
> 2) You need to write the data to a new location
> 2a) Send address
> 2b) Write the data into the memory
> 
> In the meantime, you get some refresh cycles etc. Now, if you take read and 
> write in 1 time unit and "send address" in 0.5 time unit (this gives total 3 
> time units per one loop) and consider you're not doing sustained read/write, you 
> should see you'll be able to copy at speed of about 133/3 ~= 40Mb/s
> 
> If you want to load 3MB kernel at 40Mb/s onto an unaligned address via DMA, the 
> DMA will deploy it via sustained write, that'll be at 10MB/s, therefore in 
> 300ms. But the subsequent copy will take another 600ms.

On a p5020ds, using NAND hardware that doesn't do DMA at all, I'm able
to load a 3MiB image from NAND in around 300-400 ms.  This is with using
memcpy_fromio() on an uncached hardware buffer.

Again, I'm not saying that bounce buffers are always negligible overhead
-- just that I doubt NAND is fast enough that it makes a huge difference
in this specific case.

>>>> How often does this actually happen?  How much does it
>>>> actually slow things down compared to the speed of the NAND chip?
>>>
>>> If the user is dumb, always. But if you tell the user how to milk the
>>> most of the hardware, he'll be happier.
>>
>> So, if you use bounce buffers conditionally (based on whether the
>> address is misaligned), there's no impact except to "dumb" users, and
>> for those users they would merely get a performance degradation rather
>> than breakage.  How is this "bullshit"?
> 
> Correct, but users will complain if they get a subpar performance.

If you expend the minimal effort required to make the bounce buffer
usage conditional on the address actually being misaligned, the only
users that will see subpar performance are those who would see breakage
with your approach.  Users will complain if they see breakage even more
than when they see subpar performance.

If the issue is educating the user to avoid the performance hit
(regardless of magnitude), and you care enough, have the driver print a
warning (not error) message the first time it needs to use a bounce buffer.
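The conditional fallback described here can be sketched in a few lines; this is a minimal illustration, not U-Boot's actual driver API -- `dma_read()` is a placeholder (a plain memcpy so the sketch is self-contained) and the 64-byte `ARCH_DMA_MINALIGN` is an assumed line size:

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ARCH_DMA_MINALIGN 64	/* assumed cache line size */

/* Placeholder for a driver's real DMA transfer. */
static void dma_read(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

/* Conditional fallback: DMA straight into the caller's buffer when it
 * is aligned, otherwise bounce through an aligned static buffer and
 * warn only the first time. */
static void nand_read_buf(void *dst, const void *src, size_t len)
{
	static uint8_t bounce[4096]
		__attribute__((aligned(ARCH_DMA_MINALIGN)));
	static int warned;

	if (((uintptr_t)dst & (ARCH_DMA_MINALIGN - 1)) == 0) {
		dma_read(dst, src, len);	/* fast path */
		return;
	}
	if (!warned) {
		puts("WARNING: unaligned buffer, using bounce buffer");
		warned = 1;
	}
	while (len) {
		size_t chunk = len < sizeof(bounce) ? len : sizeof(bounce);

		dma_read(bounce, src, chunk);
		memcpy(dst, bounce, chunk);
		src = (const uint8_t *)src + chunk;
		dst = (uint8_t *)dst + chunk;
		len -= chunk;
	}
}
```

Aligned callers pay only one extra branch; misaligned ones get a working transfer plus a one-time warning instead of a hard error.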

-Scott

* [U-Boot] [PATCH 6/9] CACHE: nand read/write: Test if start address is aligned
  2012-06-26  1:16                 ` Marek Vasut
@ 2012-06-26 19:38                   ` Scott Wood
  0 siblings, 0 replies; 36+ messages in thread
From: Scott Wood @ 2012-06-26 19:38 UTC (permalink / raw)
  To: u-boot

On 06/25/2012 08:16 PM, Marek Vasut wrote:
> Dear Scott Wood,
>> Note that in the case of "nand read.oob", depending on NAND page size
>> and platform, there's a good chance that you're imposing an alignment
>> restriction that is larger than the data being transferred even if the
>> user asks to read the entire OOB.
> 
> I don't think I completely follow here.

Why should I need to have 64-byte alignment for transferring a 16-byte OOB?

>> What about "nand write.yaffs2" or multi-page "nand read.raw", which deal
>> with arrays of interleaved main+spare?  With a small page NAND chip,
>> you'll need cache lines that are 16 bytes or smaller to avoid unaligned
>> transactions -- and those will bypass your front-end check (unless the
>> user is so "stupid" as to want to modify the data after a raw-read, and
>> do a raw-write of a particular page).
> 
> Ok, I think I'm very stupid now, probably since I have high fever. I'll read 
> this after I sleep. Sorry Scott, I'm sure you're rolling out a valid point, it's 
> just that I'm incapable of understanding it right now.

Probably best to wait until you're feeling better for the rest of it,
too.  Hope you get well soon.

>>>> In the specific case of NAND, how many NAND drivers use DMA at all?
>>>
>>> Many do,
>>
>> How many?  Specifically, how many that have alignment restrictions, that
>> would need to be fixed?
> 
> At least all on the ARM architecture. And more will come, since ARM is on the 
> rise.

atmel: no, doesn't use DMA
davinci: no, doesn't use DMA
kb9202: no, doesn't use DMA
kirkwood: no, doesn't use DMA
mxc: I don't think this uses DMA, but this driver scares me. :-)
mxs: Even scarier.  Looks like this one does use DMA.
omap_gpmc: no, doesn't use DMA
s3c64xx: no, doesn't use DMA
spr: no, doesn't use DMA
s3c2410: no, doesn't use DMA

If this is about not wanting to touch the mxs_nand driver again, I
sympathize. :-)

>>> it's not only nand, it's all over the place.
>>
>> This patch is about NAND.
> 
> Check the whole patchset ... and that still only covers a small part of it all.

Right, I think it's wrong elsewhere too when it's user interface, but
I'll let the relevant custodians argue those cases.

-Scott

* [U-Boot] [PATCH 6/9] CACHE: nand read/write: Test if start address is aligned
  2012-06-26 19:25           ` Scott Wood
@ 2012-06-26 20:39             ` Marek Vasut
  0 siblings, 0 replies; 36+ messages in thread
From: Marek Vasut @ 2012-06-26 20:39 UTC (permalink / raw)
  To: u-boot

Dear Scott Wood,

> On 06/25/2012 08:33 PM, Marek Vasut wrote:
> > Dear Scott Wood,
> > 
> >> On 06/25/2012 06:37 PM, Marek Vasut wrote:
> >>> Dear Scott Wood,
> >>> 
> >>>> On 06/24/2012 07:17 PM, Marek Vasut wrote:
> >>>>> but that involves a lot of copying and therefore degrades performance
> >>>>> rapidly. Therefore disallow this possibility of unaligned load
> >>>>> address altogether if data cache is on.
> >>>> 
> >>>> How about use the bounce buffer only if the address is misaligned?
> >>> 
> >>> Not happening, bounce buffer is bullshit,
> >> 
> >> Hacking up the common frontend with a new limitation because you can't
> >> be bothered to fix your drivers is bullshit.
> > 
> > The drivers are not broken, they have hardware limitations.
> 
> They're broken because they ignore those limitations.
> 
> > And checking for
> > those has to be done as early as possible.
> 
> Why?

Why keep an overhead.

> > And it's not a new common frontend!
> 
> No, it's a compatibility-breaking change to the existing common frontend.

Well, those are corner cases, it's not like people will start hitting it en masse.

I agree it should be somehow platform or even CPU specific.

> >>> It's like driving a car in the wrong lane. Sure, you can do it, but
> >>> it'll eventually have some consequences. And using a bounce buffer is
> >>> like driving a tank in the wrong lane ...
> >> 
> >> Using a bounce buffer is like parking your car before going into the
> >> building, rather than insisting the building's hallways be paved.
> > 
> > The other is obviously faster, more comfortable and lets you carry more
> > stuff at once.
> 
> Then you end up needing buildings to be many times as large to give
> every cubicle an adjacent parking spot, maneuvering room, etc.  You'll
> be breathing fumes all day, and it'll be a lot less comfortable to get
> even across the hallway without using a car, etc.  Communication between
> coworkers would be limited to horns and obscene gestures. :-)

Ok, this has gone waaay out of hand here :-)

> > And if you drive a truck, you can dump a lot of payload instead of
> > carrying it back and forth from the building. That's why there's a
> > special garage for trucks possibly with cargo elevators etc.
> 
> Yes, it's called targeted optimization rather than premature optimization.
> 
> >>>> The
> >>>> corrective action a user has to take is the same as with this patch,
> >>>> except for an additional option of living with the slight performance
> >>>> penalty.
> >>> 
> >>> Slight is very weak word here.
> >> 
> >> Prove me wrong with benchmarks.
> > 
> > Well, copying data back and forth is tremendous overhead. You don't need
> > a benchmark to calculate something like this:
> > 
> > 133MHz SDRAM (pumped) gives you what ... 133 Mb/s throughput
> 
> You're saying you get only a little more bandwidth from memory than
> you'd get from a 100 Mb/s ethernet port?  Come on.  Data buses are not
> one bit wide.

Good point, it was too late and I forgot to count that in.

> And how fast can you pull data out of a NAND chip, even with DMA?
> 
> > (now if it's DDR, dual/quad pumped, that doesn't give you any more
> > advantage
> 
> So such things were implemented for fun?
> 
> > since you have to: send address, read the data, send address, write the
> > data ...
> 
> What about bursts?  I'm pretty sure you don't have to send the address
> separately for every single byte.

If you do memcpy? You only have registers, sure, you can optimize it, but on 
intel for example, you don't have many of those.

> > this is expensive ... without data cache on, even more so)
> 
> Why do we care about "without data cache"?  You don't need the bounce
> buffer in that case.

Correct, it's expensive in both cases though.

> > Now consider you do it via a really dumb memcpy; what happens:
> It looks like ARM U-Boot has an optimized memcpy.
> 
> > 1) You need to read the data into register
> > 1a) Send address
> > 1b) Read the data into register
> > 2) You need to write the data to a new location
> > 2a) Send address
> > 2b) Write the data into the memory
> > 
> > In the meantime, you get some refresh cycles etc. Now, if you take read
> > and write in 1 time unit and "send address" in 0.5 time unit (this gives
> > total 3 time units per one loop) and consider you're not doing sustained
> > read/write, you should see you'll be able to copy at speed of about
> > 133/3 ~= 40Mb/s
> > 
> > If you want to load 3MB kernel at 40Mb/s onto an unaligned address via
> > DMA, the DMA will deploy it via sustained write, that'll be at 10MB/s,
> > therefore in 300ms. But the subsequent copy will take another 600ms.
> 
> On a p5020ds, using NAND hardware that doesn't do DMA at all, I'm able
> to load a 3MiB image from NAND in around 300-400 ms.  This is with using
> memcpy_fromio() on an uncached hardware buffer.

blazing almost half a second ... but everyone these days wants a faster boot, 
without memcpy, it can go down to 100ms or even less! And this kind of 
limitation is not something that'd inconvenience anyone.

> Again, I'm not saying that bounce buffers are always negligible overhead
> -- just that I doubt NAND is fast enough that it makes a huge difference
> in this specific case.

It does make a difference. Correct, thinking about it -- implementing a generic 
bounce buffer for cache-impotent hardware might be a better way to go.

> >>>> How often does this actually happen?  How much does it
> >>>> actually slow things down compared to the speed of the NAND chip?
> >>> 
> >>> If the user is dumb, always. But if you tell the user how to milk the
> >>> most of the hardware, he'll be happier.
> >> 
> >> So, if you use bounce buffers conditionally (based on whether the
> >> address is misaligned), there's no impact except to "dumb" users, and
> >> for those users they would merely get a performance degradation rather
> >> than breakage.  How is this "bullshit"?
> > 
> > Correct, but users will complain if they get a subpar performance.
> 
> If you expend the minimal effort required to make the bounce buffer
> usage conditional on the address actually being misaligned, the only
> users that will see subpar performance are those who would see breakage
> with your approach.  Users will complain if they see breakage even more
> than when they see subpar performance.
> 
> If the issue is educating the user to avoid the performance hit
> (regardless of magnitude), and you care enough, have the driver print a
> warning (not error) message the first time it needs to use a bounce buffer.

Ok, on a second thought, you're right. Let's do it the same way we did it with 
mmc.

> -Scott

Best regards,
Marek Vasut

* [U-Boot] [PATCH 8/9] CACHE: net: asix: Fix asix driver to work with data cache on
  2012-06-25 18:07   ` Joe Hershberger
  2012-06-25 23:16     ` Marek Vasut
@ 2012-07-06 23:09     ` Marek Vasut
  2012-07-06 23:16       ` Marek Vasut
  1 sibling, 1 reply; 36+ messages in thread
From: Marek Vasut @ 2012-07-06 23:09 UTC (permalink / raw)
  To: u-boot

Dear Joe Hershberger,

> Hi Marex,
> 
> On Sun, Jun 24, 2012 at 7:17 PM, Marek Vasut <marex@denx.de> wrote:
> > The asix driver did not align buffers, therefore it didn't work
> > with data cache enabled. Fix this.
> > 
> > Signed-off-by: Marek Vasut <marex@denx.de>
> > Cc: Joe Hershberger <joe.hershberger@gmail.com>
> > ---
> 
> Acked-by: Joe Hershberger <joe.hershberger@gmail.com>

Joe, don't apply this one, I'll send updated one.

Best regards,
Marek Vasut

* [U-Boot] [PATCH 8/9] CACHE: net: asix: Fix asix driver to work with data cache on
  2012-07-06 23:09     ` Marek Vasut
@ 2012-07-06 23:16       ` Marek Vasut
  0 siblings, 0 replies; 36+ messages in thread
From: Marek Vasut @ 2012-07-06 23:16 UTC (permalink / raw)
  To: u-boot

Dear Joe Hershberger,

> > Hi Marex,
> > 
> > On Sun, Jun 24, 2012 at 7:17 PM, Marek Vasut <marex@denx.de> wrote:
> > > The asix driver did not align buffers, therefore it didn't work
> > > with data cache enabled. Fix this.
> > > 
> > > Signed-off-by: Marek Vasut <marex@denx.de>
> > > Cc: Joe Hershberger <joe.hershberger@gmail.com>
> > > ---
> > 
> > Acked-by: Joe Hershberger <joe.hershberger@gmail.com>
> 
> Joe, don't apply this one, I'll send updated one.

Stupid me, this can be safely applied.

> Best regards,
> Marek Vasut

Best regards,
Marek Vasut

* [U-Boot] [PATCH 2/9] CACHE: Add cache_aligned() macro
  2012-06-25 23:30     ` Marek Vasut
@ 2012-07-07  3:00       ` Aneesh V
  0 siblings, 0 replies; 36+ messages in thread
From: Aneesh V @ 2012-07-07  3:00 UTC (permalink / raw)
  To: u-boot

Hi Marek,

On 06/25/2012 04:30 PM, Marek Vasut wrote:
> Dear Scott Wood,
>
>> On 06/24/2012 07:17 PM, Marek Vasut wrote:
>>> This macro returns 1 if the argument (address) is aligned, returns
>>> zero otherwise. This will be used to test user-supplied address to
>>> various commands to prevent user from loading data to/from unaligned
>>> address when using caches.
>>>
>>> This is made as a macro, because macros are expanded where they are
>>> used. Therefore it can be easily instrumented to report position of
>>> the fault.
>>>
>>> Signed-off-by: Marek Vasut<marex@denx.de>
>>> Cc: Wolfgang Denk<wd@denx.de>
>>> ---
>>>
>>>   include/common.h |   18 ++++++++++++++++++
>>>   1 file changed, 18 insertions(+)
>>>
>>> diff --git a/include/common.h b/include/common.h
>>> index 322569e..17c64b0 100644
>>> --- a/include/common.h
>>> +++ b/include/common.h
>>> @@ -730,6 +730,24 @@ void	invalidate_dcache_range(unsigned long start,
>>> unsigned long stop);
>>>
>>>   void	invalidate_dcache_all(void);
>>>   void	invalidate_icache_all(void);
>>>
>>> +/* Test if address is cache-aligned. Returns 0 if it is, 1 otherwise. */
>>> +#define cacheline_aligned(addr)						
> \
>>> +	({								\
>>> +	int __ret;							\
>>> +	if (!dcache_status()) {						\
>>> +		__ret = 1;						\
>>> +	} else if ((addr)&  (ARCH_DMA_MINALIGN - 1)) {			\
>>> +		puts("Align load address to "				\
>>> +			__stringify(ARCH_DMA_MINALIGN)			\
>>> +			" bytes when using caches!\n");			\
>>> +		__ret = 0;						\
>>> +	} else {							\
>>> +		__ret = 1;						\
>>> +	}								\
>>> +	__ret;								\
>>> +	})
>>
>> What if it's a store rather than a load?
>
> Good point #1
>
>> If this is only supposed to be
>> used for loads (because on a store you can flush the cache rather than
>> invalidate), it's not labelled that way, the changelog says "to/from
>> unaligned address", and the caller might be common code that doesn't
>> know which direction the transfer is in.
>
> And we're back at the flush/invalidate thing. This is what I'd really love to
> understand once and for all:
>
> 1) We must prevent user from loading to address 0x.......1 (unaligned), I'll get
> back to this later on.
>
> 2) If the user had address that starts at aligned location but ends at
> unaligned:
>
> Terminology:
> va -- unrelated variable
> [****] -- aligned range
>
> 2a) LOAD from external source to local address:
> The cache must be invalidated over that area. I see the following issue:
>
> [---------buffer---------][va]
> [********][********][********]
> So consider the scenario where
> i) va is written in u-boot
> ii) data arrive into the buffer via DMA at the same time
> iii) cache is invalidated over whole buffer
>
> This will cause variable "va" to be lost forever. The "va" variable might as
> well be user data, therefore in order to protect those, we should first flush
> the user-supplied area.

Flushing the last cache line may corrupt the data you just DMAed in.
The current implementation after a lot of discussions [1] with Albert
is this:

Invalidate only the aligned part of the buffer. So, in the above case
the first two [********] will get invalidated, not the last one.
The last un-aligned cache line will result in a warning.

[1] http://thread.gmane.org/gmane.comp.boot-loaders.u-boot/105113
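The clamping Aneesh describes can be sketched as follows; a minimal illustration only, with `ARCH_DMA_MINALIGN = 64` assumed and the actual `invalidate_dcache_range()` call left to the caller:

```c
#include <stdio.h>

#define ARCH_DMA_MINALIGN 64UL	/* assumed cache line size */

struct inval_range {
	unsigned long start;	/* rounded up to a line boundary   */
	unsigned long stop;	/* rounded down to a line boundary */
};

/* Clamp an invalidate request to the whole cache lines it fully
 * covers, and warn about unaligned edges, so a neighbouring variable
 * sharing the last line is never discarded. */
static struct inval_range clamp_to_cachelines(unsigned long start,
					      unsigned long stop)
{
	struct inval_range r;

	r.start = (start + ARCH_DMA_MINALIGN - 1) & ~(ARCH_DMA_MINALIGN - 1);
	r.stop = stop & ~(ARCH_DMA_MINALIGN - 1);
	if (r.start != start || r.stop != stop)
		printf("WARNING: invalidate range [%lx, %lx) not "
		       "cache-line aligned\n", start, stop);
	return r;
}
```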
>
> 2b) STORE from local address to external source:
> Basically the inverted problem as in 2a), but if the driver is written properly,
> shouldn't be any issue.

No. Store doesn't have any problem because flush doesn't harm anything.

I have laid down the invalidate/flush requirements for DMA buffers in:
doc/README.arm-caches

best regards,
Aneesh

* [U-Boot] [PATCH 6/9] CACHE: nand read/write: Test if start address is aligned
  2012-06-25  0:17 ` [U-Boot] [PATCH 6/9] CACHE: nand " Marek Vasut
  2012-06-25 16:58   ` Scott Wood
@ 2012-07-07  3:05   ` Aneesh V
  1 sibling, 0 replies; 36+ messages in thread
From: Aneesh V @ 2012-07-07  3:05 UTC (permalink / raw)
  To: u-boot

On 06/24/2012 05:17 PM, Marek Vasut wrote:
> This prevents the scenario where data cache is on and the
> device uses DMA to deploy data. In that case, it might not
> be possible to flush/invalidate data to RAM properly. The
> other option is to use bounce buffer, but that involves a
> lot of copying and therefore degrades performance rapidly.
> Therefore disallow this possibility of unaligned load address
> altogether if data cache is on.
>
> Signed-off-by: Marek Vasut<marex@denx.de>
> Cc: Scott Wood<scottwood@freescale.com>
> ---
>   common/cmd_nand.c |    9 +++++++++
>   1 file changed, 9 insertions(+)
>
> diff --git a/common/cmd_nand.c b/common/cmd_nand.c
> index a91ccf4..122a91c 100644
> --- a/common/cmd_nand.c
> +++ b/common/cmd_nand.c
> @@ -609,6 +609,8 @@ int do_nand(cmd_tbl_t * cmdtp, int flag, int argc, char * const argv[])
>   			goto usage;
>
>   		addr = (ulong)simple_strtoul(argv[2], NULL, 16);
> +		if (!cacheline_aligned(addr))
> +			return 1;

You need to check the end address too. Also, I agree with Scott that
this is an unjustifiable restriction on cache-coherent
architectures. IMO, such checks should be done in platform-specific
code where the DMA is being attempted.

best regards,
Aneesh

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2012-07-07  3:05 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-06-25  0:17 [U-Boot] [PATCH 0/9] CACHE: Finishing touches Marek Vasut
2012-06-25  0:17 ` [U-Boot] [PATCH 1/9] COMMON: Add __stringify() function Marek Vasut
2012-06-25  0:17 ` [U-Boot] [PATCH 2/9] CACHE: Add cache_aligned() macro Marek Vasut
2012-06-25 21:12   ` Scott Wood
2012-06-25 23:30     ` Marek Vasut
2012-07-07  3:00       ` Aneesh V
2012-06-25  0:17 ` [U-Boot] [PATCH 3/9] CACHE: ext2load: Test if start address is aligned Marek Vasut
2012-06-25  0:17 ` [U-Boot] [PATCH 4/9] CACHE: fatload: " Marek Vasut
2012-06-25  0:17 ` [U-Boot] [PATCH 5/9] CACHE: mmc read/write: " Marek Vasut
2012-06-25  0:17 ` [U-Boot] [PATCH 6/9] CACHE: nand " Marek Vasut
2012-06-25 16:58   ` Scott Wood
2012-06-25 18:43     ` Tom Rini
2012-06-25 20:08       ` Scott Wood
2012-06-25 20:48         ` Tom Rini
2012-06-25 21:17           ` Scott Wood
2012-06-25 21:22             ` Tom Rini
2012-06-25 23:42             ` Marek Vasut
2012-06-26  0:37               ` Scott Wood
2012-06-26  1:16                 ` Marek Vasut
2012-06-26 19:38                   ` Scott Wood
2012-06-25 23:38       ` Marek Vasut
2012-06-25 23:37     ` Marek Vasut
2012-06-25 23:57       ` Scott Wood
2012-06-26  1:33         ` Marek Vasut
2012-06-26 19:25           ` Scott Wood
2012-06-26 20:39             ` Marek Vasut
2012-07-07  3:05   ` Aneesh V
2012-06-25  0:17 ` [U-Boot] [PATCH 7/9] CACHE: net: " Marek Vasut
2012-06-25 18:05   ` Joe Hershberger
2012-06-25 23:16     ` Marek Vasut
2012-06-25  0:17 ` [U-Boot] [PATCH 8/9] CACHE: net: asix: Fix asix driver to work with data cache on Marek Vasut
2012-06-25 18:07   ` Joe Hershberger
2012-06-25 23:16     ` Marek Vasut
2012-07-06 23:09     ` Marek Vasut
2012-07-06 23:16       ` Marek Vasut
2012-06-25  0:17 ` [U-Boot] [PATCH 9/9] M28EVK: Enable instruction and data cache Marek Vasut
