* [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
@ 2022-03-07 22:46 ` Heiko Stuebner
  0 siblings, 0 replies; 50+ messages in thread
From: Heiko Stuebner @ 2022-03-07 22:46 UTC (permalink / raw)
  To: palmer, paul.walmsley
  Cc: linux-riscv, linux-kernel, wefu, guoren, atishp, anup, mick,
	samuel, cmuellner, philipp.tomsich, Heiko Stuebner

This series is based on the alternatives changes done in my svpbmt series
and thus also depends on Atish's isa-extension parsing series.

It uses the cache-management instructions from the Zicbom extension
to handle cache clean, invalidate and flush operations on platforms
that need them.

SoCs using CPU cores from T-Head, like the Allwinner D1, implement a
different set of cache instructions. While the instructions differ,
they provide the same functionality, so a variant can easily hook
into the existing alternatives mechanism on those SoCs.


Heiko Stuebner (2):
  riscv: Implement Zicbom-based cache management operations
  riscv: implement cache-management errata for T-Head SoCs

 arch/riscv/Kconfig                   |  8 +++
 arch/riscv/Kconfig.erratas           | 10 ++++
 arch/riscv/errata/thead/errata.c     |  5 ++
 arch/riscv/include/asm/errata_list.h | 78 +++++++++++++++++++++++++++-
 arch/riscv/include/asm/hwcap.h       |  1 +
 arch/riscv/kernel/cpu.c              |  1 +
 arch/riscv/kernel/cpufeature.c       | 17 ++++++
 arch/riscv/mm/Makefile               |  1 +
 arch/riscv/mm/dma-noncoherent.c      | 61 ++++++++++++++++++++++
 9 files changed, 180 insertions(+), 2 deletions(-)
 create mode 100644 arch/riscv/mm/dma-noncoherent.c

-- 
2.30.2


* [PATCH 1/2] riscv: Implement Zicbom-based cache management operations
  2022-03-07 22:46 ` Heiko Stuebner
@ 2022-03-07 22:46   ` Heiko Stuebner
  -1 siblings, 0 replies; 50+ messages in thread
From: Heiko Stuebner @ 2022-03-07 22:46 UTC (permalink / raw)
  To: palmer, paul.walmsley
  Cc: linux-riscv, linux-kernel, wefu, guoren, atishp, anup, mick,
	samuel, cmuellner, philipp.tomsich, Heiko Stuebner,
	Christoph Hellwig, Atish Patra

The Zicbom ISA-extension was ratified in November 2021
and introduces instructions for dcache invalidate, clean
and flush operations.

Implement cache management operations based on them.

Of course not all cores will support this, so implement an
alternative-based mechanism that replaces nop instructions
with a loop built around the Zicbom instructions.

We're using pre-encoded instruction words for the Zicbom instructions
for now, to not require a bleeding-edge compiler (gcc-12)
for these somewhat simple instructions.
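
As an illustration of where the hard-coded instruction words come from,
the field layout documented in the errata_list.h comment can be turned
into a small encoder. This is a minimal user-space sketch, not part of
the patch; cbo_encode(), the CBO_IMM_* values and REG_A0 are names made
up here, and a0 is taken to be register x10 as in the standard RISC-V
register naming:

```c
#include <stdint.h>

/* Hypothetical names; the field layout follows the errata_list.h comment. */
enum cbo_imm {
	CBO_IMM_INVAL = 0,	/* bits 31..20 = 0...00 */
	CBO_IMM_CLEAN = 1,	/* bits 31..20 = 0...01 */
	CBO_IMM_FLUSH = 2,	/* bits 31..20 = 0...10 */
};

#define MISC_MEM_OPCODE	0x0fu	/* bits 6..0:   0001111 */
#define CBO_FUNCT3	0x2u	/* bits 14..12: 010 */
#define REG_A0		10u	/* a0 is x10 */

/* Build a cbo.* word: imm[31:20] | rs1[19:15] | funct3[14:12] | rd=0 | opcode */
static uint32_t cbo_encode(enum cbo_imm imm, uint32_t rs1)
{
	return ((uint32_t)imm << 20) | ((rs1 & 0x1fu) << 15) |
	       (CBO_FUNCT3 << 12) | MISC_MEM_OPCODE;
}
```

With rs1 set to a0, this encoder yields 0x0005200f for cbo.inval,
0x0015200f for cbo.clean and 0x0025200f for cbo.flush.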

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Atish Patra <atish.patra@wdc.com>
Cc: Guo Ren <guoren@kernel.org>
---
 arch/riscv/Kconfig                   |  8 ++++
 arch/riscv/include/asm/errata_list.h | 37 ++++++++++++++++-
 arch/riscv/include/asm/hwcap.h       |  1 +
 arch/riscv/kernel/cpu.c              |  1 +
 arch/riscv/kernel/cpufeature.c       | 17 ++++++++
 arch/riscv/mm/Makefile               |  1 +
 arch/riscv/mm/dma-noncoherent.c      | 61 ++++++++++++++++++++++++++++
 7 files changed, 125 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/mm/dma-noncoherent.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 5adcbd9b5e88..d3a1cd41c203 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -208,6 +208,14 @@ config PGTABLE_LEVELS
 config LOCKDEP_SUPPORT
 	def_bool y
 
+config RISCV_DMA_NONCOHERENT
+	bool "Support non-coherent dma operation"
+	select ARCH_HAS_DMA_PREP_COHERENT
+	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+	select ARCH_HAS_SYNC_DMA_FOR_CPU
+	select ARCH_HAS_SETUP_DMA_OPS
+	select DMA_DIRECT_REMAP
+
 source "arch/riscv/Kconfig.socs"
 source "arch/riscv/Kconfig.erratas"
 
diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
index 4fac46b82c16..7a2dd61af24d 100644
--- a/arch/riscv/include/asm/errata_list.h
+++ b/arch/riscv/include/asm/errata_list.h
@@ -20,7 +20,8 @@
 #endif
 
 #define	CPUFEATURE_SVPBMT 0
-#define	CPUFEATURE_NUMBER 1
+#define	CPUFEATURE_CMO 1
+#define	CPUFEATURE_NUMBER 2
 
 #ifdef __ASSEMBLY__
 
@@ -86,6 +87,40 @@ asm volatile(ALTERNATIVE(								\
 #define ALT_THEAD_PMA(_val)
 #endif
 
+/*
+ * cbo.clean rs1
+ * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ *    0...01     rs1       010      00000  0001111
+ *
+ * cbo.flush rs1
+ * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ *    0...10     rs1       010      00000  0001111
+ *
+ * cbo.inval rs1
+ * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ *    0...00     rs1       010      00000  0001111
+ */
+#define CBO_INVAL_A0	".long 0x05200F"
+#define CBO_CLEAN_A0	".long 0x15200F"
+#define CBO_FLUSH_A0	".long 0x25200F"
+
+#define ALT_CMO_OP(_op, _start, _size)							\
+asm volatile(ALTERNATIVE(								\
+	"nop\n\t"									\
+	"nop\n\t"									\
+	"nop\n\t"									\
+	"nop\n\t"									\
+	"nop",										\
+	"mv a0, %1\n\t"									\
+	"j 2f\n\t"									\
+	"3:\n\t"									\
+	CBO_##_op##_A0 "\n\t"								\
+	"addi a0, a0, %0\n\t"								\
+	"2:\n\t"									\
+	"bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT)		\
+	: : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)),			\
+	    "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES)))
+
 #endif /* __ASSEMBLY__ */
 
 #endif
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index 656cd626eb1a..5943d5125a51 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -52,6 +52,7 @@ extern unsigned long elf_hwcap;
  */
 enum riscv_isa_ext_id {
 	RISCV_ISA_EXT_SVPBMT = RISCV_ISA_EXT_BASE,
+	RISCV_ISA_EXT_ZICBOM,
 	RISCV_ISA_EXT_ID_MAX = RISCV_ISA_EXT_MAX,
 };
 
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index c582d557e555..dfcf592273a7 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -72,6 +72,7 @@ int riscv_of_parent_hartid(struct device_node *node)
 
 static struct riscv_isa_ext_data isa_ext_arr[] = {
 	__RISCV_ISA_EXT_DATA("svpbmt", RISCV_ISA_EXT_SVPBMT),
+	__RISCV_ISA_EXT_DATA("zicbom", RISCV_ISA_EXT_ZICBOM),
 	__RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
 };
 
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 5c5e6e7488ce..0e997fa5524a 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -200,6 +200,7 @@ void __init riscv_fill_hwcap(void)
 				set_bit(*ext - 'a', this_isa);
 			} else {
 				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
+				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
 			}
 #undef SET_ISA_EXT_MAP
 		}
@@ -267,11 +268,27 @@ static bool __init_or_module cpufeature_svpbmt_check_func(unsigned int stage)
 	return ret;
 }
 
+static bool cpufeature_cmo_check_func(unsigned int stage)
+{
+	switch (stage) {
+	case RISCV_ALTERNATIVES_EARLY_BOOT:
+		return false;
+	default:
+		return riscv_isa_extension_available(NULL, ZICBOM);
+	}
+
+	return false;
+}
+
 static const struct cpufeature_info __initdata_or_module cpufeature_list[CPUFEATURE_NUMBER] = {
 	{
 		.name = "svpbmt",
 		.check_func = cpufeature_svpbmt_check_func
 	},
+	{
+		.name = "cmo",
+		.check_func = cpufeature_cmo_check_func
+	},
 };
 
 static u32 __init_or_module cpufeature_probe(unsigned int stage)
diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
index ac7a25298a04..d76aabf4b94d 100644
--- a/arch/riscv/mm/Makefile
+++ b/arch/riscv/mm/Makefile
@@ -30,3 +30,4 @@ endif
 endif
 
 obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o
+obj-$(CONFIG_RISCV_DMA_NONCOHERENT) += dma-noncoherent.o
diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
new file mode 100644
index 000000000000..2c124bcc1932
--- /dev/null
+++ b/arch/riscv/mm/dma-noncoherent.c
@@ -0,0 +1,61 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * RISC-V specific functions to support DMA for non-coherent devices
+ *
+ * Copyright (c) 2021 Western Digital Corporation or its affiliates.
+ */
+
+#include <linux/dma-direct.h>
+#include <linux/dma-map-ops.h>
+#include <linux/init.h>
+#include <linux/io.h>
+#include <linux/libfdt.h>
+#include <linux/mm.h>
+#include <linux/of.h>
+#include <linux/of_fdt.h>
+
+void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	case DMA_FROM_DEVICE:
+		ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	case DMA_BIDIRECTIONAL:
+		ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	default:
+		break;
+	}
+}
+
+void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		break;
+	case DMA_FROM_DEVICE:
+	case DMA_BIDIRECTIONAL:
+		ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	default:
+		break;
+	}
+}
+
+void arch_dma_prep_coherent(struct page *page, size_t size)
+{
+	void *flush_addr = page_address(page);
+
+	memset(flush_addr, 0, size);
+	ALT_CMO_OP(FLUSH, (unsigned long)flush_addr, size);
+}
+
+void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+		const struct iommu_ops *iommu, bool coherent)
+{
+	/* If a specific device is dma-coherent, set it here */
+	dev->dma_coherent = coherent;
+}
-- 
2.30.2
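
The loop that ALT_CMO_OP patches in can be modelled in plain C. The
following is a rough user-space sketch, assuming a 64-byte cache line
(standing in for L1_CACHE_BYTES); for_each_cache_line() and line_op_t
are hypothetical names used only here, and the callback stands in for
the actual cache-block instruction:

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 64u	/* assumed line size, stands in for L1_CACHE_BYTES */

/* Hypothetical callback type: receives the address of one cache line. */
typedef void (*line_op_t)(uintptr_t line, void *ctx);

/*
 * Walk [start, start + size) one cache line at a time, mirroring the
 * patched-in loop: round start down, round the end up, then step by
 * one line while addr < end. Returns the number of lines visited.
 */
static size_t for_each_cache_line(uintptr_t start, size_t size,
				  line_op_t op, void *ctx)
{
	uintptr_t mask = ~(uintptr_t)(LINE_SIZE - 1);
	uintptr_t addr = start & mask;
	uintptr_t end = (start + size + LINE_SIZE - 1) & mask;
	size_t n = 0;

	for (; addr < end; addr += LINE_SIZE) {
		if (op)
			op(addr, ctx);
		n++;
	}
	return n;
}
```

A range that starts in the middle of a line and is one line long
touches two lines, which is why both ends are aligned before looping.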


* [PATCH 2/2] riscv: implement cache-management errata for T-Head SoCs
  2022-03-07 22:46 ` Heiko Stuebner
@ 2022-03-07 22:46   ` Heiko Stuebner
  -1 siblings, 0 replies; 50+ messages in thread
From: Heiko Stuebner @ 2022-03-07 22:46 UTC (permalink / raw)
  To: palmer, paul.walmsley
  Cc: linux-riscv, linux-kernel, wefu, guoren, atishp, anup, mick,
	samuel, cmuellner, philipp.tomsich, Heiko Stuebner

The T-Head C906 and C910 implement a scheme for handling
cache operations different from the generic Zicbom extension.

Add an errata for it next to the generic dma coherency ops.
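
For reference, the T-Head instruction words used here can be derived
from the field layout in the errata_list.h comment in the same way as
the Zicbom ones. A minimal user-space sketch, not part of the patch;
thead_dcache_encode(), thead_sync_s_encode() and the THEAD_OP_* names
are invented for this illustration, and only the virtual-address
variants are covered:

```c
#include <stdint.h>

#define THEAD_CUSTOM0_OPCODE	0x0bu	/* bits 6..0: 0001011 */
#define REG_A0			10u	/* a0 is x10 */

/* Values of bits 24..20 for the virtual-address dcache operations. */
enum thead_dcache_va_op {
	THEAD_OP_CVA  = 0x04,	/* dcache.cva:  clean */
	THEAD_OP_IVA  = 0x06,	/* dcache.iva:  invalidate */
	THEAD_OP_CIVA = 0x07,	/* dcache.civa: clean then invalidate */
};

/* 0000001 | op[24:20] | rs1[19:15] | 000 | 00000 | 0001011 */
static uint32_t thead_dcache_encode(enum thead_dcache_va_op op, uint32_t rs1)
{
	return (0x01u << 25) | ((uint32_t)op << 20) | ((rs1 & 0x1fu) << 15) |
	       THEAD_CUSTOM0_OPCODE;
}

/* sync.s: 0000000 | 11001 | 00000 | 000 | 00000 | 0001011 */
static uint32_t thead_sync_s_encode(void)
{
	return (0x19u << 20) | THEAD_CUSTOM0_OPCODE;
}
```

With rs1 set to a0, the results match the THEAD_*_A0 and THEAD_SYNC_S
constants defined in this patch.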

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
---
 arch/riscv/Kconfig.erratas           | 10 +++++++
 arch/riscv/errata/thead/errata.c     |  5 ++++
 arch/riscv/include/asm/errata_list.h | 45 ++++++++++++++++++++++++++--
 3 files changed, 57 insertions(+), 3 deletions(-)

diff --git a/arch/riscv/Kconfig.erratas b/arch/riscv/Kconfig.erratas
index de4002baa1d0..89a6dcb8ac2a 100644
--- a/arch/riscv/Kconfig.erratas
+++ b/arch/riscv/Kconfig.erratas
@@ -50,4 +50,14 @@ config ERRATA_THEAD_PBMT
 
 	  If you don't know what to do here, say "Y".
 
+config ERRATA_THEAD_CMO
+	bool "Apply T-Head cache management errata"
+	depends on ERRATA_THEAD && RISCV_DMA_NONCOHERENT
+	default y
+	help
+	  This will apply the cache management errata to handle the
+	  non-standard handling on non-coherent operations on T-Head SoCs.
+
+	  If you don't know what to do here, say "Y".
+
 endmenu
diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c
index fd8e0538a3f0..11c26c37425f 100644
--- a/arch/riscv/errata/thead/errata.c
+++ b/arch/riscv/errata/thead/errata.c
@@ -33,6 +33,11 @@ static const struct errata_info errata_list[ERRATA_THEAD_NUMBER] = {
 		.stage = RISCV_ALTERNATIVES_EARLY_BOOT,
 		.check_func = errata_mt_check_func
 	},
+	{
+		.name = "cache-management",
+		.stage = RISCV_ALTERNATIVES_BOOT,
+		.check_func = errata_mt_check_func
+	},
 };
 
 static u32 thead_errata_probe(unsigned int stage, unsigned long archid, unsigned long impid)
diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
index 7a2dd61af24d..f7c6805daeab 100644
--- a/arch/riscv/include/asm/errata_list.h
+++ b/arch/riscv/include/asm/errata_list.h
@@ -16,7 +16,8 @@
 
 #ifdef CONFIG_ERRATA_THEAD
 #define	ERRATA_THEAD_PBMT 0
-#define	ERRATA_THEAD_NUMBER 1
+#define	ERRATA_THEAD_CMO 1
+#define	ERRATA_THEAD_NUMBER 2
 #endif
 
 #define	CPUFEATURE_SVPBMT 0
@@ -104,8 +105,37 @@ asm volatile(ALTERNATIVE(								\
 #define CBO_CLEAN_A0	".long 0x15200F"
 #define CBO_FLUSH_A0	".long 0x25200F"
 
+/*
+ * dcache.ipa rs1 (invalidate, physical address)
+ * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ *   0000001    01010      rs1       000      00000  0001011
+ * dcache.iva rs1 (invalidate, virtual address)
+ *   0000001    00110      rs1       000      00000  0001011
+ *
+ * dcache.cpa rs1 (clean, physical address)
+ * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ *   0000001    01001      rs1       000      00000  0001011
+ * dcache.cva rs1 (clean, virtual address)
+ *   0000001    00100      rs1       000      00000  0001011
+ *
+ * dcache.cipa rs1 (clean then invalidate, physical address)
+ * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ *   0000001    01011      rs1       000      00000  0001011
+ * dcache.civa rs1 (... virtual address)
+ *   0000001    00111      rs1       000      00000  0001011
+ *
+ * sync.s (make sure all cache operations finished)
+ * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ *   0000000    11001     00000      000      00000  0001011
+ */
+#define THEAD_INVAL_A0	".long 0x0265000b"
+#define THEAD_CLEAN_A0	".long 0x0245000b"
+#define THEAD_FLUSH_A0	".long 0x0275000b"
+#define THEAD_SYNC_S	".long 0x0190000b"
+
 #define ALT_CMO_OP(_op, _start, _size)							\
-asm volatile(ALTERNATIVE(								\
+asm volatile(ALTERNATIVE_2(								\
+	"nop\n\t"									\
 	"nop\n\t"									\
 	"nop\n\t"									\
 	"nop\n\t"									\
@@ -117,7 +147,16 @@ asm volatile(ALTERNATIVE(								\
 	CBO_##_op##_A0 "\n\t"								\
 	"addi a0, a0, %0\n\t"								\
 	"2:\n\t"									\
-	"bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT)		\
+	"bltu a0, %2, 3b\n\t"								\
+	"nop", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT,				\
+	"mv a0, %1\n\t"									\
+	"j 2f\n\t"									\
+	"3:\n\t"									\
+	THEAD_##_op##_A0 "\n\t"								\
+	"addi a0, a0, %0\n\t"								\
+	"2:\n\t"									\
+	"bltu a0, %2, 3b\n\t"								\
+	THEAD_SYNC_S, THEAD_VENDOR_ID, ERRATA_THEAD_CMO, CONFIG_ERRATA_THEAD_CMO)	\
 	: : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)),			\
 	    "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES)))
 
-- 
2.30.2


* Re: [PATCH 1/2] riscv: Implement Zicbom-based cache management operations
  2022-03-07 22:46   ` Heiko Stuebner
@ 2022-03-25 16:20     ` Anup Patel
  -1 siblings, 0 replies; 50+ messages in thread
From: Anup Patel @ 2022-03-25 16:20 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: Palmer Dabbelt, Paul Walmsley, linux-riscv,
	linux-kernel@vger.kernel.org List, Wei Fu, Guo Ren, Atish Patra,
	Nick Kossifidis, Samuel Holland, Christoph Muellner,
	Philipp Tomsich, Christoph Hellwig, Atish Patra

On Tue, Mar 8, 2022 at 4:16 AM Heiko Stuebner <heiko@sntech.de> wrote:
>
> The Zicbom ISA-extension was ratified in November 2021
> and introduces instructions for dcache invalidate, clean
> and flush operations.
>
> Implement cache management operations based on them.
>
> Of course not all cores will support this, so implement an
> alternative-based mechanism that replaces nop instructions
> with a loop built around the Zicbom instructions.
>
> We're using pre-encoded instruction words for the Zicbom instructions
> for now, to not require a bleeding-edge compiler (gcc-12)
> for these somewhat simple instructions.
>
> Signed-off-by: Heiko Stuebner <heiko@sntech.de>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Atish Patra <atish.patra@wdc.com>
> Cc: Guo Ren <guoren@kernel.org>
> ---
>  arch/riscv/Kconfig                   |  8 ++++
>  arch/riscv/include/asm/errata_list.h | 37 ++++++++++++++++-
>  arch/riscv/include/asm/hwcap.h       |  1 +
>  arch/riscv/kernel/cpu.c              |  1 +
>  arch/riscv/kernel/cpufeature.c       | 17 ++++++++
>  arch/riscv/mm/Makefile               |  1 +
>  arch/riscv/mm/dma-noncoherent.c      | 61 ++++++++++++++++++++++++++++
>  7 files changed, 125 insertions(+), 1 deletion(-)
>  create mode 100644 arch/riscv/mm/dma-noncoherent.c
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 5adcbd9b5e88..d3a1cd41c203 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -208,6 +208,14 @@ config PGTABLE_LEVELS
>  config LOCKDEP_SUPPORT
>         def_bool y
>
> +config RISCV_DMA_NONCOHERENT
> +       bool "Support non-coherent dma operation"
> +       select ARCH_HAS_DMA_PREP_COHERENT
> +       select ARCH_HAS_SYNC_DMA_FOR_DEVICE
> +       select ARCH_HAS_SYNC_DMA_FOR_CPU
> +       select ARCH_HAS_SETUP_DMA_OPS
> +       select DMA_DIRECT_REMAP
> +
>  source "arch/riscv/Kconfig.socs"
>  source "arch/riscv/Kconfig.erratas"
>
> diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> index 4fac46b82c16..7a2dd61af24d 100644
> --- a/arch/riscv/include/asm/errata_list.h
> +++ b/arch/riscv/include/asm/errata_list.h
> @@ -20,7 +20,8 @@
>  #endif
>
>  #define        CPUFEATURE_SVPBMT 0
> -#define        CPUFEATURE_NUMBER 1
> +#define        CPUFEATURE_CMO 1
> +#define        CPUFEATURE_NUMBER 2
>
>  #ifdef __ASSEMBLY__
>
> @@ -86,6 +87,40 @@ asm volatile(ALTERNATIVE(                                                            \
>  #define ALT_THEAD_PMA(_val)
>  #endif
>
> +/*
> + * cbo.clean rs1
> + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + *    0...01     rs1       010      00000  0001111
> + *
> + * cbo.flush rs1
> + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + *    0...10     rs1       010      00000  0001111
> + *
> + * cbo.inval rs1
> + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + *    0...00     rs1       010      00000  0001111
> + */
> +#define CBO_INVAL_A0   ".long 0x15200F"
> +#define CBO_CLEAN_A0   ".long 0x25200F"
> +#define CBO_FLUSH_A0   ".long 0x05200F"
> +
> +#define ALT_CMO_OP(_op, _start, _size)                                                 \
> +asm volatile(ALTERNATIVE(                                                              \
> +       "nop\n\t"                                                                       \
> +       "nop\n\t"                                                                       \
> +       "nop\n\t"                                                                       \
> +       "nop\n\t"                                                                       \
> +       "nop",                                                                          \
> +       "mv a0, %1\n\t"                                                                 \
> +       "j 2f\n\t"                                                                      \
> +       "3:\n\t"                                                                        \
> +       CBO_##_op##_A0 "\n\t"                                                           \
> +       "addi a0, a0, %0\n\t"                                                           \
> +       "2:\n\t"                                                                        \
> +       "bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT)         \
> +       : : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)),                 \
> +           "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES)))

Why not use a global variable (e.g. riscv_cbom_block_size) representing the
exact cbom block size instead of L1_CACHE_BYTES?

The default value of riscv_cbom_block_size can be L1_CACHE_BYTES, which
can be overridden at boot time using an optional "riscv,cbom-block-size"
DT property.

The rationale here is that if the underlying RISC-V implementation has a
cbom block size smaller than L1_CACHE_BYTES, then stepping through the range
in L1_CACHE_BYTES increments skips blocks and results in an incomplete cbom
range operation. The riscv_cbom_block_size global variable ensures that the
right block size is used, at least for cbom operations.

Regards,
Anup

> +
>  #endif /* __ASSEMBLY__ */
>
>  #endif
> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> index 656cd626eb1a..5943d5125a51 100644
> --- a/arch/riscv/include/asm/hwcap.h
> +++ b/arch/riscv/include/asm/hwcap.h
> @@ -52,6 +52,7 @@ extern unsigned long elf_hwcap;
>   */
>  enum riscv_isa_ext_id {
>         RISCV_ISA_EXT_SVPBMT = RISCV_ISA_EXT_BASE,
> +       RISCV_ISA_EXT_ZICBOM,
>         RISCV_ISA_EXT_ID_MAX = RISCV_ISA_EXT_MAX,
>  };
>
> diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> index c582d557e555..dfcf592273a7 100644
> --- a/arch/riscv/kernel/cpu.c
> +++ b/arch/riscv/kernel/cpu.c
> @@ -72,6 +72,7 @@ int riscv_of_parent_hartid(struct device_node *node)
>
>  static struct riscv_isa_ext_data isa_ext_arr[] = {
>         __RISCV_ISA_EXT_DATA("svpbmt", RISCV_ISA_EXT_SVPBMT),
> +       __RISCV_ISA_EXT_DATA("zicbom", RISCV_ISA_EXT_ZICBOM),

Drop the quotes around zicbom because __RISCV_ISA_EXT_DATA() will
stringify the first parameter.

>         __RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
>  };
>
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 5c5e6e7488ce..0e997fa5524a 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -200,6 +200,7 @@ void __init riscv_fill_hwcap(void)
>                                 set_bit(*ext - 'a', this_isa);
>                         } else {
>                                 SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
> +                               SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
>                         }
>  #undef SET_ISA_EXT_MAP
>                 }
> @@ -267,11 +268,27 @@ static bool __init_or_module cpufeature_svpbmt_check_func(unsigned int stage)
>         return ret;
>  }
>
> +static bool cpufeature_cmo_check_func(unsigned int stage)
> +{
> +       switch (stage) {
> +       case RISCV_ALTERNATIVES_EARLY_BOOT:
> +               return false;
> +       default:
> +               return riscv_isa_extension_available(NULL, ZICBOM);
> +       }
> +
> +       return false;
> +}
> +
>  static const struct cpufeature_info __initdata_or_module cpufeature_list[CPUFEATURE_NUMBER] = {
>         {
>                 .name = "svpbmt",
>                 .check_func = cpufeature_svpbmt_check_func
>         },
> +       {
> +               .name = "cmo",
> +               .check_func = cpufeature_cmo_check_func
> +       },
>  };
>
>  static u32 __init_or_module cpufeature_probe(unsigned int stage)
> diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
> index ac7a25298a04..d76aabf4b94d 100644
> --- a/arch/riscv/mm/Makefile
> +++ b/arch/riscv/mm/Makefile
> @@ -30,3 +30,4 @@ endif
>  endif
>
>  obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o
> +obj-$(CONFIG_RISCV_DMA_NONCOHERENT) += dma-noncoherent.o
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> new file mode 100644
> index 000000000000..2c124bcc1932
> --- /dev/null
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -0,0 +1,61 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * RISC-V specific functions to support DMA for non-coherent devices
> + *
> + * Copyright (c) 2021 Western Digital Corporation or its affiliates.
> + */
> +
> +#include <linux/dma-direct.h>
> +#include <linux/dma-map-ops.h>
> +#include <linux/init.h>
> +#include <linux/io.h>
> +#include <linux/libfdt.h>
> +#include <linux/mm.h>
> +#include <linux/of.h>
> +#include <linux/of_fdt.h>
> +
> +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
> +{
> +       switch (dir) {
> +       case DMA_TO_DEVICE:
> +               ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
> +               break;
> +       case DMA_FROM_DEVICE:
> +               ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
> +               break;
> +       case DMA_BIDIRECTIONAL:
> +               ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> +               break;
> +       default:
> +               break;
> +       }
> +}
> +
> +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
> +{
> +       switch (dir) {
> +       case DMA_TO_DEVICE:
> +               break;
> +       case DMA_FROM_DEVICE:
> +       case DMA_BIDIRECTIONAL:
> +               ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
> +               break;
> +       default:
> +               break;
> +       }
> +}
> +
> +void arch_dma_prep_coherent(struct page *page, size_t size)
> +{
> +       void *flush_addr = page_address(page);
> +
> +       memset(flush_addr, 0, size);
> +       ALT_CMO_OP(FLUSH, (unsigned long)flush_addr, size);
> +}
> +
> +void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> +               const struct iommu_ops *iommu, bool coherent)
> +{
> +       /* If a specific device is dma-coherent, set it here */
> +       dev->dma_coherent = coherent;
> +}
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] riscv: Implement Zicbom-based cache management operations
  2022-03-25 16:20     ` Anup Patel
@ 2022-03-25 17:24       ` Philipp Tomsich
  -1 siblings, 0 replies; 50+ messages in thread
From: Philipp Tomsich @ 2022-03-25 17:24 UTC (permalink / raw)
  To: Anup Patel
  Cc: Heiko Stuebner, Palmer Dabbelt, Paul Walmsley, linux-riscv,
	linux-kernel@vger.kernel.org List, Wei Fu, Guo Ren, Atish Patra,
	Nick Kossifidis, Samuel Holland, Christoph Muellner,
	Christoph Hellwig, Atish Patra

On Fri, 25 Mar 2022 at 17:20, Anup Patel <anup@brainfault.org> wrote:
>
> On Tue, Mar 8, 2022 at 4:16 AM Heiko Stuebner <heiko@sntech.de> wrote:
> >
> > The Zicbom ISA-extension was ratified in november 2021
> > and introduces instructions for dcache invalidate, clean
> > and flush operations.
> >
> > Implement cache management operations based on them.
> >
> > Of course not all cores will support this, so implement an
> > alternative-based mechanism that replaces empty instructions
> > with ones done around Zicbom instructions.
> >
> > We're using prebuild instructions for the Zicbom instructions
> > for now, to not require a bleeding-edge compiler (gcc-12)
> > for these somewhat simple instructions.
> >
> > Signed-off-by: Heiko Stuebner <heiko@sntech.de>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Atish Patra <atish.patra@wdc.com>
> > Cc: Guo Ren <guoren@kernel.org>
> > ---
> >  arch/riscv/Kconfig                   |  8 ++++
> >  arch/riscv/include/asm/errata_list.h | 37 ++++++++++++++++-
> >  arch/riscv/include/asm/hwcap.h       |  1 +
> >  arch/riscv/kernel/cpu.c              |  1 +
> >  arch/riscv/kernel/cpufeature.c       | 17 ++++++++
> >  arch/riscv/mm/Makefile               |  1 +
> >  arch/riscv/mm/dma-noncoherent.c      | 61 ++++++++++++++++++++++++++++
> >  7 files changed, 125 insertions(+), 1 deletion(-)
> >  create mode 100644 arch/riscv/mm/dma-noncoherent.c
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 5adcbd9b5e88..d3a1cd41c203 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -208,6 +208,14 @@ config PGTABLE_LEVELS
> >  config LOCKDEP_SUPPORT
> >         def_bool y
> >
> > +config RISCV_DMA_NONCOHERENT
> > +       bool "Support non-coherent dma operation"
> > +       select ARCH_HAS_DMA_PREP_COHERENT
> > +       select ARCH_HAS_SYNC_DMA_FOR_DEVICE
> > +       select ARCH_HAS_SYNC_DMA_FOR_CPU
> > +       select ARCH_HAS_SETUP_DMA_OPS
> > +       select DMA_DIRECT_REMAP
> > +
> >  source "arch/riscv/Kconfig.socs"
> >  source "arch/riscv/Kconfig.erratas"
> >
> > diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> > index 4fac46b82c16..7a2dd61af24d 100644
> > --- a/arch/riscv/include/asm/errata_list.h
> > +++ b/arch/riscv/include/asm/errata_list.h
> > @@ -20,7 +20,8 @@
> >  #endif
> >
> >  #define        CPUFEATURE_SVPBMT 0
> > -#define        CPUFEATURE_NUMBER 1
> > +#define        CPUFEATURE_CMO 1
> > +#define        CPUFEATURE_NUMBER 2
> >
> >  #ifdef __ASSEMBLY__
> >
> > @@ -86,6 +87,40 @@ asm volatile(ALTERNATIVE(                                                            \
> >  #define ALT_THEAD_PMA(_val)
> >  #endif
> >
> > +/*
> > + * cbo.clean rs1
> > + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> > + *    0...01     rs1       010      00000  0001111
> > + *
> > + * cbo.flush rs1
> > + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> > + *    0...10     rs1       010      00000  0001111
> > + *
> > + * cbo.inval rs1
> > + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> > + *    0...00     rs1       010      00000  0001111
> > + */
> > +#define CBO_INVAL_A0   ".long 0x15200F"
> > +#define CBO_CLEAN_A0   ".long 0x25200F"
> > +#define CBO_FLUSH_A0   ".long 0x05200F"
> > +
> > +#define ALT_CMO_OP(_op, _start, _size)                                                 \
> > +asm volatile(ALTERNATIVE(                                                              \
> > +       "nop\n\t"                                                                       \
> > +       "nop\n\t"                                                                       \
> > +       "nop\n\t"                                                                       \
> > +       "nop\n\t"                                                                       \
> > +       "nop",                                                                          \
> > +       "mv a0, %1\n\t"                                                                 \
> > +       "j 2f\n\t"                                                                      \
> > +       "3:\n\t"                                                                        \
> > +       CBO_##_op##_A0 "\n\t"                                                           \
> > +       "addi a0, a0, %0\n\t"                                                           \
> > +       "2:\n\t"                                                                        \
> > +       "bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT)         \
> > +       : : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)),                 \
> > +           "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES)))
>
> Why not use a global variable (e.g. riscv_cbom_block_size) representing
> exact cbom block size instead of L1_CACHE_BYTES ?

Didn't the discussions around platforms gravitate towards making a fixed
cache-block operation size (note that this is orthogonal to the
cache-line size) a requirement for Linux-capable platforms?

Philipp.

> The default value of riscv_cbom_block_size can be L1_CACHE_BYTES
> which can be overridden at boot-time using optional "riscv,cbom-block-size"
> DT property.
>
> The rationale here is that if underlying RISC-V implementation has cbom
> block size smaller than L1_CACHE_BYTES then it will result in incomplete
> cbom range operation. The riscv_cbom_block_size global variable ensures
> that the right block size is used at least for cbom operations.
>
> Regards,
> Anup
>
> > +
> >  #endif /* __ASSEMBLY__ */
> >
> >  #endif
> > diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> > index 656cd626eb1a..5943d5125a51 100644
> > --- a/arch/riscv/include/asm/hwcap.h
> > +++ b/arch/riscv/include/asm/hwcap.h
> > @@ -52,6 +52,7 @@ extern unsigned long elf_hwcap;
> >   */
> >  enum riscv_isa_ext_id {
> >         RISCV_ISA_EXT_SVPBMT = RISCV_ISA_EXT_BASE,
> > +       RISCV_ISA_EXT_ZICBOM,
> >         RISCV_ISA_EXT_ID_MAX = RISCV_ISA_EXT_MAX,
> >  };
> >
> > diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> > index c582d557e555..dfcf592273a7 100644
> > --- a/arch/riscv/kernel/cpu.c
> > +++ b/arch/riscv/kernel/cpu.c
> > @@ -72,6 +72,7 @@ int riscv_of_parent_hartid(struct device_node *node)
> >
> >  static struct riscv_isa_ext_data isa_ext_arr[] = {
> >         __RISCV_ISA_EXT_DATA("svpbmt", RISCV_ISA_EXT_SVPBMT),
> > +       __RISCV_ISA_EXT_DATA("zicbom", RISCV_ISA_EXT_ZICBOM),
>
> Drop the quotes around zicbom because __RISCV_ISA_EXT_DATA() will
> stringify the first parameter.
>
> >         __RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
> >  };
> >
> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > index 5c5e6e7488ce..0e997fa5524a 100644
> > --- a/arch/riscv/kernel/cpufeature.c
> > +++ b/arch/riscv/kernel/cpufeature.c
> > @@ -200,6 +200,7 @@ void __init riscv_fill_hwcap(void)
> >                                 set_bit(*ext - 'a', this_isa);
> >                         } else {
> >                                 SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
> > +                               SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
> >                         }
> >  #undef SET_ISA_EXT_MAP
> >                 }
> > @@ -267,11 +268,27 @@ static bool __init_or_module cpufeature_svpbmt_check_func(unsigned int stage)
> >         return ret;
> >  }
> >
> > +static bool cpufeature_cmo_check_func(unsigned int stage)
> > +{
> > +       switch (stage) {
> > +       case RISCV_ALTERNATIVES_EARLY_BOOT:
> > +               return false;
> > +       default:
> > +               return riscv_isa_extension_available(NULL, ZICBOM);
> > +       }
> > +
> > +       return false;
> > +}
> > +
> >  static const struct cpufeature_info __initdata_or_module cpufeature_list[CPUFEATURE_NUMBER] = {
> >         {
> >                 .name = "svpbmt",
> >                 .check_func = cpufeature_svpbmt_check_func
> >         },
> > +       {
> > +               .name = "cmo",
> > +               .check_func = cpufeature_cmo_check_func
> > +       },
> >  };
> >
> >  static u32 __init_or_module cpufeature_probe(unsigned int stage)
> > diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
> > index ac7a25298a04..d76aabf4b94d 100644
> > --- a/arch/riscv/mm/Makefile
> > +++ b/arch/riscv/mm/Makefile
> > @@ -30,3 +30,4 @@ endif
> >  endif
> >
> >  obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o
> > +obj-$(CONFIG_RISCV_DMA_NONCOHERENT) += dma-noncoherent.o
> > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> > new file mode 100644
> > index 000000000000..2c124bcc1932
> > --- /dev/null
> > +++ b/arch/riscv/mm/dma-noncoherent.c
> > @@ -0,0 +1,61 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * RISC-V specific functions to support DMA for non-coherent devices
> > + *
> > + * Copyright (c) 2021 Western Digital Corporation or its affiliates.
> > + */
> > +
> > +#include <linux/dma-direct.h>
> > +#include <linux/dma-map-ops.h>
> > +#include <linux/init.h>
> > +#include <linux/io.h>
> > +#include <linux/libfdt.h>
> > +#include <linux/mm.h>
> > +#include <linux/of.h>
> > +#include <linux/of_fdt.h>
> > +
> > +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
> > +{
> > +       switch (dir) {
> > +       case DMA_TO_DEVICE:
> > +               ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
> > +               break;
> > +       case DMA_FROM_DEVICE:
> > +               ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
> > +               break;
> > +       case DMA_BIDIRECTIONAL:
> > +               ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> > +               break;
> > +       default:
> > +               break;
> > +       }
> > +}
> > +
> > +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
> > +{
> > +       switch (dir) {
> > +       case DMA_TO_DEVICE:
> > +               break;
> > +       case DMA_FROM_DEVICE:
> > +       case DMA_BIDIRECTIONAL:
> > +               ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
> > +               break;
> > +       default:
> > +               break;
> > +       }
> > +}
> > +
> > +void arch_dma_prep_coherent(struct page *page, size_t size)
> > +{
> > +       void *flush_addr = page_address(page);
> > +
> > +       memset(flush_addr, 0, size);
> > +       ALT_CMO_OP(FLUSH, (unsigned long)flush_addr, size);
> > +}
> > +
> > +void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> > +               const struct iommu_ops *iommu, bool coherent)
> > +{
> > +       /* If a specific device is dma-coherent, set it here */
> > +       dev->dma_coherent = coherent;
> > +}
> > --
> > 2.30.2
> >

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] riscv: Implement Zicbom-based cache management operations
@ 2022-03-25 17:24       ` Philipp Tomsich
  0 siblings, 0 replies; 50+ messages in thread
From: Philipp Tomsich @ 2022-03-25 17:24 UTC (permalink / raw)
  To: Anup Patel
  Cc: Heiko Stuebner, Palmer Dabbelt, Paul Walmsley, linux-riscv,
	linux-kernel@vger.kernel.org List, Wei Fu, Guo Ren, Atish Patra,
	Nick Kossifidis, Samuel Holland, Christoph Muellner,
	Christoph Hellwig, Atish Patra

On Fri, 25 Mar 2022 at 17:20, Anup Patel <anup@brainfault.org> wrote:
>
> On Tue, Mar 8, 2022 at 4:16 AM Heiko Stuebner <heiko@sntech.de> wrote:
> >
> > The Zicbom ISA-extension was ratified in november 2021
> > and introduces instructions for dcache invalidate, clean
> > and flush operations.
> >
> > Implement cache management operations based on them.
> >
> > Of course not all cores will support this, so implement an
> > alternative-based mechanism that replaces empty instructions
> > with ones done around Zicbom instructions.
> >
> > We're using prebuild instructions for the Zicbom instructions
> > for now, to not require a bleeding-edge compiler (gcc-12)
> > for these somewhat simple instructions.
> >
> > Signed-off-by: Heiko Stuebner <heiko@sntech.de>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Atish Patra <atish.patra@wdc.com>
> > Cc: Guo Ren <guoren@kernel.org>
> > ---
> >  arch/riscv/Kconfig                   |  8 ++++
> >  arch/riscv/include/asm/errata_list.h | 37 ++++++++++++++++-
> >  arch/riscv/include/asm/hwcap.h       |  1 +
> >  arch/riscv/kernel/cpu.c              |  1 +
> >  arch/riscv/kernel/cpufeature.c       | 17 ++++++++
> >  arch/riscv/mm/Makefile               |  1 +
> >  arch/riscv/mm/dma-noncoherent.c      | 61 ++++++++++++++++++++++++++++
> >  7 files changed, 125 insertions(+), 1 deletion(-)
> >  create mode 100644 arch/riscv/mm/dma-noncoherent.c
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 5adcbd9b5e88..d3a1cd41c203 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -208,6 +208,14 @@ config PGTABLE_LEVELS
> >  config LOCKDEP_SUPPORT
> >         def_bool y
> >
> > +config RISCV_DMA_NONCOHERENT
> > +       bool "Support non-coherent dma operation"
> > +       select ARCH_HAS_DMA_PREP_COHERENT
> > +       select ARCH_HAS_SYNC_DMA_FOR_DEVICE
> > +       select ARCH_HAS_SYNC_DMA_FOR_CPU
> > +       select ARCH_HAS_SETUP_DMA_OPS
> > +       select DMA_DIRECT_REMAP
> > +
> >  source "arch/riscv/Kconfig.socs"
> >  source "arch/riscv/Kconfig.erratas"
> >
> > diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> > index 4fac46b82c16..7a2dd61af24d 100644
> > --- a/arch/riscv/include/asm/errata_list.h
> > +++ b/arch/riscv/include/asm/errata_list.h
> > @@ -20,7 +20,8 @@
> >  #endif
> >
> >  #define        CPUFEATURE_SVPBMT 0
> > -#define        CPUFEATURE_NUMBER 1
> > +#define        CPUFEATURE_CMO 1
> > +#define        CPUFEATURE_NUMBER 2
> >
> >  #ifdef __ASSEMBLY__
> >
> > @@ -86,6 +87,40 @@ asm volatile(ALTERNATIVE(                                                            \
> >  #define ALT_THEAD_PMA(_val)
> >  #endif
> >
> > +/*
> > + * cbo.clean rs1
> > + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> > + *    0...01     rs1       010      00000  0001111
> > + *
> > + * cbo.flush rs1
> > + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> > + *    0...10     rs1       010      00000  0001111
> > + *
> > + * cbo.inval rs1
> > + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> > + *    0...00     rs1       010      00000  0001111
> > + */
> > +#define CBO_INVAL_A0   ".long 0x15200F"
> > +#define CBO_CLEAN_A0   ".long 0x25200F"
> > +#define CBO_FLUSH_A0   ".long 0x05200F"
> > +
> > +#define ALT_CMO_OP(_op, _start, _size)                                                 \
> > +asm volatile(ALTERNATIVE(                                                              \
> > +       "nop\n\t"                                                                       \
> > +       "nop\n\t"                                                                       \
> > +       "nop\n\t"                                                                       \
> > +       "nop\n\t"                                                                       \
> > +       "nop",                                                                          \
> > +       "mv a0, %1\n\t"                                                                 \
> > +       "j 2f\n\t"                                                                      \
> > +       "3:\n\t"                                                                        \
> > +       CBO_##_op##_A0 "\n\t"                                                           \
> > +       "addi a0, a0, %0\n\t"                                                           \
> > +       "2:\n\t"                                                                        \
> > +       "bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT)         \
> > +       : : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)),                 \
> > +           "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES)))
>
> Why not use a global variable (e.g. riscv_cbom_block_size) representing
> exact cbom block size instead of L1_CACHE_BYTES ?

Didn't the discussions around platforms gravitate towards making a fixed
cache-block operations size (note that this is orthogonal to the
cache-line size) a requirement for Linux-capable platforms?

Philipp.
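
For reference, the three `.long` constants in the hunk quoted above can be
cross-checked against the field layout given in its comment. This is only a
minimal sketch: the helper name encode_cbo() is made up for illustration, and
the register number for a0 (x10) comes from the standard RISC-V register
naming rather than from the patch. Going by the comment's table, imm = 0
encodes cbo.inval, imm = 1 cbo.clean and imm = 2 cbo.flush, so the values
computed this way may be worth comparing against the names of the three
macros.

```c
#include <assert.h>
#include <stdint.h>

/* Immediate field values, taken from the comment's table (bits 31-20). */
#define CBO_IMM_INVAL	0u
#define CBO_IMM_CLEAN	1u
#define CBO_IMM_FLUSH	2u

/*
 * Hypothetical helper: assemble a cbo.* instruction word from the layout
 * | imm 31-20 | rs1 19-15 | funct3 14-12 = 010 | rd 11-7 = 0 | opcode 0001111 |
 */
static uint32_t encode_cbo(uint32_t imm, uint32_t rs1)
{
	assert(imm < (1u << 12));	/* 12-bit immediate */
	assert(rs1 < 32);		/* 5-bit register number */

	return (imm << 20) | (rs1 << 15) | (0x2u << 12) | (0x0u << 7) | 0x0fu;
}
```

With rs1 = 10 (a0) this gives 0x0005200f for cbo.inval, 0x0015200f for
cbo.clean and 0x0025200f for cbo.flush.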

> The default value of riscv_cbom_block_size can be L1_CACHE_BYTES
> which can be overridden at boot-time using optional "riscv,cbom-block-size"
> DT property.
>
> The rationale here is that if underlying RISC-V implementation has cbom
> block size smaller than L1_CACHE_BYTES then it will result in incomplete
> cbom range operation. The riscv_cbom_block_size global variable ensures
> that the right block size is used at least for cbom operations.
>
> Regards,
> Anup
>
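The rationale quoted above can be illustrated with a small user-space model.
It is a sketch under stated assumptions, not kernel code: missed_blocks() is
a hypothetical helper, `step` stands for the compile-time stride
(L1_CACHE_BYTES in the macro), `hw_block` for the block size the hardware
really uses, and both are assumed to be powers of two. The start and end
rounding mirrors what the ALT_CMO_OP operands do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the hardware blocks overlapping [start, start + size) that a loop
 * striding by `step` never issues an operation for.
 */
static size_t missed_blocks(uintptr_t start, size_t size,
			    uintptr_t step, uintptr_t hw_block)
{
	assert(step && !(step & (step - 1)));
	assert(hw_block && !(hw_block & (hw_block - 1)));

	uintptr_t first = start & ~(step - 1);			  /* round down */
	uintptr_t end = (start + size + step - 1) & ~(step - 1); /* ALIGN() up */
	size_t missed = 0;

	for (uintptr_t blk = start & ~(hw_block - 1); blk < start + size;
	     blk += hw_block) {
		int touched = 0;

		/* An operation at address a acts on the hw block containing a. */
		for (uintptr_t a = first; a < end; a += step) {
			if ((a & ~(hw_block - 1)) == blk) {
				touched = 1;
				break;
			}
		}
		if (!touched)
			missed++;
	}
	return missed;
}
```

In this model a 128-byte range walked with a 64-byte stride on hardware
with 32-byte blocks leaves two blocks untouched, while a stride equal to
the hardware block size leaves none.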
> > +
> >  #endif /* __ASSEMBLY__ */
> >
> >  #endif
> > diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> > index 656cd626eb1a..5943d5125a51 100644
> > --- a/arch/riscv/include/asm/hwcap.h
> > +++ b/arch/riscv/include/asm/hwcap.h
> > @@ -52,6 +52,7 @@ extern unsigned long elf_hwcap;
> >   */
> >  enum riscv_isa_ext_id {
> >         RISCV_ISA_EXT_SVPBMT = RISCV_ISA_EXT_BASE,
> > +       RISCV_ISA_EXT_ZICBOM,
> >         RISCV_ISA_EXT_ID_MAX = RISCV_ISA_EXT_MAX,
> >  };
> >
> > diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> > index c582d557e555..dfcf592273a7 100644
> > --- a/arch/riscv/kernel/cpu.c
> > +++ b/arch/riscv/kernel/cpu.c
> > @@ -72,6 +72,7 @@ int riscv_of_parent_hartid(struct device_node *node)
> >
> >  static struct riscv_isa_ext_data isa_ext_arr[] = {
> >         __RISCV_ISA_EXT_DATA("svpbmt", RISCV_ISA_EXT_SVPBMT),
> > +       __RISCV_ISA_EXT_DATA("zicbom", RISCV_ISA_EXT_ZICBOM),
>
> Drop the quotes around zicbom because __RISCV_ISA_EXT_DATA() will
> stringify the first parameter.
>
> >         __RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
> >  };
> >
> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> > index 5c5e6e7488ce..0e997fa5524a 100644
> > --- a/arch/riscv/kernel/cpufeature.c
> > +++ b/arch/riscv/kernel/cpufeature.c
> > @@ -200,6 +200,7 @@ void __init riscv_fill_hwcap(void)
> >                                 set_bit(*ext - 'a', this_isa);
> >                         } else {
> >                                 SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
> > +                               SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
> >                         }
> >  #undef SET_ISA_EXT_MAP
> >                 }
> > @@ -267,11 +268,27 @@ static bool __init_or_module cpufeature_svpbmt_check_func(unsigned int stage)
> >         return ret;
> >  }
> >
> > +static bool cpufeature_cmo_check_func(unsigned int stage)
> > +{
> > +       switch (stage) {
> > +       case RISCV_ALTERNATIVES_EARLY_BOOT:
> > +               return false;
> > +       default:
> > +               return riscv_isa_extension_available(NULL, ZICBOM);
> > +       }
> > +
> > +       return false;
> > +}
> > +
> >  static const struct cpufeature_info __initdata_or_module cpufeature_list[CPUFEATURE_NUMBER] = {
> >         {
> >                 .name = "svpbmt",
> >                 .check_func = cpufeature_svpbmt_check_func
> >         },
> > +       {
> > +               .name = "cmo",
> > +               .check_func = cpufeature_cmo_check_func
> > +       },
> >  };
> >
> >  static u32 __init_or_module cpufeature_probe(unsigned int stage)
> > diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
> > index ac7a25298a04..d76aabf4b94d 100644
> > --- a/arch/riscv/mm/Makefile
> > +++ b/arch/riscv/mm/Makefile
> > @@ -30,3 +30,4 @@ endif
> >  endif
> >
> >  obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o
> > +obj-$(CONFIG_RISCV_DMA_NONCOHERENT) += dma-noncoherent.o
> > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> > new file mode 100644
> > index 000000000000..2c124bcc1932
> > --- /dev/null
> > +++ b/arch/riscv/mm/dma-noncoherent.c
> > @@ -0,0 +1,61 @@
> > +// SPDX-License-Identifier: GPL-2.0-only
> > +/*
> > + * RISC-V specific functions to support DMA for non-coherent devices
> > + *
> > + * Copyright (c) 2021 Western Digital Corporation or its affiliates.
> > + */
> > +
> > +#include <linux/dma-direct.h>
> > +#include <linux/dma-map-ops.h>
> > +#include <linux/init.h>
> > +#include <linux/io.h>
> > +#include <linux/libfdt.h>
> > +#include <linux/mm.h>
> > +#include <linux/of.h>
> > +#include <linux/of_fdt.h>
> > +
> > +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
> > +{
> > +       switch (dir) {
> > +       case DMA_TO_DEVICE:
> > +               ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
> > +               break;
> > +       case DMA_FROM_DEVICE:
> > +               ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
> > +               break;
> > +       case DMA_BIDIRECTIONAL:
> > +               ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> > +               break;
> > +       default:
> > +               break;
> > +       }
> > +}
> > +
> > +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
> > +{
> > +       switch (dir) {
> > +       case DMA_TO_DEVICE:
> > +               break;
> > +       case DMA_FROM_DEVICE:
> > +       case DMA_BIDIRECTIONAL:
> > +               ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
> > +               break;
> > +       default:
> > +               break;
> > +       }
> > +}
> > +
> > +void arch_dma_prep_coherent(struct page *page, size_t size)
> > +{
> > +       void *flush_addr = page_address(page);
> > +
> > +       memset(flush_addr, 0, size);
> > +       ALT_CMO_OP(FLUSH, (unsigned long)flush_addr, size);
> > +}
> > +
> > +void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> > +               const struct iommu_ops *iommu, bool coherent)
> > +{
> > +       /* If a specific device is dma-coherent, set it here */
> > +       dev->dma_coherent = coherent;
> > +}
> > --
> > 2.30.2
> >

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] riscv: Implement Zicbom-based cache management operations
       [not found]     ` <CAAeLtUAi+61Hk7oBW979QEKYaume3vqdt_KkS_mXpRAs+CzHnA@mail.gmail.com>
@ 2022-03-25 17:37         ` Anup Patel
  0 siblings, 0 replies; 50+ messages in thread
From: Anup Patel @ 2022-03-25 17:37 UTC (permalink / raw)
  To: Philipp Tomsich
  Cc: Heiko Stuebner, Palmer Dabbelt, Paul Walmsley, linux-riscv,
	linux-kernel@vger.kernel.org List, Wei Fu, Guo Ren, Atish Patra,
	Nick Kossifidis, Samuel Holland, Christoph Muellner,
	Christoph Hellwig, Atish Patra

On Fri, Mar 25, 2022 at 10:52 PM Philipp Tomsich
<philipp.tomsich@vrull.eu> wrote:
>
> Anup,
>
> On Fri, 25 Mar 2022 at 17:20, Anup Patel <anup@brainfault.org> wrote:
>>
>> On Tue, Mar 8, 2022 at 4:16 AM Heiko Stuebner <heiko@sntech.de> wrote:
>> >
>> > The Zicbom ISA-extension was ratified in November 2021
>> > and introduces instructions for dcache invalidate, clean
>> > and flush operations.
>> >
>> > Implement cache management operations based on them.
>> >
>> > Of course not all cores will support this, so implement an
>> > alternative-based mechanism that replaces empty instructions
>> > with ones done around Zicbom instructions.
>> >
>> > We're using prebuilt instructions for the Zicbom instructions
>> > for now, to not require a bleeding-edge compiler (gcc-12)
>> > for these somewhat simple instructions.
>> >
>> > Signed-off-by: Heiko Stuebner <heiko@sntech.de>
>> > Cc: Christoph Hellwig <hch@lst.de>
>> > Cc: Atish Patra <atish.patra@wdc.com>
>> > Cc: Guo Ren <guoren@kernel.org>
>> > ---
>> >  arch/riscv/Kconfig                   |  8 ++++
>> >  arch/riscv/include/asm/errata_list.h | 37 ++++++++++++++++-
>> >  arch/riscv/include/asm/hwcap.h       |  1 +
>> >  arch/riscv/kernel/cpu.c              |  1 +
>> >  arch/riscv/kernel/cpufeature.c       | 17 ++++++++
>> >  arch/riscv/mm/Makefile               |  1 +
>> >  arch/riscv/mm/dma-noncoherent.c      | 61 ++++++++++++++++++++++++++++
>> >  7 files changed, 125 insertions(+), 1 deletion(-)
>> >  create mode 100644 arch/riscv/mm/dma-noncoherent.c
>> >
>> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>> > index 5adcbd9b5e88..d3a1cd41c203 100644
>> > --- a/arch/riscv/Kconfig
>> > +++ b/arch/riscv/Kconfig
>> > @@ -208,6 +208,14 @@ config PGTABLE_LEVELS
>> >  config LOCKDEP_SUPPORT
>> >         def_bool y
>> >
>> > +config RISCV_DMA_NONCOHERENT
>> > +       bool "Support non-coherent dma operation"
>> > +       select ARCH_HAS_DMA_PREP_COHERENT
>> > +       select ARCH_HAS_SYNC_DMA_FOR_DEVICE
>> > +       select ARCH_HAS_SYNC_DMA_FOR_CPU
>> > +       select ARCH_HAS_SETUP_DMA_OPS
>> > +       select DMA_DIRECT_REMAP
>> > +
>> >  source "arch/riscv/Kconfig.socs"
>> >  source "arch/riscv/Kconfig.erratas"
>> >
>> > diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
>> > index 4fac46b82c16..7a2dd61af24d 100644
>> > --- a/arch/riscv/include/asm/errata_list.h
>> > +++ b/arch/riscv/include/asm/errata_list.h
>> > @@ -20,7 +20,8 @@
>> >  #endif
>> >
>> >  #define        CPUFEATURE_SVPBMT 0
>> > -#define        CPUFEATURE_NUMBER 1
>> > +#define        CPUFEATURE_CMO 1
>> > +#define        CPUFEATURE_NUMBER 2
>> >
>> >  #ifdef __ASSEMBLY__
>> >
>> > @@ -86,6 +87,40 @@ asm volatile(ALTERNATIVE(                                                            \
>> >  #define ALT_THEAD_PMA(_val)
>> >  #endif
>> >
>> > +/*
>> > + * cbo.clean rs1
>> > + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
>> > + *    0...01     rs1       010      00000  0001111
>> > + *
>> > + * cbo.flush rs1
>> > + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
>> > + *    0...10     rs1       010      00000  0001111
>> > + *
>> > + * cbo.inval rs1
>> > + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
>> > + *    0...00     rs1       010      00000  0001111
>> > + */
>> > +#define CBO_INVAL_A0   ".long 0x15200F"
>> > +#define CBO_CLEAN_A0   ".long 0x25200F"
>> > +#define CBO_FLUSH_A0   ".long 0x05200F"
>> > +
>> > +#define ALT_CMO_OP(_op, _start, _size)                                                 \
>> > +asm volatile(ALTERNATIVE(                                                              \
>> > +       "nop\n\t"                                                                       \
>> > +       "nop\n\t"                                                                       \
>> > +       "nop\n\t"                                                                       \
>> > +       "nop\n\t"                                                                       \
>> > +       "nop",                                                                          \
>> > +       "mv a0, %1\n\t"                                                                 \
>> > +       "j 2f\n\t"                                                                      \
>> > +       "3:\n\t"                                                                        \
>> > +       CBO_##_op##_A0 "\n\t"                                                           \
>> > +       "addi a0, a0, %0\n\t"                                                           \
>> > +       "2:\n\t"                                                                        \
>> > +       "bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT)         \
>> > +       : : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)),                 \
>> > +           "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES)))
>>
>> Why not use a global variable (e.g. riscv_cbom_block_size) representing
>> exact cbom block size instead of L1_CACHE_BYTES ?
>
>
> Didn't the discussions around platforms gravitate towards making a fixed
> cache-block operations size (note that this is orthogonal to the cache-line
> size) a requirement for Linux-capable platforms?

I recall the past platform discussions. Implementations compliant with the
platform spec (whenever that is ratified) will converge on a fixed cache-block
size, but this does not mean Linux CMO support should not work for
implementations with a different cache-block size.

For example, the ARM64 cache operations (arch/arm64/mm/cache.S) determine
the cache line size from the ctr_el0 system register.

Regards,
Anup
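
To make the ARM64 comparison concrete, here is a rough C sketch of how a
line size can be derived from a CTR_EL0 value. The function name is
hypothetical and this is not the code from arch/arm64; it only relies on the
architectural definition of the DminLine field (bits [19:16], log2 of the
number of 4-byte words in the smallest data cache line).

```c
#include <stdint.h>

/*
 * Hypothetical helper: decode the smallest data cache line size in bytes
 * from a CTR_EL0 value that was read elsewhere (e.g. via "mrs").
 */
static unsigned int dcache_line_size_from_ctr(uint64_t ctr_el0)
{
	/* DminLine: log2 of the line size in 4-byte words */
	unsigned int dminline = (unsigned int)((ctr_el0 >> 16) & 0xf);

	return 4u << dminline;
}
```

A DminLine value of 4, for instance, corresponds to a 64-byte line.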

>
> Philipp.
>
>> The default value of riscv_cbom_block_size can be L1_CACHE_BYTES
>> which can be overridden at boot-time using optional "riscv,cbom-block-size"
>> DT property.
>>
>> The rationale here is that if underlying RISC-V implementation has cbom
>> block size smaller than L1_CACHE_BYTES then it will result in incomplete
>> cbom range operation. The riscv_cbom_block_size global variable ensures
>> that the right block size is used at least for cbom operations.
>>
>> Regards,
>> Anup
>>
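The suggestion quoted above could look roughly like the following
user-space sketch. Everything beyond the name riscv_cbom_block_size is an
assumption made for illustration: the value 64 for L1_CACHE_BYTES, the
cmo_range() helper, and the cbo_stub() callback that stands in for a single
cbo.* instruction. Parsing the "riscv,cbom-block-size" DT property at boot
is not shown; the sketch only assumes the variable has been set to a power
of two before use.

```c
#include <stddef.h>
#include <stdint.h>

#define L1_CACHE_BYTES	64	/* assumed value for this sketch */

/* Defaults to L1_CACHE_BYTES; boot code would override it from the DT. */
static unsigned long riscv_cbom_block_size = L1_CACHE_BYTES;

/* Stand-in for one cbo.* instruction: just records the last address. */
static uintptr_t last_addr;
static void cbo_stub(uintptr_t addr)
{
	last_addr = addr;
}

/*
 * Apply `op` to every block overlapping [start, start + size), striding by
 * the runtime block size. Returns the number of operations issued.
 */
static size_t cmo_range(uintptr_t start, size_t size, void (*op)(uintptr_t))
{
	uintptr_t bs = riscv_cbom_block_size;
	uintptr_t addr = start & ~(bs - 1);
	uintptr_t end = (start + size + bs - 1) & ~(bs - 1);
	size_t n = 0;

	for (; addr < end; addr += bs, n++)
		op(addr);

	return n;
}
```

The same call then issues more, finer-grained operations once the variable
has been lowered to match a smaller hardware block size, without rebuilding
the kernel.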
>> > +
>> >  #endif /* __ASSEMBLY__ */
>> >
>> >  #endif
>> > diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
>> > index 656cd626eb1a..5943d5125a51 100644
>> > --- a/arch/riscv/include/asm/hwcap.h
>> > +++ b/arch/riscv/include/asm/hwcap.h
>> > @@ -52,6 +52,7 @@ extern unsigned long elf_hwcap;
>> >   */
>> >  enum riscv_isa_ext_id {
>> >         RISCV_ISA_EXT_SVPBMT = RISCV_ISA_EXT_BASE,
>> > +       RISCV_ISA_EXT_ZICBOM,
>> >         RISCV_ISA_EXT_ID_MAX = RISCV_ISA_EXT_MAX,
>> >  };
>> >
>> > diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
>> > index c582d557e555..dfcf592273a7 100644
>> > --- a/arch/riscv/kernel/cpu.c
>> > +++ b/arch/riscv/kernel/cpu.c
>> > @@ -72,6 +72,7 @@ int riscv_of_parent_hartid(struct device_node *node)
>> >
>> >  static struct riscv_isa_ext_data isa_ext_arr[] = {
>> >         __RISCV_ISA_EXT_DATA("svpbmt", RISCV_ISA_EXT_SVPBMT),
>> > +       __RISCV_ISA_EXT_DATA("zicbom", RISCV_ISA_EXT_ZICBOM),
>>
>> Drop the quotes around zicbom because __RISCV_ISA_EXT_DATA() will
>> stringify the first parameter.
>>
>> >         __RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
>> >  };
>> >
>> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
>> > index 5c5e6e7488ce..0e997fa5524a 100644
>> > --- a/arch/riscv/kernel/cpufeature.c
>> > +++ b/arch/riscv/kernel/cpufeature.c
>> > @@ -200,6 +200,7 @@ void __init riscv_fill_hwcap(void)
>> >                                 set_bit(*ext - 'a', this_isa);
>> >                         } else {
>> >                                 SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
>> > +                               SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
>> >                         }
>> >  #undef SET_ISA_EXT_MAP
>> >                 }
>> > @@ -267,11 +268,27 @@ static bool __init_or_module cpufeature_svpbmt_check_func(unsigned int stage)
>> >         return ret;
>> >  }
>> >
>> > +static bool cpufeature_cmo_check_func(unsigned int stage)
>> > +{
>> > +       switch (stage) {
>> > +       case RISCV_ALTERNATIVES_EARLY_BOOT:
>> > +               return false;
>> > +       default:
>> > +               return riscv_isa_extension_available(NULL, ZICBOM);
>> > +       }
>> > +
>> > +       return false;
>> > +}
>> > +
>> >  static const struct cpufeature_info __initdata_or_module cpufeature_list[CPUFEATURE_NUMBER] = {
>> >         {
>> >                 .name = "svpbmt",
>> >                 .check_func = cpufeature_svpbmt_check_func
>> >         },
>> > +       {
>> > +               .name = "cmo",
>> > +               .check_func = cpufeature_cmo_check_func
>> > +       },
>> >  };
>> >
>> >  static u32 __init_or_module cpufeature_probe(unsigned int stage)
>> > diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
>> > index ac7a25298a04..d76aabf4b94d 100644
>> > --- a/arch/riscv/mm/Makefile
>> > +++ b/arch/riscv/mm/Makefile
>> > @@ -30,3 +30,4 @@ endif
>> >  endif
>> >
>> >  obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o
>> > +obj-$(CONFIG_RISCV_DMA_NONCOHERENT) += dma-noncoherent.o
>> > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
>> > new file mode 100644
>> > index 000000000000..2c124bcc1932
>> > --- /dev/null
>> > +++ b/arch/riscv/mm/dma-noncoherent.c
>> > @@ -0,0 +1,61 @@
>> > +// SPDX-License-Identifier: GPL-2.0-only
>> > +/*
>> > + * RISC-V specific functions to support DMA for non-coherent devices
>> > + *
>> > + * Copyright (c) 2021 Western Digital Corporation or its affiliates.
>> > + */
>> > +
>> > +#include <linux/dma-direct.h>
>> > +#include <linux/dma-map-ops.h>
>> > +#include <linux/init.h>
>> > +#include <linux/io.h>
>> > +#include <linux/libfdt.h>
>> > +#include <linux/mm.h>
>> > +#include <linux/of.h>
>> > +#include <linux/of_fdt.h>
>> > +
>> > +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
>> > +{
>> > +       switch (dir) {
>> > +       case DMA_TO_DEVICE:
>> > +               ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
>> > +               break;
>> > +       case DMA_FROM_DEVICE:
>> > +               ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
>> > +               break;
>> > +       case DMA_BIDIRECTIONAL:
>> > +               ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
>> > +               break;
>> > +       default:
>> > +               break;
>> > +       }
>> > +}
>> > +
>> > +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
>> > +{
>> > +       switch (dir) {
>> > +       case DMA_TO_DEVICE:
>> > +               break;
>> > +       case DMA_FROM_DEVICE:
>> > +       case DMA_BIDIRECTIONAL:
>> > +               ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
>> > +               break;
>> > +       default:
>> > +               break;
>> > +       }
>> > +}
>> > +
>> > +void arch_dma_prep_coherent(struct page *page, size_t size)
>> > +{
>> > +       void *flush_addr = page_address(page);
>> > +
>> > +       memset(flush_addr, 0, size);
>> > +       ALT_CMO_OP(FLUSH, (unsigned long)flush_addr, size);
>> > +}
>> > +
>> > +void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>> > +               const struct iommu_ops *iommu, bool coherent)
>> > +{
>> > +       /* If a specific device is dma-coherent, set it here */
>> > +       dev->dma_coherent = coherent;
>> > +}
>> > --
>> > 2.30.2
>> >
>>
>>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] riscv: Implement Zicbom-based cache management operations
@ 2022-03-25 17:37         ` Anup Patel
  0 siblings, 0 replies; 50+ messages in thread
From: Anup Patel @ 2022-03-25 17:37 UTC (permalink / raw)
  To: Philipp Tomsich
  Cc: Heiko Stuebner, Palmer Dabbelt, Paul Walmsley, linux-riscv,
	linux-kernel@vger.kernel.org List, Wei Fu, Guo Ren, Atish Patra,
	Nick Kossifidis, Samuel Holland, Christoph Muellner,
	Christoph Hellwig, Atish Patra

On Fri, Mar 25, 2022 at 10:52 PM Philipp Tomsich
<philipp.tomsich@vrull.eu> wrote:
>
> Anup,
>
> On Fri, 25 Mar 2022 at 17:20, Anup Patel <anup@brainfault.org> wrote:
>>
>> On Tue, Mar 8, 2022 at 4:16 AM Heiko Stuebner <heiko@sntech.de> wrote:
>> >
>> > The Zicbom ISA-extension was ratified in november 2021
>> > and introduces instructions for dcache invalidate, clean
>> > and flush operations.
>> >
>> > Implement cache management operations based on them.
>> >
>> > Of course not all cores will support this, so implement an
>> > alternative-based mechanism that replaces empty instructions
>> > with ones done around Zicbom instructions.
>> >
>> > We're using prebuild instructions for the Zicbom instructions
>> > for now, to not require a bleeding-edge compiler (gcc-12)
>> > for these somewhat simple instructions.
>> >
>> > Signed-off-by: Heiko Stuebner <heiko@sntech.de>
>> > Cc: Christoph Hellwig <hch@lst.de>
>> > Cc: Atish Patra <atish.patra@wdc.com>
>> > Cc: Guo Ren <guoren@kernel.org>
>> > ---
>> >  arch/riscv/Kconfig                   |  8 ++++
>> >  arch/riscv/include/asm/errata_list.h | 37 ++++++++++++++++-
>> >  arch/riscv/include/asm/hwcap.h       |  1 +
>> >  arch/riscv/kernel/cpu.c              |  1 +
>> >  arch/riscv/kernel/cpufeature.c       | 17 ++++++++
>> >  arch/riscv/mm/Makefile               |  1 +
>> >  arch/riscv/mm/dma-noncoherent.c      | 61 ++++++++++++++++++++++++++++
>> >  7 files changed, 125 insertions(+), 1 deletion(-)
>> >  create mode 100644 arch/riscv/mm/dma-noncoherent.c
>> >
>> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
>> > index 5adcbd9b5e88..d3a1cd41c203 100644
>> > --- a/arch/riscv/Kconfig
>> > +++ b/arch/riscv/Kconfig
>> > @@ -208,6 +208,14 @@ config PGTABLE_LEVELS
>> >  config LOCKDEP_SUPPORT
>> >         def_bool y
>> >
>> > +config RISCV_DMA_NONCOHERENT
>> > +       bool "Support non-coherent dma operation"
>> > +       select ARCH_HAS_DMA_PREP_COHERENT
>> > +       select ARCH_HAS_SYNC_DMA_FOR_DEVICE
>> > +       select ARCH_HAS_SYNC_DMA_FOR_CPU
>> > +       select ARCH_HAS_SETUP_DMA_OPS
>> > +       select DMA_DIRECT_REMAP
>> > +
>> >  source "arch/riscv/Kconfig.socs"
>> >  source "arch/riscv/Kconfig.erratas"
>> >
>> > diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
>> > index 4fac46b82c16..7a2dd61af24d 100644
>> > --- a/arch/riscv/include/asm/errata_list.h
>> > +++ b/arch/riscv/include/asm/errata_list.h
>> > @@ -20,7 +20,8 @@
>> >  #endif
>> >
>> >  #define        CPUFEATURE_SVPBMT 0
>> > -#define        CPUFEATURE_NUMBER 1
>> > +#define        CPUFEATURE_CMO 1
>> > +#define        CPUFEATURE_NUMBER 2
>> >
>> >  #ifdef __ASSEMBLY__
>> >
>> > @@ -86,6 +87,40 @@ asm volatile(ALTERNATIVE(                                                            \
>> >  #define ALT_THEAD_PMA(_val)
>> >  #endif
>> >
>> > +/*
>> > + * cbo.clean rs1
>> > + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
>> > + *    0...01     rs1       010      00000  0001111
>> > + *
>> > + * cbo.flush rs1
>> > + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
>> > + *    0...10     rs1       010      00000  0001111
>> > + *
>> > + * cbo.inval rs1
>> > + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
>> > + *    0...00     rs1       010      00000  0001111
>> > + */
>> > +#define CBO_INVAL_A0   ".long 0x15200F"
>> > +#define CBO_CLEAN_A0   ".long 0x25200F"
>> > +#define CBO_FLUSH_A0   ".long 0x05200F"
>> > +
>> > +#define ALT_CMO_OP(_op, _start, _size)                                                 \
>> > +asm volatile(ALTERNATIVE(                                                              \
>> > +       "nop\n\t"                                                                       \
>> > +       "nop\n\t"                                                                       \
>> > +       "nop\n\t"                                                                       \
>> > +       "nop\n\t"                                                                       \
>> > +       "nop",                                                                          \
>> > +       "mv a0, %1\n\t"                                                                 \
>> > +       "j 2f\n\t"                                                                      \
>> > +       "3:\n\t"                                                                        \
>> > +       CBO_##_op##_A0 "\n\t"                                                           \
>> > +       "addi a0, a0, %0\n\t"                                                           \
>> > +       "2:\n\t"                                                                        \
>> > +       "bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT)         \
>> > +       : : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)),                 \
>> > +           "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES)))
>>
>> Why not use a global variable (e.g. riscv_cbom_block_size) representing
>> exact cbom block size instead of L1_CACHE_BYTES ?
>
>
> Didn't the discussions around platforms gravitate towards a fixed cache-block
> operations size (note that this is orthogonal from the cache-line size) a
> requirement for Linux-capable platforms?

I recall past platform discussions. Implementations compliant with platforms
spec (whenever that is ratified) will converge to a fixed cache-block side but
this does not mean Linux CMO support should not work for implementations
with a different cache-block size.

For e.g. the ARM64 cache operations (arch/arm64/mm/cache.S) use
determine cache line size using ctr_el0 MSR.

Regards,
Anup

>
> Philipp.
>
>> The default value of riscv_cbom_block_size can be L1_CACHE_BYTES
>> which can be overridden at boot-time using optional "riscv,cbom-block-size"
>> DT property.
>>
>> The rationale here is that if underlying RISC-V implementation has cbom
>> block size smaller than L1_CACHE_BYTES then it will result in incomplete
>> cbom range operation. The riscv_cbom_block_size global variable ensures
>> that the right block size is used at least for cbom operations.
>>
>> Regards,
>> Anup
>>
>> > +
>> >  #endif /* __ASSEMBLY__ */
>> >
>> >  #endif
>> > diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
>> > index 656cd626eb1a..5943d5125a51 100644
>> > --- a/arch/riscv/include/asm/hwcap.h
>> > +++ b/arch/riscv/include/asm/hwcap.h
>> > @@ -52,6 +52,7 @@ extern unsigned long elf_hwcap;
>> >   */
>> >  enum riscv_isa_ext_id {
>> >         RISCV_ISA_EXT_SVPBMT = RISCV_ISA_EXT_BASE,
>> > +       RISCV_ISA_EXT_ZICBOM,
>> >         RISCV_ISA_EXT_ID_MAX = RISCV_ISA_EXT_MAX,
>> >  };
>> >
>> > diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
>> > index c582d557e555..dfcf592273a7 100644
>> > --- a/arch/riscv/kernel/cpu.c
>> > +++ b/arch/riscv/kernel/cpu.c
>> > @@ -72,6 +72,7 @@ int riscv_of_parent_hartid(struct device_node *node)
>> >
>> >  static struct riscv_isa_ext_data isa_ext_arr[] = {
>> >         __RISCV_ISA_EXT_DATA("svpbmt", RISCV_ISA_EXT_SVPBMT),
>> > +       __RISCV_ISA_EXT_DATA("zicbom", RISCV_ISA_EXT_ZICBOM),
>>
>> Drop the quotes around zicbom because __RISCV_ISA_EXT_DATA() will
>> stringify the first parameter.
>>
>> >         __RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
>> >  };
>> >
>> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
>> > index 5c5e6e7488ce..0e997fa5524a 100644
>> > --- a/arch/riscv/kernel/cpufeature.c
>> > +++ b/arch/riscv/kernel/cpufeature.c
>> > @@ -200,6 +200,7 @@ void __init riscv_fill_hwcap(void)
>> >                                 set_bit(*ext - 'a', this_isa);
>> >                         } else {
>> >                                 SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
>> > +                               SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
>> >                         }
>> >  #undef SET_ISA_EXT_MAP
>> >                 }
>> > @@ -267,11 +268,27 @@ static bool __init_or_module cpufeature_svpbmt_check_func(unsigned int stage)
>> >         return ret;
>> >  }
>> >
>> > +static bool cpufeature_cmo_check_func(unsigned int stage)
>> > +{
>> > +       switch (stage) {
>> > +       case RISCV_ALTERNATIVES_EARLY_BOOT:
>> > +               return false;
>> > +       default:
>> > +               return riscv_isa_extension_available(NULL, ZICBOM);
>> > +       }
>> > +
>> > +       return false;
>> > +}
>> > +
>> >  static const struct cpufeature_info __initdata_or_module cpufeature_list[CPUFEATURE_NUMBER] = {
>> >         {
>> >                 .name = "svpbmt",
>> >                 .check_func = cpufeature_svpbmt_check_func
>> >         },
>> > +       {
>> > +               .name = "cmo",
>> > +               .check_func = cpufeature_cmo_check_func
>> > +       },
>> >  };
>> >
>> >  static u32 __init_or_module cpufeature_probe(unsigned int stage)
>> > diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
>> > index ac7a25298a04..d76aabf4b94d 100644
>> > --- a/arch/riscv/mm/Makefile
>> > +++ b/arch/riscv/mm/Makefile
>> > @@ -30,3 +30,4 @@ endif
>> >  endif
>> >
>> >  obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o
>> > +obj-$(CONFIG_RISCV_DMA_NONCOHERENT) += dma-noncoherent.o
>> > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
>> > new file mode 100644
>> > index 000000000000..2c124bcc1932
>> > --- /dev/null
>> > +++ b/arch/riscv/mm/dma-noncoherent.c
>> > @@ -0,0 +1,61 @@
>> > +// SPDX-License-Identifier: GPL-2.0-only
>> > +/*
>> > + * RISC-V specific functions to support DMA for non-coherent devices
>> > + *
>> > + * Copyright (c) 2021 Western Digital Corporation or its affiliates.
>> > + */
>> > +
>> > +#include <linux/dma-direct.h>
>> > +#include <linux/dma-map-ops.h>
>> > +#include <linux/init.h>
>> > +#include <linux/io.h>
>> > +#include <linux/libfdt.h>
>> > +#include <linux/mm.h>
>> > +#include <linux/of.h>
>> > +#include <linux/of_fdt.h>
>> > +
>> > +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
>> > +{
>> > +       switch (dir) {
>> > +       case DMA_TO_DEVICE:
>> > +               ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
>> > +               break;
>> > +       case DMA_FROM_DEVICE:
>> > +               ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
>> > +               break;
>> > +       case DMA_BIDIRECTIONAL:
>> > +               ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
>> > +               break;
>> > +       default:
>> > +               break;
>> > +       }
>> > +}
>> > +
>> > +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
>> > +{
>> > +       switch (dir) {
>> > +       case DMA_TO_DEVICE:
>> > +               break;
>> > +       case DMA_FROM_DEVICE:
>> > +       case DMA_BIDIRECTIONAL:
>> > +               ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
>> > +               break;
>> > +       default:
>> > +               break;
>> > +       }
>> > +}
>> > +
>> > +void arch_dma_prep_coherent(struct page *page, size_t size)
>> > +{
>> > +       void *flush_addr = page_address(page);
>> > +
>> > +       memset(flush_addr, 0, size);
>> > +       ALT_CMO_OP(FLUSH, (unsigned long)flush_addr, size);
>> > +}
>> > +
>> > +void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
>> > +               const struct iommu_ops *iommu, bool coherent)
>> > +{
>> > +       /* If a specific device is dma-coherent, set it here */
>> > +       dev->dma_coherent = coherent;
>> > +}
>> > --
>> > 2.30.2
>> >
>>
>>
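
For readers following the quoted dma-noncoherent.c hunk, the direction
handling can be modelled in plain C. This is a minimal user-space sketch,
not kernel code: the enum names dma_dir and cmo_op and the helpers
op_for_device() and op_for_cpu() are made up for illustration; only the
mapping itself is taken from the patch.

```c
#include <assert.h>

/* Illustrative stand-ins for the kernel's enum dma_data_direction. */
enum dma_dir { DIR_BIDIRECTIONAL, DIR_TO_DEVICE, DIR_FROM_DEVICE, DIR_NONE };

/* Cache maintenance operations, named like the ALT_CMO_OP() arguments. */
enum cmo_op { CMO_NONE, CMO_CLEAN, CMO_INVAL, CMO_FLUSH };

/* Operation issued before the device touches the buffer. */
static enum cmo_op op_for_device(enum dma_dir dir)
{
	switch (dir) {
	case DIR_TO_DEVICE:
		return CMO_CLEAN;	/* write dirty lines back to memory */
	case DIR_FROM_DEVICE:
		return CMO_INVAL;	/* discard lines the device will overwrite */
	case DIR_BIDIRECTIONAL:
		return CMO_FLUSH;	/* write back, then discard */
	default:
		return CMO_NONE;
	}
}

/* Operation issued before the CPU touches the buffer again. */
static enum cmo_op op_for_cpu(enum dma_dir dir)
{
	switch (dir) {
	case DIR_FROM_DEVICE:
	case DIR_BIDIRECTIONAL:
		return CMO_INVAL;	/* drop stale lines so reads hit memory */
	default:
		return CMO_NONE;	/* DMA_TO_DEVICE needs no work here */
	}
}
```

In this model only DMA_TO_DEVICE is a no-op on the CPU side, which matches
the empty case in the quoted arch_sync_dma_for_cpu().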

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] riscv: implement cache-management errata for T-Head SoCs
  2022-03-07 22:46   ` Heiko Stuebner
@ 2022-03-31  2:30     ` Palmer Dabbelt
  -1 siblings, 0 replies; 50+ messages in thread
From: Palmer Dabbelt @ 2022-03-31  2:30 UTC (permalink / raw)
  To: heiko
  Cc: Paul Walmsley, linux-riscv, linux-kernel, wefu, guoren, atishp,
	anup, mick, samuel, cmuellner, philipp.tomsich, heiko

On Mon, 07 Mar 2022 14:46:20 PST (-0800), heiko@sntech.de wrote:
> The T-Head C906 and C910 implement a scheme for handling
> cache operations different from the generic Zicbom extension.
>
> Add an errata for it next to the generic dma coherency ops.
>
> Signed-off-by: Heiko Stuebner <heiko@sntech.de>
> ---
>  arch/riscv/Kconfig.erratas           | 10 +++++++
>  arch/riscv/errata/thead/errata.c     |  5 ++++
>  arch/riscv/include/asm/errata_list.h | 45 ++++++++++++++++++++++++++--
>  3 files changed, 57 insertions(+), 3 deletions(-)
>
> diff --git a/arch/riscv/Kconfig.erratas b/arch/riscv/Kconfig.erratas
> index de4002baa1d0..89a6dcb8ac2a 100644
> --- a/arch/riscv/Kconfig.erratas
> +++ b/arch/riscv/Kconfig.erratas
> @@ -50,4 +50,14 @@ config ERRATA_THEAD_PBMT
>
>  	  If you don't know what to do here, say "Y".
>
> +config ERRATA_THEAD_CMO
> +	bool "Apply T-Head cache management errata"
> +	depends on ERRATA_THEAD && RISCV_DMA_NONCOHERENT
> +	default y
> +	help
> +	  This will apply the cache management errata to handle the
> +	  non-standard handling on non-coherent operations on T-Head SoCs.
> +
> +	  If you don't know what to do here, say "Y".
> +
>  endmenu
> diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c
> index fd8e0538a3f0..11c26c37425f 100644
> --- a/arch/riscv/errata/thead/errata.c
> +++ b/arch/riscv/errata/thead/errata.c
> @@ -33,6 +33,11 @@ static const struct errata_info errata_list[ERRATA_THEAD_NUMBER] = {
>  		.stage = RISCV_ALTERNATIVES_EARLY_BOOT,
>  		.check_func = errata_mt_check_func
>  	},
> +	{
> +		.name = "cache-management",
> +		.stage = RISCV_ALTERNATIVES_BOOT,
> +		.check_func = errata_mt_check_func
> +	},
>  };
>
>  static u32 thead_errata_probe(unsigned int stage, unsigned long archid, unsigned long impid)
> diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> index 7a2dd61af24d..f7c6805daeab 100644
> --- a/arch/riscv/include/asm/errata_list.h
> +++ b/arch/riscv/include/asm/errata_list.h
> @@ -16,7 +16,8 @@
>
>  #ifdef CONFIG_ERRATA_THEAD
>  #define	ERRATA_THEAD_PBMT 0
> -#define	ERRATA_THEAD_NUMBER 1
> +#define	ERRATA_THEAD_CMO 1
> +#define	ERRATA_THEAD_NUMBER 2
>  #endif
>
>  #define	CPUFEATURE_SVPBMT 0
> @@ -104,8 +105,37 @@ asm volatile(ALTERNATIVE(								\
>  #define CBO_CLEAN_A0	".long 0x25200F"
>  #define CBO_FLUSH_A0	".long 0x05200F"
>
> +/*
> + * dcache.ipa rs1 (invalidate, physical address)
> + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + *   0000001    01010      rs1       000      00000  0001011
> + * dcache.iva rs1 (invalidate, virtual address)
> + *   0000001    00110      rs1       000      00000  0001011
> + *
> + * dcache.cpa rs1 (clean, physical address)
> + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + *   0000001    01001      rs1       000      00000  0001011
> + * dcache.cva rs1 (clean, virtual address)
> + *   0000001    00100      rs1       000      00000  0001011
> + *
> + * dcache.cipa rs1 (clean then invalidate, physical address)
> + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + *   0000001    01011      rs1       000      00000  0001011
> + * dcache.civa rs1 (... virtual address)
> + *   0000001    00111      rs1       000      00000  0001011
> + *
> + * sync.s (make sure all cache operations finished)
> + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + *   0000000    11001     00000      000      00000  0001011
> + */
> +#define THEAD_INVAL_A0	".long 0x0265000b"
> +#define THEAD_CLEAN_A0	".long 0x0245000b"
> +#define THEAD_FLUSH_A0	".long 0x0275000b"
> +#define THEAD_SYNC_S	".long 0x0190000b"

IIRC this came up before, but these really need to get into the 
assembler as actual instructions.

> +
>  #define ALT_CMO_OP(_op, _start, _size)							\
> -asm volatile(ALTERNATIVE(								\
> +asm volatile(ALTERNATIVE_2(								\
> +	"nop\n\t"									\
>  	"nop\n\t"									\
>  	"nop\n\t"									\
>  	"nop\n\t"									\
> @@ -117,7 +147,16 @@ asm volatile(ALTERNATIVE(								\
>  	CBO_##_op##_A0 "\n\t"								\
>  	"addi a0, a0, %0\n\t"								\
>  	"2:\n\t"									\
> -	"bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT)		\
> +	"bltu a0, %2, 3b\n\t"								\
> +	"nop", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT,				\
> +	"mv a0, %1\n\t"									\
> +	"j 2f\n\t"									\
> +	"3:\n\t"									\
> +	THEAD_##_op##_A0 "\n\t"								\
> +	"addi a0, a0, %0\n\t"								\
> +	"2:\n\t"									\
> +	"bltu a0, %2, 3b\n\t"								\
> +	THEAD_SYNC_S, THEAD_VENDOR_ID, ERRATA_THEAD_CMO, CONFIG_ERRATA_THEAD_CMO)	\
>  	: : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)),			\
>  	    "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES)))
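
The hard-coded .long values can be cross-checked against the bit-field
table in the quoted comment. The sketch below is a stand-alone C helper
under stated assumptions: thead_encode() and REG_A0 are hypothetical names,
and a0 is taken to be register x10 as in the standard RISC-V calling
convention. Packing the fields suggests that the three THEAD_*_A0 macros use
the virtual-address forms (dcache.iva, dcache.cva, dcache.civa) with
rs1 = a0.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: a0 is x10 in the standard RISC-V register numbering. */
enum { REG_A0 = 10 };

/*
 * Pack the fields from the comment table into one 32-bit instruction word:
 * bits 31-25 (funct7), bits 24-20 (operation selector), bits 19-15 (rs1),
 * bits 14-12 and 11-7 fixed at zero, bits 6-0 fixed at 0001011 (0x0b).
 */
static uint32_t thead_encode(uint32_t funct7, uint32_t sel, uint32_t rs1)
{
	return (funct7 << 25) | (sel << 20) | (rs1 << 15) | 0x0bu;
}
```

For example, thead_encode(0x01, 0x06, REG_A0) gives 0x0265000b, the value
used for THEAD_INVAL_A0, and thead_encode(0x00, 0x19, 0) gives 0x0190000b
for sync.s.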

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] riscv: implement cache-management errata for T-Head SoCs
  2022-03-31  2:30     ` Palmer Dabbelt
@ 2022-03-31  8:22       ` Heiko Stübner
  -1 siblings, 0 replies; 50+ messages in thread
From: Heiko Stübner @ 2022-03-31  8:22 UTC (permalink / raw)
  To: Palmer Dabbelt
  Cc: Paul Walmsley, linux-riscv, linux-kernel, wefu, guoren, atishp,
	anup, mick, samuel, cmuellner, philipp.tomsich

Hi Palmer,

On Thursday, 31 March 2022, 04:30:36 CEST, Palmer Dabbelt wrote:
> On Mon, 07 Mar 2022 14:46:20 PST (-0800), heiko@sntech.de wrote:
> > The T-Head C906 and C910 implement a scheme for handling
> > cache operations different from the generic Zicbom extension.
> >
> > Add an errata for it next to the generic dma coherency ops.
> >
> > Signed-off-by: Heiko Stuebner <heiko@sntech.de>
> > ---
> >  arch/riscv/Kconfig.erratas           | 10 +++++++
> >  arch/riscv/errata/thead/errata.c     |  5 ++++
> >  arch/riscv/include/asm/errata_list.h | 45 ++++++++++++++++++++++++++--
> >  3 files changed, 57 insertions(+), 3 deletions(-)
> >
> > diff --git a/arch/riscv/Kconfig.erratas b/arch/riscv/Kconfig.erratas
> > index de4002baa1d0..89a6dcb8ac2a 100644
> > --- a/arch/riscv/Kconfig.erratas
> > +++ b/arch/riscv/Kconfig.erratas
> > @@ -50,4 +50,14 @@ config ERRATA_THEAD_PBMT
> >
> >  	  If you don't know what to do here, say "Y".
> >
> > +config ERRATA_THEAD_CMO
> > +	bool "Apply T-Head cache management errata"
> > +	depends on ERRATA_THEAD && RISCV_DMA_NONCOHERENT
> > +	default y
> > +	help
> > +	  This will apply the cache management errata to handle the
> > +	  non-standard handling on non-coherent operations on T-Head SoCs.
> > +
> > +	  If you don't know what to do here, say "Y".
> > +
> >  endmenu
> > diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c
> > index fd8e0538a3f0..11c26c37425f 100644
> > --- a/arch/riscv/errata/thead/errata.c
> > +++ b/arch/riscv/errata/thead/errata.c
> > @@ -33,6 +33,11 @@ static const struct errata_info errata_list[ERRATA_THEAD_NUMBER] = {
> >  		.stage = RISCV_ALTERNATIVES_EARLY_BOOT,
> >  		.check_func = errata_mt_check_func
> >  	},
> > +	{
> > +		.name = "cache-management",
> > +		.stage = RISCV_ALTERNATIVES_BOOT,
> > +		.check_func = errata_mt_check_func
> > +	},
> >  };
> >
> >  static u32 thead_errata_probe(unsigned int stage, unsigned long archid, unsigned long impid)
> > diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> > index 7a2dd61af24d..f7c6805daeab 100644
> > --- a/arch/riscv/include/asm/errata_list.h
> > +++ b/arch/riscv/include/asm/errata_list.h
> > @@ -16,7 +16,8 @@
> >
> >  #ifdef CONFIG_ERRATA_THEAD
> >  #define	ERRATA_THEAD_PBMT 0
> > -#define	ERRATA_THEAD_NUMBER 1
> > +#define	ERRATA_THEAD_CMO 1
> > +#define	ERRATA_THEAD_NUMBER 2
> >  #endif
> >
> >  #define	CPUFEATURE_SVPBMT 0
> > @@ -104,8 +105,37 @@ asm volatile(ALTERNATIVE(								\
> >  #define CBO_CLEAN_A0	".long 0x25200F"
> >  #define CBO_FLUSH_A0	".long 0x05200F"
> >
> > +/*
> > + * dcache.ipa rs1 (invalidate, physical address)
> > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> > + *   0000001    01010      rs1       000      00000  0001011
> > + * dcache.iva rs1 (invalidate, virtual address)
> > + *   0000001    00110      rs1       000      00000  0001011
> > + *
> > + * dcache.cpa rs1 (clean, physical address)
> > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> > + *   0000001    01001      rs1       000      00000  0001011
> > + * dcache.cva rs1 (clean, virtual address)
> > + *   0000001    00100      rs1       000      00000  0001011
> > + *
> > + * dcache.cipa rs1 (clean then invalidate, physical address)
> > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> > + *   0000001    01011      rs1       000      00000  0001011
> > + * dcache.civa rs1 (... virtual address)
> > + *   0000001    00111      rs1       000      00000  0001011
> > + *
> > + * sync.s (make sure all cache operations finished)
> > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> > + *   0000000    11001     00000      000      00000  0001011
> > + */
> > +#define THEAD_INVAL_A0	".long 0x0265000b"
> > +#define THEAD_CLEAN_A0	".long 0x0245000b"
> > +#define THEAD_FLUSH_A0	".long 0x0275000b"
> > +#define THEAD_SYNC_S	".long 0x0190000b"
> 
> IIRC this came up before, but these really need to get into the 
> assembler as actual instructions.

okay :-) .

But just for my understanding, which of the two ways forward do you prefer:
- keep this in the waiting area _until_ a suitable binutils is released
- use the coded instructions now and convert later once binutils is released

The reason I ask is that any chip with a T-Head core, like the Allwinner D1,
will need this for things like basic networking, so with the binutils
release schedule, I guess we'd be looking at autumn 2022 at the earliest.


Thanks
Heiko

> > +
> >  #define ALT_CMO_OP(_op, _start, _size)							\
> > -asm volatile(ALTERNATIVE(								\
> > +asm volatile(ALTERNATIVE_2(								\
> > +	"nop\n\t"									\
> >  	"nop\n\t"									\
> >  	"nop\n\t"									\
> >  	"nop\n\t"									\
> > @@ -117,7 +147,16 @@ asm volatile(ALTERNATIVE(								\
> >  	CBO_##_op##_A0 "\n\t"								\
> >  	"addi a0, a0, %0\n\t"								\
> >  	"2:\n\t"									\
> > -	"bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT)		\
> > +	"bltu a0, %2, 3b\n\t"								\
> > +	"nop", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT,				\
> > +	"mv a0, %1\n\t"									\
> > +	"j 2f\n\t"									\
> > +	"3:\n\t"									\
> > +	THEAD_##_op##_A0 "\n\t"								\
> > +	"addi a0, a0, %0\n\t"								\
> > +	"2:\n\t"									\
> > +	"bltu a0, %2, 3b\n\t"								\
> > +	THEAD_SYNC_S, THEAD_VENDOR_ID, ERRATA_THEAD_CMO, CONFIG_ERRATA_THEAD_CMO)	\
> >  	: : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)),			\
> >  	    "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES)))
> 
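
The T-Head alternative quoted above spells out the whole loop, which makes
the address arithmetic easy to model. A rough user-space sketch, assuming a
64-byte cache line (LINE_BYTES is a placeholder for L1_CACHE_BYTES, and the
function names are invented here, not kernel API):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES 64u	/* assumption; the kernel uses L1_CACHE_BYTES */

/* Operand %1: start address rounded down to a cache-line boundary. */
static uintptr_t range_start(uintptr_t start)
{
	return start & ~(uintptr_t)(LINE_BYTES - 1);
}

/* Operand %2: end address rounded up, like ALIGN(start + size, line). */
static uintptr_t range_end(uintptr_t start, size_t size)
{
	return (start + size + LINE_BYTES - 1) & ~(uintptr_t)(LINE_BYTES - 1);
}

/*
 * Count how many per-line operations the loop would issue. The asm jumps
 * to the "bltu" test first ("j 2f"), so an empty range issues none.
 */
static size_t count_line_ops(uintptr_t start, size_t size)
{
	uintptr_t a0 = range_start(start);
	uintptr_t end = range_end(start, size);
	size_t ops = 0;

	while (a0 < end) {	/* 2: bltu a0, %2, 3b */
		ops++;		/* 3: cache op on the line at a0 */
		a0 += LINE_BYTES;	/* addi a0, a0, %0 */
	}
	return ops;
}
```

With these assumptions, count_line_ops(0x1030, 0x20) touches two lines
(0x1000 and 0x1040), because the 32-byte buffer straddles a line boundary.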





^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] riscv: implement cache-management errata for T-Head SoCs
  2022-03-31  8:22       ` Heiko Stübner
@ 2022-03-31  8:29         ` Philipp Tomsich
  -1 siblings, 0 replies; 50+ messages in thread
From: Philipp Tomsich @ 2022-03-31  8:29 UTC (permalink / raw)
  To: Heiko Stübner
  Cc: Palmer Dabbelt, Paul Walmsley, linux-riscv, linux-kernel, wefu,
	guoren, atishp, anup, mick, samuel, cmuellner

Palmer,

Could you confirm that I correctly understood what you require: is it
that a patch is on the binutils list?

Philipp.


On Thu, 31 Mar 2022 at 10:22, Heiko Stübner <heiko@sntech.de> wrote:
>
> Hi Palmer,
>
> On Thursday, 31 March 2022, 04:30:36 CEST, Palmer Dabbelt wrote:
> > On Mon, 07 Mar 2022 14:46:20 PST (-0800), heiko@sntech.de wrote:
> > > The T-Head C906 and C910 implement a scheme for handling
> > > cache operations different from the generic Zicbom extension.
> > >
> > > Add an errata for it next to the generic dma coherency ops.
> > >
> > > Signed-off-by: Heiko Stuebner <heiko@sntech.de>
> > > ---
> > >  arch/riscv/Kconfig.erratas           | 10 +++++++
> > >  arch/riscv/errata/thead/errata.c     |  5 ++++
> > >  arch/riscv/include/asm/errata_list.h | 45 ++++++++++++++++++++++++++--
> > >  3 files changed, 57 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/arch/riscv/Kconfig.erratas b/arch/riscv/Kconfig.erratas
> > > index de4002baa1d0..89a6dcb8ac2a 100644
> > > --- a/arch/riscv/Kconfig.erratas
> > > +++ b/arch/riscv/Kconfig.erratas
> > > @@ -50,4 +50,14 @@ config ERRATA_THEAD_PBMT
> > >
> > >       If you don't know what to do here, say "Y".
> > >
> > > +config ERRATA_THEAD_CMO
> > > +   bool "Apply T-Head cache management errata"
> > > +   depends on ERRATA_THEAD && RISCV_DMA_NONCOHERENT
> > > +   default y
> > > +   help
> > > +     This will apply the cache management errata to handle the
> > > +     non-standard handling on non-coherent operations on T-Head SoCs.
> > > +
> > > +     If you don't know what to do here, say "Y".
> > > +
> > >  endmenu
> > > diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c
> > > index fd8e0538a3f0..11c26c37425f 100644
> > > --- a/arch/riscv/errata/thead/errata.c
> > > +++ b/arch/riscv/errata/thead/errata.c
> > > @@ -33,6 +33,11 @@ static const struct errata_info errata_list[ERRATA_THEAD_NUMBER] = {
> > >             .stage = RISCV_ALTERNATIVES_EARLY_BOOT,
> > >             .check_func = errata_mt_check_func
> > >     },
> > > +   {
> > > +           .name = "cache-management",
> > > +           .stage = RISCV_ALTERNATIVES_BOOT,
> > > +           .check_func = errata_mt_check_func
> > > +   },
> > >  };
> > >
> > >  static u32 thead_errata_probe(unsigned int stage, unsigned long archid, unsigned long impid)
> > > diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> > > index 7a2dd61af24d..f7c6805daeab 100644
> > > --- a/arch/riscv/include/asm/errata_list.h
> > > +++ b/arch/riscv/include/asm/errata_list.h
> > > @@ -16,7 +16,8 @@
> > >
> > >  #ifdef CONFIG_ERRATA_THEAD
> > >  #define    ERRATA_THEAD_PBMT 0
> > > -#define    ERRATA_THEAD_NUMBER 1
> > > +#define    ERRATA_THEAD_CMO 1
> > > +#define    ERRATA_THEAD_NUMBER 2
> > >  #endif
> > >
> > >  #define    CPUFEATURE_SVPBMT 0
> > > @@ -104,8 +105,37 @@ asm volatile(ALTERNATIVE(                                                              \
> > >  #define CBO_CLEAN_A0       ".long 0x25200F"
> > >  #define CBO_FLUSH_A0       ".long 0x05200F"
> > >
> > > +/*
> > > + * dcache.ipa rs1 (invalidate, physical address)
> > > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> > > + *   0000001    01010      rs1       000      00000  0001011
> > > + * dcache.iva rs1 (invalidate, virtual address)
> > > + *   0000001    00110      rs1       000      00000  0001011
> > > + *
> > > + * dcache.cpa rs1 (clean, physical address)
> > > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> > > + *   0000001    01001      rs1       000      00000  0001011
> > > + * dcache.cva rs1 (clean, virtual address)
> > > + *   0000001    00100      rs1       000      00000  0001011
> > > + *
> > > + * dcache.cipa rs1 (clean then invalidate, physical address)
> > > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> > > + *   0000001    01011      rs1       000      00000  0001011
> > > + * dcache.civa rs1 (... virtual address)
> > > + *   0000001    00111      rs1       000      00000  0001011
> > > + *
> > > + * sync.s (make sure all cache operations finished)
> > > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> > > + *   0000000    11001     00000      000      00000  0001011
> > > + */
> > > +#define THEAD_INVAL_A0     ".long 0x0265000b"
> > > +#define THEAD_CLEAN_A0     ".long 0x0245000b"
> > > +#define THEAD_FLUSH_A0     ".long 0x0275000b"
> > > +#define THEAD_SYNC_S       ".long 0x0190000b"
> >
> > IIRC this came up before, but these really need to get into the
> > assembler as actual instructions.
>
> okay :-) .
>
> But just for my understanding, which of the two ways forward do you prefer:
> - keep this in the waiting area _until_ a suitable binutils is released
> - use the coded instructions now and convert later once binutils is released
>
> The reason I ask is that any chip with a T-Head core, like the Allwinner D1,
> will need this for things like basic networking, so with the binutils
> release schedule, I guess we'd be looking at autumn 2022 at the earliest.
>
>
> Thanks
> Heiko
>
> > > +
> > >  #define ALT_CMO_OP(_op, _start, _size)                                                     \
> > > -asm volatile(ALTERNATIVE(                                                          \
> > > +asm volatile(ALTERNATIVE_2(                                                                \
> > > +   "nop\n\t"                                                                       \
> > >     "nop\n\t"                                                                       \
> > >     "nop\n\t"                                                                       \
> > >     "nop\n\t"                                                                       \
> > > @@ -117,7 +147,16 @@ asm volatile(ALTERNATIVE(                                                              \
> > >     CBO_##_op##_A0 "\n\t"                                                           \
> > >     "addi a0, a0, %0\n\t"                                                           \
> > >     "2:\n\t"                                                                        \
> > > -   "bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT)         \
> > > +   "bltu a0, %2, 3b\n\t"                                                           \
> > > +   "nop", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT,                         \
> > > +   "mv a0, %1\n\t"                                                                 \
> > > +   "j 2f\n\t"                                                                      \
> > > +   "3:\n\t"                                                                        \
> > > +   THEAD_##_op##_A0 "\n\t"                                                         \
> > > +   "addi a0, a0, %0\n\t"                                                           \
> > > +   "2:\n\t"                                                                        \
> > > +   "bltu a0, %2, 3b\n\t"                                                           \
> > > +   THEAD_SYNC_S, THEAD_VENDOR_ID, ERRATA_THEAD_CMO, CONFIG_ERRATA_THEAD_CMO)       \
> > >     : : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)),                 \
> > >         "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES)))
> >
>
>
>
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 1/2] riscv: Implement Zicbom-based cache management operations
  2022-03-07 22:46   ` Heiko Stuebner
@ 2022-03-31 10:07     ` Christoph Hellwig
  -1 siblings, 0 replies; 50+ messages in thread
From: Christoph Hellwig @ 2022-03-31 10:07 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: palmer, paul.walmsley, linux-riscv, linux-kernel, wefu, guoren,
	atishp, anup, mick, samuel, cmuellner, philipp.tomsich,
	Christoph Hellwig, Atish Patra

I somehow only got patch 1 out of 2, which makes reviewing this very
hard.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] riscv: implement cache-management errata for T-Head SoCs
  2022-03-07 22:46   ` Heiko Stuebner
@ 2022-04-01  1:05     ` Samuel Holland
  -1 siblings, 0 replies; 50+ messages in thread
From: Samuel Holland @ 2022-04-01  1:05 UTC (permalink / raw)
  To: Heiko Stuebner, palmer, paul.walmsley
  Cc: linux-riscv, linux-kernel, wefu, guoren, atishp, anup, mick,
	cmuellner, philipp.tomsich

On 3/7/22 4:46 PM, Heiko Stuebner wrote:
> The T-Head C906 and C910 implement a scheme for handling
> cache operations different from the generic Zicbom extension.
> 
> Add an errata for it next to the generic dma coherency ops.
> 
> Signed-off-by: Heiko Stuebner <heiko@sntech.de>

Tested-by: Samuel Holland <samuel@sholland.org>

With this option disabled, MMC and USB are broken on D1 boards:

[    3.021326] Waiting for root device /dev/mmcblk0p1...
[    3.219727] usb 4-1: new full-speed USB device number 2 using ohci-platform
[   18.703736] usb 4-1: device descriptor read/64, error -110

With the option enabled, MMC, USB, and Ethernet all work fine.

> ---
>  arch/riscv/Kconfig.erratas           | 10 +++++++
>  arch/riscv/errata/thead/errata.c     |  5 ++++
>  arch/riscv/include/asm/errata_list.h | 45 ++++++++++++++++++++++++++--
>  3 files changed, 57 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/riscv/Kconfig.erratas b/arch/riscv/Kconfig.erratas
> index de4002baa1d0..89a6dcb8ac2a 100644
> --- a/arch/riscv/Kconfig.erratas
> +++ b/arch/riscv/Kconfig.erratas
> @@ -50,4 +50,14 @@ config ERRATA_THEAD_PBMT
>  
>  	  If you don't know what to do here, say "Y".
>  
> +config ERRATA_THEAD_CMO
> +	bool "Apply T-Head cache management errata"
> +	depends on ERRATA_THEAD && RISCV_DMA_NONCOHERENT
> +	default y
> +	help
> +	  This will apply the cache management errata to handle the
> +	  non-standard handling on non-coherent operations on T-Head SoCs.
> +
> +	  If you don't know what to do here, say "Y".
> +
>  endmenu
> diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c
> index fd8e0538a3f0..11c26c37425f 100644
> --- a/arch/riscv/errata/thead/errata.c
> +++ b/arch/riscv/errata/thead/errata.c
> @@ -33,6 +33,11 @@ static const struct errata_info errata_list[ERRATA_THEAD_NUMBER] = {
>  		.stage = RISCV_ALTERNATIVES_EARLY_BOOT,
>  		.check_func = errata_mt_check_func
>  	},
> +	{
> +		.name = "cache-management",
> +		.stage = RISCV_ALTERNATIVES_BOOT,
> +		.check_func = errata_mt_check_func
> +	},
>  };
>  
>  static u32 thead_errata_probe(unsigned int stage, unsigned long archid, unsigned long impid)
> diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> index 7a2dd61af24d..f7c6805daeab 100644
> --- a/arch/riscv/include/asm/errata_list.h
> +++ b/arch/riscv/include/asm/errata_list.h
> @@ -16,7 +16,8 @@
>  
>  #ifdef CONFIG_ERRATA_THEAD
>  #define	ERRATA_THEAD_PBMT 0
> -#define	ERRATA_THEAD_NUMBER 1
> +#define	ERRATA_THEAD_CMO 1
> +#define	ERRATA_THEAD_NUMBER 2
>  #endif
>  
>  #define	CPUFEATURE_SVPBMT 0
> @@ -104,8 +105,37 @@ asm volatile(ALTERNATIVE(								\
>  #define CBO_CLEAN_A0	".long 0x25200F"
>  #define CBO_FLUSH_A0	".long 0x05200F"
>  
> +/*
> + * dcache.ipa rs1 (invalidate, physical address)
> + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + *   0000001    01010      rs1       000      00000  0001011
> + * dache.iva rs1 (invalida, virtual address)
> + *   0000001    00110      rs1       000      00000  0001011
> + *
> + * dcache.cpa rs1 (clean, physical address)
> + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + *   0000001    01001      rs1       000      00000  0001011
> + * dcache.cva rs1 (clean, virtual address)
> + *   0000001    00100      rs1       000      00000  0001011
> + *
> + * dcache.cipa rs1 (clean then invalidate, physical address)
> + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + *   0000001    01011      rs1       000      00000  0001011
> + * dcache.civa rs1 (... virtual address)
> + *   0000001    00111      rs1       000      00000  0001011
> + *
> + * sync.s (make sure all cache operations finished)
> + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + *   0000000    11001     00000      000      00000  0001011
> + */
> +#define THEAD_INVAL_A0	".long 0x0265000b"
> +#define THEAD_CLEAN_A0	".long 0x0245000b"
> +#define THEAD_FLUSH_A0	".long 0x0275000b"
> +#define THEAD_SYNC_S	".long 0x0190000b"
> +
>  #define ALT_CMO_OP(_op, _start, _size)							\
> -asm volatile(ALTERNATIVE(								\
> +asm volatile(ALTERNATIVE_2(								\
> +	"nop\n\t"									\
>  	"nop\n\t"									\
>  	"nop\n\t"									\
>  	"nop\n\t"									\
> @@ -117,7 +147,16 @@ asm volatile(ALTERNATIVE(								\
>  	CBO_##_op##_A0 "\n\t"								\
>  	"addi a0, a0, %0\n\t"								\
>  	"2:\n\t"									\
> -	"bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT)		\
> +	"bltu a0, %2, 3b\n\t"								\
> +	"nop", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT,				\
> +	"mv a0, %1\n\t"									\
> +	"j 2f\n\t"									\
> +	"3:\n\t"									\
> +	THEAD_##_op##_A0 "\n\t"								\
> +	"addi a0, a0, %0\n\t"								\
> +	"2:\n\t"									\
> +	"bltu a0, %2, 3b\n\t"								\
> +	THEAD_SYNC_S, THEAD_VENDOR_ID, ERRATA_THEAD_CMO, CONFIG_ERRATA_THEAD_CMO)	\
>  	: : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)),			\
>  	    "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES)))
>  
> 
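
As a cross-check of the quoted encoding table, the four ".long" constants can
be rebuilt from their bit fields. The sketch below is an assumption-labelled
illustration, not code from the patch: thead_encode() is a hypothetical
helper, and a0 is register x10, which is why rs1 is 10 for the *_A0 variants.

```c
#include <assert.h>
#include <stdint.h>

/* Major opcode 0001011 (the RISC-V custom-0 space), per the table above. */
#define THEAD_OPCODE 0x0bu

/*
 * Hypothetical helper: assemble a T-Head cache instruction word from the
 * fields in the quoted comment. Bits 14-12 and 11-7 are zero in every
 * row of the table, so they are left out.
 */
static uint32_t thead_encode(uint32_t funct7, uint32_t op5, uint32_t rs1)
{
	assert(funct7 < 128 && op5 < 32 && rs1 < 32);
	return (funct7 << 25) | (op5 << 20) | (rs1 << 15) | THEAD_OPCODE;
}
```

With rs1 = 10, the virtual-address rows (iva 00110, cva 00100, civa 00111)
reproduce THEAD_INVAL_A0, THEAD_CLEAN_A0 and THEAD_FLUSH_A0, and the sync.s
row (funct7 0, 11001, rs1 0) reproduces THEAD_SYNC_S.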


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
  2022-03-07 22:46 ` Heiko Stuebner
@ 2022-04-15 11:26   ` Corentin Labbe
  -1 siblings, 0 replies; 50+ messages in thread
From: Corentin Labbe @ 2022-04-15 11:26 UTC (permalink / raw)
  To: Heiko Stuebner
  Cc: palmer, paul.walmsley, linux-riscv, linux-kernel, wefu, guoren,
	atishp, anup, mick, samuel, cmuellner, philipp.tomsich

On Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner wrote:
> This series is based on the alternatives changes done in my svpbmt series
> and thus also depends on Atish's isa-extension parsing series.
> 
> It implements using the cache-management instructions from the Zicbom
> extension to handle cache flush, etc. actions on platforms needing them.
> 
> SoCs using CPU cores from T-Head, like the Allwinner D1, implement a
> different set of cache instructions. But while the instructions are
> different, they provide the same functionality, so a variant can
> easily hook into the existing alternatives mechanism on those.
> 
> 

Hello

I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip, which contains this series.

I am hitting a buffer corruption problem with DMA.
The sun8i-ce crypto driver fails its self-tests due to "device overran destination buffer".
In fact, the buffer is not overrun by the device but by the dma_map_single() operation.

The following small snippet shows the problem:

dma_addr_t dma;
u8 *buf;
#define BSIZE 2048
#define DMASIZE 16

buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
for (i = 0; i < BSIZE; i++)
    buf[i] = 0xFE;
print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
dma_unmap_single(ce->dev, dma, DMASIZE, DMA_FROM_DEVICE);
print_hex_dump(KERN_INFO, "DMATEST3:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);

This leads to:
[    2.960040] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[    2.965354] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[    2.970709] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[    2.976069] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[    2.981440] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[    2.986814] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[    2.992188] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[    2.997560] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[    3.002934] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[    3.008307] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[    3.013680] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[    3.019054] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[    3.024427] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[    3.029802] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[    3.035175] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[    3.040546] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[    3.401647] DMATEST3:a9c3a9c3 a9c3a9c3 a9c3a9c3 a9c3a9c3
[    3.406982] DMATEST3:a9c3a9c3 a9c3a9c3 a9c3a9c3 a9c3a9c3
[    3.412350] DMATEST3:a9c3a9c3 a9c3a9c3 a9c3a9c3 a9c3a9c3
[    3.417720] DMATEST3:a9c3a9c3 a9c3a9c3 a9c3a9c3 a9c3a9c3
[    3.423094] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[    3.428468] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[    3.433841] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[    3.439213] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[    3.444588] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[    3.449962] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[    3.455334] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[    3.460707] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[    3.466081] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[    3.471454] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[    3.476828] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[    3.482200] DMATEST3:fefefefe fefefefe fefefefe fefefefe

Even with no DMA action, the buffer is corrupted.

Regards

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
  2022-04-15 11:26   ` Corentin Labbe
@ 2022-04-16  2:19     ` Samuel Holland
  -1 siblings, 0 replies; 50+ messages in thread
From: Samuel Holland @ 2022-04-16  2:19 UTC (permalink / raw)
  To: Corentin Labbe, Heiko Stuebner
  Cc: palmer, paul.walmsley, linux-riscv, linux-kernel, wefu, guoren,
	atishp, anup, mick, cmuellner, philipp.tomsich

On 4/15/22 6:26 AM, Corentin Labbe wrote:
> On Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner wrote:
>> This series is based on the alternatives changes done in my svpbmt series
>> and thus also depends on Atish's isa-extension parsing series.
>>
>> It implements using the cache-management instructions from the Zicbom
>> extension to handle cache flush, etc. actions on platforms needing them.
>>
>> SoCs using CPU cores from T-Head, like the Allwinner D1, implement a
>> different set of cache instructions. But while the instructions are
>> different, they provide the same functionality, so a variant can
>> easily hook into the existing alternatives mechanism on those.
>>
>>
> 
> Hello
> 
> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip, which contains this series.
> 
> I am hitting a buffer corruption problem with DMA.
> The sun8i-ce crypto driver fails its self-tests due to "device overran destination buffer".
> In fact, the buffer is not overrun by the device but by the dma_map_single() operation.
> 
> The following small snippet shows the problem:
> 
> dma_addr_t dma;
> u8 *buf;
> #define BSIZE 2048
> #define DMASIZE 16
> 
> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> for (i = 0; i < BSIZE; i++)
>     buf[i] = 0xFE;
> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);

This function (through dma_direct_map_page()) ends up calling
arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
cache. This is the same thing other architectures do (at least arm, arm64,
openrisc, and powerpc). So this appears to be working as intended.
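
To make the effect on a 16-byte mapping concrete, here is a small host-side
sketch of the rounding the cache operation applies. It is an illustration
only: cmo_bytes_touched() is a hypothetical helper, and the 64-byte line size
is an assumption, chosen because the quoted dump shows exactly four 16-byte
rows changed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of bytes covered by the cache lines that
 * overlap [start, start + size). Whole lines are invalidated, so dirty
 * data anywhere in this span is dropped from the CPU cache, not only
 * the bytes that were actually mapped.
 */
static size_t cmo_bytes_touched(uintptr_t start, size_t size, size_t line)
{
	uintptr_t first, end;

	assert(size != 0);
	assert(line != 0 && (line & (line - 1)) == 0);	/* power of two */

	first = start & ~(uintptr_t)(line - 1);			/* round down */
	end = (start + size + line - 1) & ~(uintptr_t)(line - 1);	/* round up */
	return (size_t)(end - first);
}
```

For a line-aligned buffer, a 16-byte mapping with 64-byte lines gives 64 bytes
touched, so the 0xFE fill that was still sitting dirty in the cache for that
line is discarded and the older contents of memory become visible.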

Regards,
Samuel

> dma_unmap_single(ce->dev, dma, DMASIZE, DMA_FROM_DEVICE);
> print_hex_dump(KERN_INFO, "DMATEST3:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> 
> This leads to:
> [    2.960040] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [    2.965354] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [    2.970709] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [    2.976069] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [    2.981440] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [    2.986814] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [    2.992188] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [    2.997560] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [    3.002934] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [    3.008307] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [    3.013680] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [    3.019054] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [    3.024427] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [    3.029802] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [    3.035175] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [    3.040546] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [    3.401647] DMATEST3:a9c3a9c3 a9c3a9c3 a9c3a9c3 a9c3a9c3
> [    3.406982] DMATEST3:a9c3a9c3 a9c3a9c3 a9c3a9c3 a9c3a9c3
> [    3.412350] DMATEST3:a9c3a9c3 a9c3a9c3 a9c3a9c3 a9c3a9c3
> [    3.417720] DMATEST3:a9c3a9c3 a9c3a9c3 a9c3a9c3 a9c3a9c3
> [    3.423094] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [    3.428468] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [    3.433841] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [    3.439213] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [    3.444588] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [    3.449962] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [    3.455334] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [    3.460707] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [    3.466081] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [    3.471454] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [    3.476828] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [    3.482200] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> 
> Even with no DMA action, the buffer is corrupted.
> 
> Regards
> 


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
  2022-04-16  2:19     ` Samuel Holland
@ 2022-04-16  7:35       ` Corentin Labbe
  -1 siblings, 0 replies; 50+ messages in thread
From: Corentin Labbe @ 2022-04-16  7:35 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Heiko Stuebner, palmer, paul.walmsley, linux-riscv, linux-kernel,
	wefu, guoren, atishp, anup, mick, cmuellner, philipp.tomsich

Le Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland a écrit :
> On 4/15/22 6:26 AM, Corentin Labbe wrote:
> > Le Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner a écrit :
> >> This series is based on the alternatives changes done in my svpbmt series
> >> and thus also depends on Atish's isa-extension parsing series.
> >>
> >> It implements using the cache-management instructions from the  Zicbom-
> >> extension to handle cache flush, etc actions on platforms needing them.
> >>
> >> SoCs using cpu cores from T-Head like the Allwinne D1 implement a
> >> different set of cache instructions. But while they are different,
> >> instructions they provide the same functionality, so a variant can
> >> easly hook into the existing alternatives mechanism on those.
> >>
> >>
> > 
> > Hello
> > 
> > I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contain this serie.
> > 
> > I am hitting a buffer corruption problem with DMA.
> > The sun8i-ce crypto driver fail self tests due to "device overran destination buffer".
> > In fact the buffer is not overran by device but by dma_map_single() operation.
> > 
> > The following small code show the problem:
> > 
> > dma_addr_t dma;
> > u8 *buf;
> > #define BSIZE 2048
> > #define DMASIZE 16
> > 
> > buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> > for (i = 0; i < BSIZE; i++)
> >     buf[i] = 0xFE;
> > print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> > dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
> 
> This function (through dma_direct_map_page()) ends up calling
> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
> cache. This is the same thing other architectures do (at least arm, arm64,
> openrisc, and powerpc). So this appears to be working as intended.
> 
> Regards,
> Samuel
> 

This behavior is not present on at least ARM and ARM64.
The sample code I provided does not corrupt the buffer on them.

Regards

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
  2022-04-16  7:35       ` Corentin Labbe
@ 2022-04-16 17:47         ` Samuel Holland
  -1 siblings, 0 replies; 50+ messages in thread
From: Samuel Holland @ 2022-04-16 17:47 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: Heiko Stuebner, palmer, paul.walmsley, linux-riscv, linux-kernel,
	wefu, guoren, atishp, anup, mick, cmuellner, philipp.tomsich

On 4/16/22 2:35 AM, Corentin Labbe wrote:
> Le Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland a écrit :
>> On 4/15/22 6:26 AM, Corentin Labbe wrote:
>>> Le Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner a écrit :
>>>> This series is based on the alternatives changes done in my svpbmt series
>>>> and thus also depends on Atish's isa-extension parsing series.
>>>>
>>>> It implements using the cache-management instructions from the  Zicbom-
>>>> extension to handle cache flush, etc actions on platforms needing them.
>>>>
>>>> SoCs using cpu cores from T-Head like the Allwinne D1 implement a
>>>> different set of cache instructions. But while they are different,
>>>> instructions they provide the same functionality, so a variant can
>>>> easly hook into the existing alternatives mechanism on those.
>>>>
>>>>
>>>
>>> Hello
>>>
>>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contain this serie.
>>>
>>> I am hitting a buffer corruption problem with DMA.
>>> The sun8i-ce crypto driver fail self tests due to "device overran destination buffer".
>>> In fact the buffer is not overran by device but by dma_map_single() operation.
>>>
>>> The following small code show the problem:
>>>
>>> dma_addr_t dma;
>>> u8 *buf;
>>> #define BSIZE 2048
>>> #define DMASIZE 16
>>>
>>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
>>> for (i = 0; i < BSIZE; i++)
>>>     buf[i] = 0xFE;
>>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
>>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
>>
>> This function (through dma_direct_map_page()) ends up calling
>> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
>> cache. This is the same thing other architectures do (at least arm, arm64,
>> openrisc, and powerpc). So this appears to be working as intended.
> 
> This behavour is not present at least on ARM and ARM64.
> The sample code I provided does not corrupt the buffer on them.

That can be explained by the 0xFE bytes having been flushed to DRAM already in
your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in
a dirty cache line. The cache topology and implementation are totally different
across the SoCs, so this is not too surprising.

Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a
unidirectional DMA transfer from the device into that buffer. So the contents of
the buffer are "undefined" until the DMA transfer completes. If you are also
writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.

Regards,
Samuel

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
  2022-04-16 17:47         ` Samuel Holland
@ 2022-04-16 19:32           ` Corentin Labbe
  -1 siblings, 0 replies; 50+ messages in thread
From: Corentin Labbe @ 2022-04-16 19:32 UTC (permalink / raw)
  To: Samuel Holland
  Cc: Heiko Stuebner, palmer, paul.walmsley, linux-riscv, linux-kernel,
	wefu, guoren, atishp, anup, mick, cmuellner, philipp.tomsich,
	herbert, linux-crypto

Le Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland a écrit :
> On 4/16/22 2:35 AM, Corentin Labbe wrote:
> > Le Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland a écrit :
> >> On 4/15/22 6:26 AM, Corentin Labbe wrote:
> >>> Le Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner a écrit :
> >>>> This series is based on the alternatives changes done in my svpbmt series
> >>>> and thus also depends on Atish's isa-extension parsing series.
> >>>>
> >>>> It implements using the cache-management instructions from the  Zicbom-
> >>>> extension to handle cache flush, etc actions on platforms needing them.
> >>>>
> >>>> SoCs using cpu cores from T-Head like the Allwinne D1 implement a
> >>>> different set of cache instructions. But while they are different,
> >>>> instructions they provide the same functionality, so a variant can
> >>>> easly hook into the existing alternatives mechanism on those.
> >>>>
> >>>>
> >>>
> >>> Hello
> >>>
> >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contain this serie.
> >>>
> >>> I am hitting a buffer corruption problem with DMA.
> >>> The sun8i-ce crypto driver fail self tests due to "device overran destination buffer".
> >>> In fact the buffer is not overran by device but by dma_map_single() operation.
> >>>
> >>> The following small code show the problem:
> >>>
> >>> dma_addr_t dma;
> >>> u8 *buf;
> >>> #define BSIZE 2048
> >>> #define DMASIZE 16
> >>>
> >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> >>> for (i = 0; i < BSIZE; i++)
> >>>     buf[i] = 0xFE;
> >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
> >>
> >> This function (through dma_direct_map_page()) ends up calling
> >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
> >> cache. This is the same thing other architectures do (at least arm, arm64,
> >> openrisc, and powerpc). So this appears to be working as intended.
> > 
> > This behavour is not present at least on ARM and ARM64.
> > The sample code I provided does not corrupt the buffer on them.
> 
> That can be explained by the 0xFE bytes having been flushed to DRAM already in
> your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in
> a dirty cache line. The cache topology and implementation is totally different
> across the SoCs, so this is not too surprising.
> 
> Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a
> unidirectional DMA transfer from the device into that buffer. So the contents of
> the buffer are "undefined" until the DMA transfer completes. If you are also
> writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.
> 
> Regards,
> Samuel

+CC crypto mailing list + maintainer

My problem is that the crypto selftest, for each buffer on which I need to do a cipher operation,
appends a poison buffer to check that the device does not write beyond the buffer.

But dma_map_sg(FROM_DEVICE) corrupts this poison buffer, and the crypto selftests fail, thinking my device did a buffer overrun.

So do you mean that on the D1 SoC, this crypto API check strategy is impossible?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
  2022-04-16 19:32           ` Corentin Labbe
@ 2022-04-17  2:17             ` Guo Ren
  -1 siblings, 0 replies; 50+ messages in thread
From: Guo Ren @ 2022-04-17  2:17 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: Samuel Holland, Heiko Stuebner, Palmer Dabbelt, Paul Walmsley,
	linux-riscv, Linux Kernel Mailing List, Wei Fu, Atish Patra,
	Anup Patel, Nick Kossifidis, Christoph Muellner, Philipp Tomsich,
	Herbert Xu, linux-crypto

On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe
<clabbe.montjoie@gmail.com> wrote:
>
> Le Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland a écrit :
> > On 4/16/22 2:35 AM, Corentin Labbe wrote:
> > > Le Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland a écrit :
> > >> On 4/15/22 6:26 AM, Corentin Labbe wrote:
> > >>> Le Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner a écrit :
> > >>>> This series is based on the alternatives changes done in my svpbmt series
> > >>>> and thus also depends on Atish's isa-extension parsing series.
> > >>>>
> > >>>> It implements using the cache-management instructions from the  Zicbom-
> > >>>> extension to handle cache flush, etc actions on platforms needing them.
> > >>>>
> > >>>> SoCs using cpu cores from T-Head like the Allwinne D1 implement a
> > >>>> different set of cache instructions. But while they are different,
> > >>>> instructions they provide the same functionality, so a variant can
> > >>>> easly hook into the existing alternatives mechanism on those.
> > >>>>
> > >>>>
> > >>>
> > >>> Hello
> > >>>
> > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contain this serie.
> > >>>
> > >>> I am hitting a buffer corruption problem with DMA.
> > >>> The sun8i-ce crypto driver fail self tests due to "device overran destination buffer".
> > >>> In fact the buffer is not overran by device but by dma_map_single() operation.
> > >>>
> > >>> The following small code show the problem:
> > >>>
> > >>> dma_addr_t dma;
> > >>> u8 *buf;
> > >>> #define BSIZE 2048
> > >>> #define DMASIZE 16
> > >>>
> > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> > >>> for (i = 0; i < BSIZE; i++)
> > >>>     buf[i] = 0xFE;
> > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
> > >>
> > >> This function (through dma_direct_map_page()) ends up calling
> > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
> > >> cache. This is the same thing other architectures do (at least arm, arm64,
> > >> openrisc, and powerpc). So this appears to be working as intended.
> > >
> > > This behavour is not present at least on ARM and ARM64.
> > > The sample code I provided does not corrupt the buffer on them.
> >
> > That can be explained by the 0xFE bytes having been flushed to DRAM already in
> > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in
> > a dirty cache line. The cache topology and implementation is totally different
> > across the SoCs, so this is not too surprising.
> >
> > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a
> > unidirectional DMA transfer from the device into that buffer. So the contents of
> > the buffer are "undefined" until the DMA transfer completes. If you are also
> > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.
> >
> > Regards,
> > Samuel
>
> +CC crypto mailing list + maintainer
>
> My problem is that crypto selftest, for each buffer where I need to do a cipher operation,
> concat a poison buffer to check that device does write beyond buffer.
>
> But the dma_map_sg(FROM_DEVICE) corrupts this poison buffer and crypto selftests fails thinking my device did a buffer overrun.
>
> So you mean that on SoC D1, this crypto API check strategy is impossible ?

I think you could try replacing all CLEAN & INVAL ops with FLUSH ops
for testing. (All cache block-aligned data from the device for the
CPU should be invalidated.)

+void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	case DMA_FROM_DEVICE:
+		ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	case DMA_BIDIRECTIONAL:
+		ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	default:
+		break;
+	}
+}
+
+void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		break;
+	case DMA_FROM_DEVICE:
+	case DMA_BIDIRECTIONAL:
+		ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	default:
+		break;
+	}
+}



-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
@ 2022-04-17  2:17             ` Guo Ren
  0 siblings, 0 replies; 50+ messages in thread
From: Guo Ren @ 2022-04-17  2:17 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: Samuel Holland, Heiko Stuebner, Palmer Dabbelt, Paul Walmsley,
	linux-riscv, Linux Kernel Mailing List, Wei Fu, Atish Patra,
	Anup Patel, Nick Kossifidis, Christoph Muellner, Philipp Tomsich,
	Herbert Xu, linux-crypto

On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe
<clabbe.montjoie@gmail.com> wrote:
>
> Le Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland a écrit :
> > On 4/16/22 2:35 AM, Corentin Labbe wrote:
> > > Le Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland a écrit :
> > >> On 4/15/22 6:26 AM, Corentin Labbe wrote:
> > >>> Le Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner a écrit :
> > >>>> This series is based on the alternatives changes done in my svpbmt series
> > >>>> and thus also depends on Atish's isa-extension parsing series.
> > >>>>
> > >>>> It implements using the cache-management instructions from the  Zicbom-
> > >>>> extension to handle cache flush, etc actions on platforms needing them.
> > >>>>
> > >>>> SoCs using cpu cores from T-Head like the Allwinne D1 implement a
> > >>>> different set of cache instructions. But while they are different,
> > >>>> instructions they provide the same functionality, so a variant can
> > >>>> easly hook into the existing alternatives mechanism on those.
> > >>>>
> > >>>>
> > >>>
> > >>> Hello
> > >>>
> > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contain this serie.
> > >>>
> > >>> I am hitting a buffer corruption problem with DMA.
> > >>> The sun8i-ce crypto driver fail self tests due to "device overran destination buffer".
> > >>> In fact the buffer is not overran by device but by dma_map_single() operation.
> > >>>
> > >>> The following small code show the problem:
> > >>>
> > >>> dma_addr_t dma;
> > >>> u8 *buf;
> > >>> #define BSIZE 2048
> > >>> #define DMASIZE 16
> > >>>
> > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> > >>> for (i = 0; i < BSIZE; i++)
> > >>>     buf[i] = 0xFE;
> > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
> > >>
> > >> This function (through dma_direct_map_page()) ends up calling
> > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
> > >> cache. This is the same thing other architectures do (at least arm, arm64,
> > >> openrisc, and powerpc). So this appears to be working as intended.
> > >
> > > This behavour is not present at least on ARM and ARM64.
> > > The sample code I provided does not corrupt the buffer on them.
> >
> > That can be explained by the 0xFE bytes having been flushed to DRAM already in
> > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in
> > a dirty cache line. The cache topology and implementation is totally different
> > across the SoCs, so this is not too surprising.
> >
> > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a
> > unidirectional DMA transfer from the device into that buffer. So the contents of
> > the buffer are "undefined" until the DMA transfer completes. If you are also
> > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.
> >
> > Regards,
> > Samuel
>
> +CC crypto mailing list + maintainer
>
> My problem is that crypto selftest, for each buffer where I need to do a cipher operation,
> concat a poison buffer to check that device does write beyond buffer.
>
> But the dma_map_sg(FROM_DEVICE) corrupts this poison buffer and crypto selftests fails thinking my device did a buffer overrun.
>
> So you mean that on SoC D1, this crypto API check strategy is impossible ?

I think you could try replacing all CLEAN & INVAL ops with FLUSH ops
for testing. (All cache-block-aligned data coming from the device to
the CPU should be invalidated.)

+void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	case DMA_FROM_DEVICE:
+		ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	case DMA_BIDIRECTIONAL:
+		ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	default:
+		break;
+	}
+}
+
+void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		break;
+	case DMA_FROM_DEVICE:
+	case DMA_BIDIRECTIONAL:
+		ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	default:
+		break;
+	}
+}



-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
  2022-04-17  2:17             ` Guo Ren
@ 2022-04-17  8:45               ` Corentin Labbe
  -1 siblings, 0 replies; 50+ messages in thread
From: Corentin Labbe @ 2022-04-17  8:45 UTC (permalink / raw)
  To: Guo Ren
  Cc: Samuel Holland, Heiko Stuebner, Palmer Dabbelt, Paul Walmsley,
	linux-riscv, Linux Kernel Mailing List, Wei Fu, Atish Patra,
	Anup Patel, Nick Kossifidis, Christoph Muellner, Philipp Tomsich,
	Herbert Xu, linux-crypto

Le Sun, Apr 17, 2022 at 10:17:34AM +0800, Guo Ren a écrit :
> On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe
> <clabbe.montjoie@gmail.com> wrote:
> >
> > Le Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland a écrit :
> > > On 4/16/22 2:35 AM, Corentin Labbe wrote:
> > > > Le Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland a écrit :
> > > >> On 4/15/22 6:26 AM, Corentin Labbe wrote:
> > > >>> Le Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner a écrit :
> > > >>>> This series is based on the alternatives changes done in my svpbmt series
> > > >>>> and thus also depends on Atish's isa-extension parsing series.
> > > >>>>
> > > >>>> It implements using the cache-management instructions from the  Zicbom-
> > > >>>> extension to handle cache flush, etc actions on platforms needing them.
> > > >>>>
> > > >>>> SoCs using cpu cores from T-Head like the Allwinne D1 implement a
> > > >>>> different set of cache instructions. But while they are different,
> > > >>>> instructions they provide the same functionality, so a variant can
> > > >>>> easly hook into the existing alternatives mechanism on those.
> > > >>>>
> > > >>>>
> > > >>>
> > > >>> Hello
> > > >>>
> > > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contain this serie.
> > > >>>
> > > >>> I am hitting a buffer corruption problem with DMA.
> > > >>> The sun8i-ce crypto driver fail self tests due to "device overran destination buffer".
> > > >>> In fact the buffer is not overran by device but by dma_map_single() operation.
> > > >>>
> > > >>> The following small code show the problem:
> > > >>>
> > > >>> dma_addr_t dma;
> > > >>> u8 *buf;
> > > >>> #define BSIZE 2048
> > > >>> #define DMASIZE 16
> > > >>>
> > > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> > > >>> for (i = 0; i < BSIZE; i++)
> > > >>>     buf[i] = 0xFE;
> > > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> > > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
> > > >>
> > > >> This function (through dma_direct_map_page()) ends up calling
> > > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
> > > >> cache. This is the same thing other architectures do (at least arm, arm64,
> > > >> openrisc, and powerpc). So this appears to be working as intended.
> > > >
> > > > This behavour is not present at least on ARM and ARM64.
> > > > The sample code I provided does not corrupt the buffer on them.
> > >
> > > That can be explained by the 0xFE bytes having been flushed to DRAM already in
> > > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in
> > > a dirty cache line. The cache topology and implementation is totally different
> > > across the SoCs, so this is not too surprising.
> > >
> > > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a
> > > unidirectional DMA transfer from the device into that buffer. So the contents of
> > > the buffer are "undefined" until the DMA transfer completes. If you are also
> > > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.
> > >
> > > Regards,
> > > Samuel
> >
> > +CC crypto mailing list + maintainer
> >
> > My problem is that crypto selftest, for each buffer where I need to do a cipher operation,
> > concat a poison buffer to check that device does write beyond buffer.
> >
> > But the dma_map_sg(FROM_DEVICE) corrupts this poison buffer and crypto selftests fails thinking my device did a buffer overrun.
> >
> > So you mean that on SoC D1, this crypto API check strategy is impossible ?
> 
> I think you could try to replace all CLEAN & INVAL ops with FLUSH ops
> for the testing. (All cache block-aligned data from the device for the
> CPU should be invalided.)
> 

With:
diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
index 2c124bcc1932..608483522e05 100644
--- a/arch/riscv/mm/dma-noncoherent.c
+++ b/arch/riscv/mm/dma-noncoherent.c
@@ -21,7 +21,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_dire
                ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
                break;
        case DMA_FROM_DEVICE:
-               ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
+               ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
                break;
        case DMA_BIDIRECTIONAL:
                ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);


With this change, the crypto self tests pass and I no longer see any buffer corruption.

Thanks

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
  2022-04-17  8:45               ` Corentin Labbe
@ 2022-04-17  8:49                 ` Guo Ren
  -1 siblings, 0 replies; 50+ messages in thread
From: Guo Ren @ 2022-04-17  8:49 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: Samuel Holland, Heiko Stuebner, Palmer Dabbelt, Paul Walmsley,
	linux-riscv, Linux Kernel Mailing List, Wei Fu, Atish Patra,
	Anup Patel, Nick Kossifidis, Christoph Muellner, Philipp Tomsich,
	Herbert Xu, linux-crypto

On Sun, Apr 17, 2022 at 4:45 PM Corentin Labbe
<clabbe.montjoie@gmail.com> wrote:
>
> Le Sun, Apr 17, 2022 at 10:17:34AM +0800, Guo Ren a écrit :
> > On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe
> > <clabbe.montjoie@gmail.com> wrote:
> > >
> > > Le Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland a écrit :
> > > > On 4/16/22 2:35 AM, Corentin Labbe wrote:
> > > > > Le Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland a écrit :
> > > > >> On 4/15/22 6:26 AM, Corentin Labbe wrote:
> > > > >>> Le Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner a écrit :
> > > > >>>> This series is based on the alternatives changes done in my svpbmt series
> > > > >>>> and thus also depends on Atish's isa-extension parsing series.
> > > > >>>>
> > > > >>>> It implements using the cache-management instructions from the  Zicbom-
> > > > >>>> extension to handle cache flush, etc actions on platforms needing them.
> > > > >>>>
> > > > >>>> SoCs using cpu cores from T-Head like the Allwinne D1 implement a
> > > > >>>> different set of cache instructions. But while they are different,
> > > > >>>> instructions they provide the same functionality, so a variant can
> > > > >>>> easly hook into the existing alternatives mechanism on those.
> > > > >>>>
> > > > >>>>
> > > > >>>
> > > > >>> Hello
> > > > >>>
> > > > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contain this serie.
> > > > >>>
> > > > >>> I am hitting a buffer corruption problem with DMA.
> > > > >>> The sun8i-ce crypto driver fail self tests due to "device overran destination buffer".
> > > > >>> In fact the buffer is not overran by device but by dma_map_single() operation.
> > > > >>>
> > > > >>> The following small code show the problem:
> > > > >>>
> > > > >>> dma_addr_t dma;
> > > > >>> u8 *buf;
> > > > >>> #define BSIZE 2048
> > > > >>> #define DMASIZE 16
> > > > >>>
> > > > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> > > > >>> for (i = 0; i < BSIZE; i++)
> > > > >>>     buf[i] = 0xFE;
> > > > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> > > > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
> > > > >>
> > > > >> This function (through dma_direct_map_page()) ends up calling
> > > > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
> > > > >> cache. This is the same thing other architectures do (at least arm, arm64,
> > > > >> openrisc, and powerpc). So this appears to be working as intended.
> > > > >
> > > > > This behavour is not present at least on ARM and ARM64.
> > > > > The sample code I provided does not corrupt the buffer on them.
> > > >
> > > > That can be explained by the 0xFE bytes having been flushed to DRAM already in
> > > > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in
> > > > a dirty cache line. The cache topology and implementation is totally different
> > > > across the SoCs, so this is not too surprising.
> > > >
> > > > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a
> > > > unidirectional DMA transfer from the device into that buffer. So the contents of
> > > > the buffer are "undefined" until the DMA transfer completes. If you are also
> > > > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.
> > > >
> > > > Regards,
> > > > Samuel
> > >
> > > +CC crypto mailing list + maintainer
> > >
> > > My problem is that crypto selftest, for each buffer where I need to do a cipher operation,
> > > concat a poison buffer to check that device does write beyond buffer.
> > >
> > > But the dma_map_sg(FROM_DEVICE) corrupts this poison buffer and crypto selftests fails thinking my device did a buffer overrun.
> > >
> > > So you mean that on SoC D1, this crypto API check strategy is impossible ?
> >
> > I think you could try to replace all CLEAN & INVAL ops with FLUSH ops
> > for the testing. (All cache block-aligned data from the device for the
> > CPU should be invalided.)
> >
>
> With:
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> index 2c124bcc1932..608483522e05 100644
> --- a/arch/riscv/mm/dma-noncoherent.c
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -21,7 +21,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_dire
>                 ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
>                 break;
>         case DMA_FROM_DEVICE:
> -               ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
> +               ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
>                 break;
>         case DMA_BIDIRECTIONAL:
>                 ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
>
>
> The crypto self test works and I got no more buffer corruption.
No, no ... that is not a solution. It means your driver has a problem.
For DMA_FROM_DEVICE, INVAL alone is enough.

>
> Thanks



-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
  2022-04-17  8:49                 ` Guo Ren
@ 2022-04-17 17:35                   ` Corentin Labbe
  -1 siblings, 0 replies; 50+ messages in thread
From: Corentin Labbe @ 2022-04-17 17:35 UTC (permalink / raw)
  To: Guo Ren
  Cc: Samuel Holland, Heiko Stuebner, Palmer Dabbelt, Paul Walmsley,
	linux-riscv, Linux Kernel Mailing List, Wei Fu, Atish Patra,
	Anup Patel, Nick Kossifidis, Christoph Muellner, Philipp Tomsich,
	Herbert Xu, linux-crypto

Le Sun, Apr 17, 2022 at 04:49:34PM +0800, Guo Ren a écrit :
> On Sun, Apr 17, 2022 at 4:45 PM Corentin Labbe
> <clabbe.montjoie@gmail.com> wrote:
> >
> > Le Sun, Apr 17, 2022 at 10:17:34AM +0800, Guo Ren a écrit :
> > > On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe
> > > <clabbe.montjoie@gmail.com> wrote:
> > > >
> > > > Le Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland a écrit :
> > > > > On 4/16/22 2:35 AM, Corentin Labbe wrote:
> > > > > > Le Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland a écrit :
> > > > > >> On 4/15/22 6:26 AM, Corentin Labbe wrote:
> > > > > >>> Le Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner a écrit :
> > > > > >>>> This series is based on the alternatives changes done in my svpbmt series
> > > > > >>>> and thus also depends on Atish's isa-extension parsing series.
> > > > > >>>>
> > > > > >>>> It implements using the cache-management instructions from the  Zicbom-
> > > > > >>>> extension to handle cache flush, etc actions on platforms needing them.
> > > > > >>>>
> > > > > >>>> SoCs using cpu cores from T-Head like the Allwinne D1 implement a
> > > > > >>>> different set of cache instructions. But while they are different,
> > > > > >>>> instructions they provide the same functionality, so a variant can
> > > > > >>>> easly hook into the existing alternatives mechanism on those.
> > > > > >>>>
> > > > > >>>>
> > > > > >>>
> > > > > >>> Hello
> > > > > >>>
> > > > > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contain this serie.
> > > > > >>>
> > > > > >>> I am hitting a buffer corruption problem with DMA.
> > > > > >>> The sun8i-ce crypto driver fail self tests due to "device overran destination buffer".
> > > > > >>> In fact the buffer is not overran by device but by dma_map_single() operation.
> > > > > >>>
> > > > > >>> The following small code show the problem:
> > > > > >>>
> > > > > >>> dma_addr_t dma;
> > > > > >>> u8 *buf;
> > > > > >>> #define BSIZE 2048
> > > > > >>> #define DMASIZE 16
> > > > > >>>
> > > > > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> > > > > >>> for (i = 0; i < BSIZE; i++)
> > > > > >>>     buf[i] = 0xFE;
> > > > > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> > > > > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
> > > > > >>
> > > > > >> This function (through dma_direct_map_page()) ends up calling
> > > > > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
> > > > > >> cache. This is the same thing other architectures do (at least arm, arm64,
> > > > > >> openrisc, and powerpc). So this appears to be working as intended.
> > > > > >
> > > > > > This behavour is not present at least on ARM and ARM64.
> > > > > > The sample code I provided does not corrupt the buffer on them.
> > > > >
> > > > > That can be explained by the 0xFE bytes having been flushed to DRAM already in
> > > > > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in
> > > > > a dirty cache line. The cache topology and implementation is totally different
> > > > > across the SoCs, so this is not too surprising.
> > > > >
> > > > > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a
> > > > > unidirectional DMA transfer from the device into that buffer. So the contents of
> > > > > the buffer are "undefined" until the DMA transfer completes. If you are also
> > > > > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.
> > > > >
> > > > > Regards,
> > > > > Samuel
> > > >
> > > > +CC crypto mailing list + maintainer
> > > >
> > > > My problem is that crypto selftest, for each buffer where I need to do a cipher operation,
> > > > concat a poison buffer to check that device does write beyond buffer.
> > > >
> > > > But the dma_map_sg(FROM_DEVICE) corrupts this poison buffer and crypto selftests fails thinking my device did a buffer overrun.
> > > >
> > > > So you mean that on SoC D1, this crypto API check strategy is impossible ?
> > >
> > > I think you could try to replace all CLEAN & INVAL ops with FLUSH ops
> > > for the testing. (All cache block-aligned data from the device for the
> > > CPU should be invalided.)
> > >
> >
> > With:
> > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> > index 2c124bcc1932..608483522e05 100644
> > --- a/arch/riscv/mm/dma-noncoherent.c
> > +++ b/arch/riscv/mm/dma-noncoherent.c
> > @@ -21,7 +21,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_dire
> >                 ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
> >                 break;
> >         case DMA_FROM_DEVICE:
> > -               ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
> > +               ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> >                 break;
> >         case DMA_BIDIRECTIONAL:
> >                 ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> >
> >
> > The crypto self test works and I got no more buffer corruption.
> No, No ... it's not a solution. That means your driver has a problem.
> From device, we only need INVAL enough.
> 

From my point of view the driver works fine; the problem comes from dma_map_sg(). I probably did not explain it well, so let me start over.

Example:
The crypto self test sends my driver a 16-byte AES cipher operation in an SG, but the underlying buffer is larger (say 32 bytes for this example).
So the first 16 bytes are used by the SG, and the last 16 bytes are a poison area (filled with 0xFE) used to check that the driver does not write beyond the 16 bytes of the operation (i.e. beyond the SG length).

Calling dma_map_sg(FROM_DEVICE) on the SG corrupts the whole buffer.
My driver writes the first 16 bytes via DMA as expected.
The crypto API then checks the trailing bytes, finds they are no longer 0xFE, and fails, believing my driver wrote beyond the first 16 bytes.

But even if I disable the hardware operation, the buffer is still corrupted. (See my sample code, which only does dma_map/dma_unmap.)

So the problem is that dma_map(FROM_DEVICE) changes the buffer contents.

If this behaviour is normal on the D1 SoC, how should the crypto self tests be fixed?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
  2022-04-17 17:35                   ` Corentin Labbe
@ 2022-04-17 22:50                     ` Guo Ren
  -1 siblings, 0 replies; 50+ messages in thread
From: Guo Ren @ 2022-04-17 22:50 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: Samuel Holland, Heiko Stuebner, Palmer Dabbelt, Paul Walmsley,
	linux-riscv, Linux Kernel Mailing List, Wei Fu, Atish Patra,
	Anup Patel, Nick Kossifidis, Christoph Muellner, Philipp Tomsich,
	Herbert Xu, linux-crypto

On Mon, Apr 18, 2022 at 1:35 AM Corentin Labbe
<clabbe.montjoie@gmail.com> wrote:
>
> Le Sun, Apr 17, 2022 at 04:49:34PM +0800, Guo Ren a écrit :
> > On Sun, Apr 17, 2022 at 4:45 PM Corentin Labbe
> > <clabbe.montjoie@gmail.com> wrote:
> > >
> > > Le Sun, Apr 17, 2022 at 10:17:34AM +0800, Guo Ren a écrit :
> > > > On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe
> > > > <clabbe.montjoie@gmail.com> wrote:
> > > > >
> > > > > Le Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland a écrit :
> > > > > > On 4/16/22 2:35 AM, Corentin Labbe wrote:
> > > > > > > Le Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland a écrit :
> > > > > > >> On 4/15/22 6:26 AM, Corentin Labbe wrote:
> > > > > > >>> Le Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner a écrit :
> > > > > > >>>> This series is based on the alternatives changes done in my svpbmt series
> > > > > > >>>> and thus also depends on Atish's isa-extension parsing series.
> > > > > > >>>>
> > > > > > >>>> It implements using the cache-management instructions from the  Zicbom-
> > > > > > >>>> extension to handle cache flush, etc actions on platforms needing them.
> > > > > > >>>>
> > > > > > >>>> SoCs using cpu cores from T-Head like the Allwinne D1 implement a
> > > > > > >>>> different set of cache instructions. But while they are different,
> > > > > > >>>> instructions they provide the same functionality, so a variant can
> > > > > > >>>> easly hook into the existing alternatives mechanism on those.
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>> Hello
> > > > > > >>>
> > > > > > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contain this serie.
> > > > > > >>>
> > > > > > >>> I am hitting a buffer corruption problem with DMA.
> > > > > > >>> The sun8i-ce crypto driver fail self tests due to "device overran destination buffer".
> > > > > > >>> In fact the buffer is not overran by device but by dma_map_single() operation.
> > > > > > >>>
> > > > > > >>> The following small code show the problem:
> > > > > > >>>
> > > > > > >>> dma_addr_t dma;
> > > > > > >>> u8 *buf;
> > > > > > >>> #define BSIZE 2048
> > > > > > >>> #define DMASIZE 16
> > > > > > >>>
> > > > > > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> > > > > > >>> for (i = 0; i < BSIZE; i++)
> > > > > > >>>     buf[i] = 0xFE;
> > > > > > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> > > > > > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
> > > > > > >>
> > > > > > >> This function (through dma_direct_map_page()) ends up calling
> > > > > > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
> > > > > > >> cache. This is the same thing other architectures do (at least arm, arm64,
> > > > > > >> openrisc, and powerpc). So this appears to be working as intended.
> > > > > > >
> > > > > > > This behavour is not present at least on ARM and ARM64.
> > > > > > > The sample code I provided does not corrupt the buffer on them.
> > > > > >
> > > > > > That can be explained by the 0xFE bytes having been flushed to DRAM already in
> > > > > > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in
> > > > > > a dirty cache line. The cache topology and implementation is totally different
> > > > > > across the SoCs, so this is not too surprising.
> > > > > >
> > > > > > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a
> > > > > > unidirectional DMA transfer from the device into that buffer. So the contents of
> > > > > > the buffer are "undefined" until the DMA transfer completes. If you are also
> > > > > > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.
> > > > > >
> > > > > > Regards,
> > > > > > Samuel
> > > > >
> > > > > +CC crypto mailing list + maintainer
> > > > >
> > > > > My problem is that crypto selftest, for each buffer where I need to do a cipher operation,
> > > > > > concat a poison buffer to check that device does not write beyond buffer.
> > > > >
> > > > > But the dma_map_sg(FROM_DEVICE) corrupts this poison buffer and crypto selftests fails thinking my device did a buffer overrun.
> > > > >
> > > > > So you mean that on SoC D1, this crypto API check strategy is impossible ?
> > > >
> > > > I think you could try to replace all CLEAN & INVAL ops with FLUSH ops
> > > > for the testing. (All cache block-aligned data from the device for the
> > > > CPU should be invalided.)
> > > >
> > >
> > > With:
> > > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> > > index 2c124bcc1932..608483522e05 100644
> > > --- a/arch/riscv/mm/dma-noncoherent.c
> > > +++ b/arch/riscv/mm/dma-noncoherent.c
> > > @@ -21,7 +21,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_dire
> > >                 ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
> > >                 break;
> > >         case DMA_FROM_DEVICE:
> > > -               ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
> > > +               ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> > >                 break;
> > >         case DMA_BIDIRECTIONAL:
> > >                 ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> > >
> > >
> > > The crypto self test works and I got no more buffer corruption.
> > No, No ... it's not a solution. That means your driver has a problem.
> > From device, we only need INVAL enough.
> >
>
> For me, my driver works fine, the problem came from dma_map_sg(), probably I didnt explain right, I restart.
>
> Example:
> crypto self test send to my driver an AES cipher operation of 16 bytes inside a SG, but the original buffer is greater (said 32 for the example).
> So the first 16 bytes are used by the SG and the last 16 bytes are a poisoned buffer (with value 0xFE) to check driver do not write beyong the normal operation of 16 bytes (and beyond the SG length).
>
> Doing the dma_map_sg(FROM_DEVICE) on the SG corrupt the whole buffer.
> My driver write normally via DMA the first 16 bytes.
> Crypto API check the last bytes, no more 0xFE, so it fail believing my driver wrote beyond the first 16 bytes.
>
> But even If I disable my hardware operation, the buffer is still corrupted. (See my sample code which just do dma_map/dma_unmap)
>
> So the problem is the dma_map(FROM_DEVICE) which change buffer content.
>
> So if this behavour is normal on D1 SoC, how to fix the crypto self tests ?
Actually, FLUSH is safe in all cases, but it is more expensive. Can you
tell me which ARM SoC you are using, and which version of Linux is
running on it?


-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
  2022-04-17 17:35                   ` Corentin Labbe
@ 2022-04-18 15:29                     ` Philipp Tomsich
  -1 siblings, 0 replies; 50+ messages in thread
From: Philipp Tomsich @ 2022-04-18 15:29 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: Guo Ren, Samuel Holland, Heiko Stuebner, Palmer Dabbelt,
	Paul Walmsley, linux-riscv, Linux Kernel Mailing List, Wei Fu,
	Atish Patra, Anup Patel, Nick Kossifidis, Christoph Muellner,
	Herbert Xu, linux-crypto

On Sun, 17 Apr 2022 at 19:35, Corentin Labbe <clabbe.montjoie@gmail.com> wrote:
>
> Le Sun, Apr 17, 2022 at 04:49:34PM +0800, Guo Ren a écrit :
> > On Sun, Apr 17, 2022 at 4:45 PM Corentin Labbe
> > <clabbe.montjoie@gmail.com> wrote:
> > >
> > > Le Sun, Apr 17, 2022 at 10:17:34AM +0800, Guo Ren a écrit :
> > > > On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe
> > > > <clabbe.montjoie@gmail.com> wrote:
> > > > >
> > > > > Le Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland a écrit :
> > > > > > On 4/16/22 2:35 AM, Corentin Labbe wrote:
> > > > > > > Le Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland a écrit :
> > > > > > >> On 4/15/22 6:26 AM, Corentin Labbe wrote:
> > > > > > >>> Le Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner a écrit :
> > > > > > >>>> This series is based on the alternatives changes done in my svpbmt series
> > > > > > >>>> and thus also depends on Atish's isa-extension parsing series.
> > > > > > >>>>
> > > > > > >>>> It implements using the cache-management instructions from the  Zicbom-
> > > > > > >>>> extension to handle cache flush, etc actions on platforms needing them.
> > > > > > >>>>
> > > > > > >>>> SoCs using cpu cores from T-Head like the Allwinne D1 implement a
> > > > > > >>>> different set of cache instructions. But while they are different,
> > > > > > >>>> instructions they provide the same functionality, so a variant can
> > > > > > >>>> easly hook into the existing alternatives mechanism on those.
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>> Hello
> > > > > > >>>
> > > > > > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contain this serie.
> > > > > > >>>
> > > > > > >>> I am hitting a buffer corruption problem with DMA.
> > > > > > >>> The sun8i-ce crypto driver fail self tests due to "device overran destination buffer".
> > > > > > >>> In fact the buffer is not overran by device but by dma_map_single() operation.
> > > > > > >>>
> > > > > > >>> The following small code show the problem:
> > > > > > >>>
> > > > > > >>> dma_addr_t dma;
> > > > > > >>> u8 *buf;
> > > > > > >>> #define BSIZE 2048
> > > > > > >>> #define DMASIZE 16
> > > > > > >>>
> > > > > > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> > > > > > >>> for (i = 0; i < BSIZE; i++)
> > > > > > >>>     buf[i] = 0xFE;
> > > > > > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> > > > > > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
> > > > > > >>
> > > > > > >> This function (through dma_direct_map_page()) ends up calling
> > > > > > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
> > > > > > >> cache. This is the same thing other architectures do (at least arm, arm64,
> > > > > > >> openrisc, and powerpc). So this appears to be working as intended.
> > > > > > >
> > > > > > > This behavour is not present at least on ARM and ARM64.
> > > > > > > The sample code I provided does not corrupt the buffer on them.
> > > > > >
> > > > > > That can be explained by the 0xFE bytes having been flushed to DRAM already in
> > > > > > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in
> > > > > > a dirty cache line. The cache topology and implementation is totally different
> > > > > > across the SoCs, so this is not too surprising.
> > > > > >
> > > > > > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a
> > > > > > unidirectional DMA transfer from the device into that buffer. So the contents of
> > > > > > the buffer are "undefined" until the DMA transfer completes. If you are also
> > > > > > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.
> > > > > >
> > > > > > Regards,
> > > > > > Samuel
> > > > >
> > > > > +CC crypto mailing list + maintainer
> > > > >
> > > > > My problem is that crypto selftest, for each buffer where I need to do a cipher operation,
> > > > > > concat a poison buffer to check that device does not write beyond buffer.
> > > > >
> > > > > But the dma_map_sg(FROM_DEVICE) corrupts this poison buffer and crypto selftests fails thinking my device did a buffer overrun.
> > > > >
> > > > > So you mean that on SoC D1, this crypto API check strategy is impossible ?
> > > >
> > > > I think you could try to replace all CLEAN & INVAL ops with FLUSH ops
> > > > for the testing. (All cache block-aligned data from the device for the
> > > > CPU should be invalided.)
> > > >
> > >
> > > With:
> > > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> > > index 2c124bcc1932..608483522e05 100644
> > > --- a/arch/riscv/mm/dma-noncoherent.c
> > > +++ b/arch/riscv/mm/dma-noncoherent.c
> > > @@ -21,7 +21,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_dire
> > >                 ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
> > >                 break;
> > >         case DMA_FROM_DEVICE:
> > > -               ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
> > > +               ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> > >                 break;
> > >         case DMA_BIDIRECTIONAL:
> > >                 ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> > >
> > >
> > > The crypto self test works and I got no more buffer corruption.
> > No, No ... it's not a solution. That means your driver has a problem.
> > From device, we only need INVAL enough.
> >
>
> For me, my driver works fine, the problem came from dma_map_sg(), probably I didnt explain right, I restart.
>
> Example:
> crypto self test send to my driver an AES cipher operation of 16 bytes inside a SG, but the original buffer is greater (said 32 for the example).
> So the first 16 bytes are used by the SG and the last 16 bytes are a poisoned buffer (with value 0xFE) to check driver do not write beyong the normal operation of 16 bytes (and beyond the SG length).
>
> Doing the dma_map_sg(FROM_DEVICE) on the SG corrupt the whole buffer.

Doesn't DMA_FROM_DEVICE indicate that no writes from the CPU to the
buffer are expected (and that any modifications to the underlying
cache line can be dropped via an invalidation)?
In other words: does the behavior change when mapping as
DMA_BIDIRECTIONAL? And should a map/unmap sequence be used where the
buffer is first mapped as DMA_TO_DEVICE when poisoning it and later
as DMA_FROM_DEVICE in normal operation?

Philipp.

> My driver write normally via DMA the first 16 bytes.
> Crypto API check the last bytes, no more 0xFE, so it fail believing my driver wrote beyond the first 16 bytes.
>
> But even If I disable my hardware operation, the buffer is still corrupted. (See my sample code which just do dma_map/dma_unmap)
>
> So the problem is the dma_map(FROM_DEVICE) which change buffer content.
>
> So if this behavour is normal on D1 SoC, how to fix the crypto self tests ?
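
The direction question raised in this reply can be made concrete with a small lookup that mirrors the switch in the quoted arch_sync_dma_for_device() hunk. This is a userspace sketch, not the kernel code: the toy_* names are hypothetical, the first case is assumed to be DMA_TO_DEVICE (its label is not visible in the quoted hunk), and the flush_on_from_device parameter stands in for the test patch quoted above.

```c
#include <assert.h>

enum toy_dir { TOY_TO_DEVICE, TOY_FROM_DEVICE, TOY_BIDIRECTIONAL };
enum toy_cmo { TOY_CMO_CLEAN, TOY_CMO_INVAL, TOY_CMO_FLUSH };

/*
 * Which cache operation is issued before handing a buffer to the device.
 * flush_on_from_device = 0 models the series as posted,
 * flush_on_from_device = 1 models the quoted test patch.
 */
static enum toy_cmo toy_op_for_device(enum toy_dir dir, int flush_on_from_device)
{
	switch (dir) {
	case TOY_TO_DEVICE:
		return TOY_CMO_CLEAN;	/* write back, keep the line */
	case TOY_FROM_DEVICE:
		return flush_on_from_device ? TOY_CMO_FLUSH : TOY_CMO_INVAL;
	case TOY_BIDIRECTIONAL:
		return TOY_CMO_FLUSH;	/* write back, then invalidate */
	}
	assert(!"unknown direction");
	return TOY_CMO_FLUSH;
}

/* CLEAN and FLUSH push dirty CPU data to memory; INVAL discards it. */
static int toy_writes_back_dirty_lines(enum toy_cmo op)
{
	return op == TOY_CMO_CLEAN || op == TOY_CMO_FLUSH;
}
```

Under these assumptions only the unpatched FROM_DEVICE path discards dirty CPU data, which is why mapping as DMA_BIDIRECTIONAL, or cleaning the poisoned buffer first via a DMA_TO_DEVICE mapping, would be expected to keep the 0xFE bytes intact.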

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
@ 2022-04-18 15:29                     ` Philipp Tomsich
  0 siblings, 0 replies; 50+ messages in thread
From: Philipp Tomsich @ 2022-04-18 15:29 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: Guo Ren, Samuel Holland, Heiko Stuebner, Palmer Dabbelt,
	Paul Walmsley, linux-riscv, Linux Kernel Mailing List, Wei Fu,
	Atish Patra, Anup Patel, Nick Kossifidis, Christoph Muellner,
	Herbert Xu, linux-crypto

On Sun, 17 Apr 2022 at 19:35, Corentin Labbe <clabbe.montjoie@gmail.com> wrote:
>
> Le Sun, Apr 17, 2022 at 04:49:34PM +0800, Guo Ren a écrit :
> > On Sun, Apr 17, 2022 at 4:45 PM Corentin Labbe
> > <clabbe.montjoie@gmail.com> wrote:
> > >
> > > Le Sun, Apr 17, 2022 at 10:17:34AM +0800, Guo Ren a écrit :
> > > > On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe
> > > > <clabbe.montjoie@gmail.com> wrote:
> > > > >
> > > > > Le Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland a écrit :
> > > > > > On 4/16/22 2:35 AM, Corentin Labbe wrote:
> > > > > > > Le Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland a écrit :
> > > > > > >> On 4/15/22 6:26 AM, Corentin Labbe wrote:
> > > > > > >>> Le Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner a écrit :
> > > > > > >>>> This series is based on the alternatives changes done in my svpbmt series
> > > > > > >>>> and thus also depends on Atish's isa-extension parsing series.
> > > > > > >>>>
> > > > > > >>>> It implements using the cache-management instructions from the  Zicbom-
> > > > > > >>>> extension to handle cache flush, etc actions on platforms needing them.
> > > > > > >>>>
> > > > > > >>>> SoCs using cpu cores from T-Head like the Allwinne D1 implement a
> > > > > > >>>> different set of cache instructions. But while they are different,
> > > > > > >>>> instructions they provide the same functionality, so a variant can
> > > > > > >>>> easly hook into the existing alternatives mechanism on those.
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>> Hello
> > > > > > >>>
> > > > > > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contain this serie.
> > > > > > >>>
> > > > > > >>> I am hitting a buffer corruption problem with DMA.
> > > > > > >>> The sun8i-ce crypto driver fail self tests due to "device overran destination buffer".
> > > > > > >>> In fact the buffer is not overran by device but by dma_map_single() operation.
> > > > > > >>>
> > > > > > >>> The following small code show the problem:
> > > > > > >>>
> > > > > > >>> dma_addr_t dma;
> > > > > > >>> u8 *buf;
> > > > > > >>> #define BSIZE 2048
> > > > > > >>> #define DMASIZE 16
> > > > > > >>>
> > > > > > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> > > > > > >>> for (i = 0; i < BSIZE; i++)
> > > > > > >>>     buf[i] = 0xFE;
> > > > > > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> > > > > > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
> > > > > > >>
> > > > > > >> This function (through dma_direct_map_page()) ends up calling
> > > > > > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
> > > > > > >> cache. This is the same thing other architectures do (at least arm, arm64,
> > > > > > >> openrisc, and powerpc). So this appears to be working as intended.
> > > > > > >
> > > > > > > This behavour is not present at least on ARM and ARM64.
> > > > > > > The sample code I provided does not corrupt the buffer on them.
> > > > > >
> > > > > > That can be explained by the 0xFE bytes having been flushed to DRAM already in
> > > > > > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in
> > > > > > a dirty cache line. The cache topology and implementation is totally different
> > > > > > across the SoCs, so this is not too surprising.
> > > > > >
> > > > > > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a
> > > > > > unidirectional DMA transfer from the device into that buffer. So the contents of
> > > > > > the buffer are "undefined" until the DMA transfer completes. If you are also
> > > > > > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.
> > > > > >
> > > > > > Regards,
> > > > > > Samuel
> > > > >
> > > > > +CC crypto mailing list + maintainer
> > > > >
> > > > > My problem is that crypto selftest, for each buffer where I need to do a cipher operation,
> > > > > concat a poison buffer to check that device does write beyond buffer.
> > > > >
> > > > > But the dma_map_sg(FROM_DEVICE) corrupts this poison buffer and crypto selftests fails thinking my device did a buffer overrun.
> > > > >
> > > > > So you mean that on SoC D1, this crypto API check strategy is impossible ?
> > > >
> > > > I think you could try to replace all CLEAN & INVAL ops with FLUSH ops
> > > > for the testing. (All cache block-aligned data from the device for the
> > > > CPU should be invalided.)
> > > >
> > >
> > > With:
> > > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> > > index 2c124bcc1932..608483522e05 100644
> > > --- a/arch/riscv/mm/dma-noncoherent.c
> > > +++ b/arch/riscv/mm/dma-noncoherent.c
> > > @@ -21,7 +21,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_dire
> > >                 ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
> > >                 break;
> > >         case DMA_FROM_DEVICE:
> > > -               ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
> > > +               ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> > >                 break;
> > >         case DMA_BIDIRECTIONAL:
> > >                 ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> > >
> > >
> > > The crypto self test works and I got no more buffer corruption.
> > No, No ... it's not a solution. That means your driver has a problem.
> > From device, we only need INVAL enough.
> >
>
> For me, my driver works fine, the problem came from dma_map_sg(), probably I didnt explain right, I restart.
>
> Example:
> crypto self test send to my driver an AES cipher operation of 16 bytes inside a SG, but the original buffer is greater (said 32 for the example).
> So the first 16 bytes are used by the SG and the last 16 bytes are a poisoned buffer (with value 0xFE) to check driver do not write beyong the normal operation of 16 bytes (and beyond the SG length).
>
> Doing the dma_map_sg(FROM_DEVICE) on the SG corrupt the whole buffer.

Doesn't the DMA_FROM_DEVICE indicate that there are no expected writes
from the CPU to the buffer (and that any modifications to the
underlying cache line can be dropped via an invalidation)?
In other words: does the behavior change when mapping as
DMA_BIDIRECTIONAL — and: should a map/unmap sequence be used where it
is first mapped as DMA_TO_DEVICE when poisoning the buffer and later
as DMA_FROM_DEVICE when in normal operation?

Philipp.

> My driver write normally via DMA the first 16 bytes.
> Crypto API check the last bytes, no more 0xFE, so it fail believing my driver wrote beyond the first 16 bytes.
>
> But even If I disable my hardware operation, the buffer is still corrupted. (See my sample code which just do dma_map/dma_unmap)
>
> So the problem is the dma_map(FROM_DEVICE) which change buffer content.
>
> So if this behavour is normal on D1 SoC, how to fix the crypto self tests ?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
  2022-04-17 22:50                     ` Guo Ren
@ 2022-04-19  7:44                       ` Corentin Labbe
  -1 siblings, 0 replies; 50+ messages in thread
From: Corentin Labbe @ 2022-04-19  7:44 UTC (permalink / raw)
  To: Guo Ren
  Cc: Samuel Holland, Heiko Stuebner, Palmer Dabbelt, Paul Walmsley,
	linux-riscv, Linux Kernel Mailing List, Wei Fu, Atish Patra,
	Anup Patel, Nick Kossifidis, Christoph Muellner, Philipp Tomsich,
	Herbert Xu, linux-crypto

Le Mon, Apr 18, 2022 at 06:50:57AM +0800, Guo Ren a écrit :
> On Mon, Apr 18, 2022 at 1:35 AM Corentin Labbe
> <clabbe.montjoie@gmail.com> wrote:
> >
> > Le Sun, Apr 17, 2022 at 04:49:34PM +0800, Guo Ren a écrit :
> > > On Sun, Apr 17, 2022 at 4:45 PM Corentin Labbe
> > > <clabbe.montjoie@gmail.com> wrote:
> > > >
> > > > Le Sun, Apr 17, 2022 at 10:17:34AM +0800, Guo Ren a écrit :
> > > > > On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe
> > > > > <clabbe.montjoie@gmail.com> wrote:
> > > > > >
> > > > > > Le Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland a écrit :
> > > > > > > On 4/16/22 2:35 AM, Corentin Labbe wrote:
> > > > > > > > Le Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland a écrit :
> > > > > > > >> On 4/15/22 6:26 AM, Corentin Labbe wrote:
> > > > > > > >>> Le Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner a écrit :
> > > > > > > >>>> This series is based on the alternatives changes done in my svpbmt series
> > > > > > > >>>> and thus also depends on Atish's isa-extension parsing series.
> > > > > > > >>>>
> > > > > > > >>>> It implements using the cache-management instructions from the  Zicbom-
> > > > > > > >>>> extension to handle cache flush, etc actions on platforms needing them.
> > > > > > > >>>>
> > > > > > > >>>> SoCs using cpu cores from T-Head like the Allwinne D1 implement a
> > > > > > > >>>> different set of cache instructions. But while they are different,
> > > > > > > >>>> instructions they provide the same functionality, so a variant can
> > > > > > > >>>> easly hook into the existing alternatives mechanism on those.
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>> Hello
> > > > > > > >>>
> > > > > > > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contain this serie.
> > > > > > > >>>
> > > > > > > >>> I am hitting a buffer corruption problem with DMA.
> > > > > > > >>> The sun8i-ce crypto driver fail self tests due to "device overran destination buffer".
> > > > > > > >>> In fact the buffer is not overran by device but by dma_map_single() operation.
> > > > > > > >>>
> > > > > > > >>> The following small code show the problem:
> > > > > > > >>>
> > > > > > > >>> dma_addr_t dma;
> > > > > > > >>> u8 *buf;
> > > > > > > >>> #define BSIZE 2048
> > > > > > > >>> #define DMASIZE 16
> > > > > > > >>>
> > > > > > > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> > > > > > > >>> for (i = 0; i < BSIZE; i++)
> > > > > > > >>>     buf[i] = 0xFE;
> > > > > > > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> > > > > > > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
> > > > > > > >>
> > > > > > > >> This function (through dma_direct_map_page()) ends up calling
> > > > > > > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
> > > > > > > >> cache. This is the same thing other architectures do (at least arm, arm64,
> > > > > > > >> openrisc, and powerpc). So this appears to be working as intended.
> > > > > > > >
> > > > > > > > This behavour is not present at least on ARM and ARM64.
> > > > > > > > The sample code I provided does not corrupt the buffer on them.
> > > > > > >
> > > > > > > That can be explained by the 0xFE bytes having been flushed to DRAM already in
> > > > > > > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in
> > > > > > > a dirty cache line. The cache topology and implementation is totally different
> > > > > > > across the SoCs, so this is not too surprising.
> > > > > > >
> > > > > > > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a
> > > > > > > unidirectional DMA transfer from the device into that buffer. So the contents of
> > > > > > > the buffer are "undefined" until the DMA transfer completes. If you are also
> > > > > > > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Samuel
> > > > > >
> > > > > > +CC crypto mailing list + maintainer
> > > > > >
> > > > > > My problem is that crypto selftest, for each buffer where I need to do a cipher operation,
> > > > > > concat a poison buffer to check that device does write beyond buffer.
> > > > > >
> > > > > > But the dma_map_sg(FROM_DEVICE) corrupts this poison buffer and crypto selftests fails thinking my device did a buffer overrun.
> > > > > >
> > > > > > So you mean that on SoC D1, this crypto API check strategy is impossible ?
> > > > >
> > > > > I think you could try to replace all CLEAN & INVAL ops with FLUSH ops
> > > > > for the testing. (All cache block-aligned data from the device for the
> > > > > CPU should be invalided.)
> > > > >
> > > >
> > > > With:
> > > > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> > > > index 2c124bcc1932..608483522e05 100644
> > > > --- a/arch/riscv/mm/dma-noncoherent.c
> > > > +++ b/arch/riscv/mm/dma-noncoherent.c
> > > > @@ -21,7 +21,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_dire
> > > >                 ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
> > > >                 break;
> > > >         case DMA_FROM_DEVICE:
> > > > -               ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
> > > > +               ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> > > >                 break;
> > > >         case DMA_BIDIRECTIONAL:
> > > >                 ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> > > >
> > > >
> > > > The crypto self test works and I got no more buffer corruption.
> > > No, No ... it's not a solution. That means your driver has a problem.
> > > From device, we only need INVAL enough.
> > >
> >
> > For me, my driver works fine, the problem came from dma_map_sg(), probably I didnt explain right, I restart.
> >
> > Example:
> > crypto self test send to my driver an AES cipher operation of 16 bytes inside a SG, but the original buffer is greater (said 32 for the example).
> > So the first 16 bytes are used by the SG and the last 16 bytes are a poisoned buffer (with value 0xFE) to check driver do not write beyong the normal operation of 16 bytes (and beyond the SG length).
> >
> > Doing the dma_map_sg(FROM_DEVICE) on the SG corrupt the whole buffer.
> > My driver write normally via DMA the first 16 bytes.
> > Crypto API check the last bytes, no more 0xFE, so it fail believing my driver wrote beyond the first 16 bytes.
> >
> > But even If I disable my hardware operation, the buffer is still corrupted. (See my sample code which just do dma_map/dma_unmap)
> >
> > So the problem is the dma_map(FROM_DEVICE) which change buffer content.
> >
> > So if this behavour is normal on D1 SoC, how to fix the crypto self tests ?
> Actually, FLUSH is safe for all, but more expensive. Can you tell me
> which arm SOC are you using? And which version of linux is running on
> your arm SOC?
> 

The SoC is an Allwinner D1 (RISC-V).
I am testing Linux from https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
  2022-04-18 15:29                     ` Philipp Tomsich
@ 2022-04-19  7:52                       ` Corentin Labbe
  -1 siblings, 0 replies; 50+ messages in thread
From: Corentin Labbe @ 2022-04-19  7:52 UTC (permalink / raw)
  To: Philipp Tomsich
  Cc: Guo Ren, Samuel Holland, Heiko Stuebner, Palmer Dabbelt,
	Paul Walmsley, linux-riscv, Linux Kernel Mailing List, Wei Fu,
	Atish Patra, Anup Patel, Nick Kossifidis, Christoph Muellner,
	Herbert Xu, linux-crypto

Le Mon, Apr 18, 2022 at 05:29:10PM +0200, Philipp Tomsich a écrit :
> On Sun, 17 Apr 2022 at 19:35, Corentin Labbe <clabbe.montjoie@gmail.com> wrote:
> >
> > Le Sun, Apr 17, 2022 at 04:49:34PM +0800, Guo Ren a écrit :
> > > On Sun, Apr 17, 2022 at 4:45 PM Corentin Labbe
> > > <clabbe.montjoie@gmail.com> wrote:
> > > >
> > > > Le Sun, Apr 17, 2022 at 10:17:34AM +0800, Guo Ren a écrit :
> > > > > On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe
> > > > > <clabbe.montjoie@gmail.com> wrote:
> > > > > >
> > > > > > Le Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland a écrit :
> > > > > > > On 4/16/22 2:35 AM, Corentin Labbe wrote:
> > > > > > > > Le Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland a écrit :
> > > > > > > >> On 4/15/22 6:26 AM, Corentin Labbe wrote:
> > > > > > > >>> Le Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner a écrit :
> > > > > > > >>>> This series is based on the alternatives changes done in my svpbmt series
> > > > > > > >>>> and thus also depends on Atish's isa-extension parsing series.
> > > > > > > >>>>
> > > > > > > >>>> It implements using the cache-management instructions from the  Zicbom-
> > > > > > > >>>> extension to handle cache flush, etc actions on platforms needing them.
> > > > > > > >>>>
> > > > > > > >>>> SoCs using cpu cores from T-Head like the Allwinne D1 implement a
> > > > > > > >>>> different set of cache instructions. But while they are different,
> > > > > > > >>>> instructions they provide the same functionality, so a variant can
> > > > > > > >>>> easly hook into the existing alternatives mechanism on those.
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>> Hello
> > > > > > > >>>
> > > > > > > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contain this serie.
> > > > > > > >>>
> > > > > > > >>> I am hitting a buffer corruption problem with DMA.
> > > > > > > >>> The sun8i-ce crypto driver fail self tests due to "device overran destination buffer".
> > > > > > > >>> In fact the buffer is not overran by device but by dma_map_single() operation.
> > > > > > > >>>
> > > > > > > >>> The following small code show the problem:
> > > > > > > >>>
> > > > > > > >>> dma_addr_t dma;
> > > > > > > >>> u8 *buf;
> > > > > > > >>> #define BSIZE 2048
> > > > > > > >>> #define DMASIZE 16
> > > > > > > >>>
> > > > > > > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> > > > > > > >>> for (i = 0; i < BSIZE; i++)
> > > > > > > >>>     buf[i] = 0xFE;
> > > > > > > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> > > > > > > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
> > > > > > > >>
> > > > > > > >> This function (through dma_direct_map_page()) ends up calling
> > > > > > > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
> > > > > > > >> cache. This is the same thing other architectures do (at least arm, arm64,
> > > > > > > >> openrisc, and powerpc). So this appears to be working as intended.
> > > > > > > >
> > > > > > > > This behavour is not present at least on ARM and ARM64.
> > > > > > > > The sample code I provided does not corrupt the buffer on them.
> > > > > > >
> > > > > > > That can be explained by the 0xFE bytes having been flushed to DRAM already in
> > > > > > > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in
> > > > > > > a dirty cache line. The cache topology and implementation is totally different
> > > > > > > across the SoCs, so this is not too surprising.
> > > > > > >
> > > > > > > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a
> > > > > > > unidirectional DMA transfer from the device into that buffer. So the contents of
> > > > > > > the buffer are "undefined" until the DMA transfer completes. If you are also
> > > > > > > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Samuel
> > > > > >
> > > > > > +CC crypto mailing list + maintainer
> > > > > >
> > > > > > My problem is that crypto selftest, for each buffer where I need to do a cipher operation,
> > > > > > concat a poison buffer to check that device does write beyond buffer.
> > > > > >
> > > > > > But the dma_map_sg(FROM_DEVICE) corrupts this poison buffer and crypto selftests fails thinking my device did a buffer overrun.
> > > > > >
> > > > > > So you mean that on SoC D1, this crypto API check strategy is impossible ?
> > > > >
> > > > > I think you could try to replace all CLEAN & INVAL ops with FLUSH ops
> > > > > for the testing. (All cache block-aligned data from the device for the
> > > > > CPU should be invalided.)
> > > > >
> > > >
> > > > With:
> > > > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> > > > index 2c124bcc1932..608483522e05 100644
> > > > --- a/arch/riscv/mm/dma-noncoherent.c
> > > > +++ b/arch/riscv/mm/dma-noncoherent.c
> > > > @@ -21,7 +21,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_dire
> > > >                 ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
> > > >                 break;
> > > >         case DMA_FROM_DEVICE:
> > > > -               ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
> > > > +               ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> > > >                 break;
> > > >         case DMA_BIDIRECTIONAL:
> > > >                 ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> > > >
> > > >
> > > > The crypto self test works and I got no more buffer corruption.
> > > No, No ... it's not a solution. That means your driver has a problem.
> > > From device, we only need INVAL enough.
> > >
> >
> > For me, my driver works fine, the problem came from dma_map_sg(), probably I didnt explain right, I restart.
> >
> > Example:
> > crypto self test send to my driver an AES cipher operation of 16 bytes inside a SG, but the original buffer is greater (said 32 for the example).
> > So the first 16 bytes are used by the SG and the last 16 bytes are a poisoned buffer (with value 0xFE) to check driver do not write beyong the normal operation of 16 bytes (and beyond the SG length).
> >
> > Doing the dma_map_sg(FROM_DEVICE) on the SG corrupt the whole buffer.
> 
> Doesn't the DMA_FROM_DEVICE indicate that there are no expected writes
> from the CPU to the buffer (and that any modifications to the
> underlying cache line can be dropped via an invalidation)?
> In other words: does the behavior change when mapping as
> DMA_BIDIRECTIONAL — and: should a map/unmap sequence be used where it
> is first mapped as DMA_TO_DEVICE when poisoning the buffer and later
> as DMA_FROM_DEVICE when in normal operation?
> 

There are no CPU writes after the dma_map(FROM_DEVICE).
The buffer is initialized by the crypto API before the mapping.
Furthermore, the corrupted buffer is the one next to the buffer being mapped.

I verified the sizes passed to dma_map_sg() with some debug output:
sun8i-ce 3040000.crypto: sun8i_ce_cipher_prepare ecb(aes) cryptlen=16
dma_direct_map_sg:483 SG0 len=16   <- dma_map TO_DEVICE
dma_direct_map_sg:483 SG0 len=16   <- dma_map FROM_DEVICE
need:a47ca9dd e0df4c86 a070af6e 91710dec 
have:a47ca9dd e0df4c86 a070af6e 91710dec
dump whole buffer:
over:a47ca9dd e0df4c86 a070af6e 91710dec
over:ec05e6f2 d542fb77 128b2059 5bf06986 < here we should have 0xFE
alg: skcipher: ecb-aes-sun8i-ce encryption overran dst buffer on test vector 1, cfg="random: use_finup src_divs=[<reimport>100.0%@+1604]"


Note that I tried the following patch:
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 4948201065cc..c5b945974441 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -19,6 +19,7 @@
 #include <crypto/aead.h>
 #include <crypto/hash.h>
 #include <crypto/skcipher.h>
+#include <linux/cacheflush.h>
 #include <linux/err.h>
 #include <linux/fips.h>
 #include <linux/module.h>
@@ -205,6 +206,7 @@ static void testmgr_free_buf(char *buf[XBUFSIZE])
 static inline void testmgr_poison(void *addr, size_t len)
 {
        memset(addr, TESTMGR_POISON_BYTE, len);
+       flush_icache_range(addr, addr + len);
 }
 
 /* Is the memory region still fully poisoned? */

This patch fixes the problem, but I am not sure it is the right way.
A DMA mapping operation that corrupts neighbouring buffers does not seem right.

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
@ 2022-04-19  7:52                       ` Corentin Labbe
  0 siblings, 0 replies; 50+ messages in thread
From: Corentin Labbe @ 2022-04-19  7:52 UTC (permalink / raw)
  To: Philipp Tomsich
  Cc: Guo Ren, Samuel Holland, Heiko Stuebner, Palmer Dabbelt,
	Paul Walmsley, linux-riscv, Linux Kernel Mailing List, Wei Fu,
	Atish Patra, Anup Patel, Nick Kossifidis, Christoph Muellner,
	Herbert Xu, linux-crypto

Le Mon, Apr 18, 2022 at 05:29:10PM +0200, Philipp Tomsich a écrit :
> On Sun, 17 Apr 2022 at 19:35, Corentin Labbe <clabbe.montjoie@gmail.com> wrote:
> >
> > Le Sun, Apr 17, 2022 at 04:49:34PM +0800, Guo Ren a écrit :
> > > On Sun, Apr 17, 2022 at 4:45 PM Corentin Labbe
> > > <clabbe.montjoie@gmail.com> wrote:
> > > >
> > > > Le Sun, Apr 17, 2022 at 10:17:34AM +0800, Guo Ren a écrit :
> > > > > On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe
> > > > > <clabbe.montjoie@gmail.com> wrote:
> > > > > >
> > > > > > Le Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland a écrit :
> > > > > > > On 4/16/22 2:35 AM, Corentin Labbe wrote:
> > > > > > > > Le Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland a écrit :
> > > > > > > >> On 4/15/22 6:26 AM, Corentin Labbe wrote:
> > > > > > > >>> Le Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner a écrit :
> > > > > > > >>>> This series is based on the alternatives changes done in my svpbmt series
> > > > > > > >>>> and thus also depends on Atish's isa-extension parsing series.
> > > > > > > >>>>
> > > > > > > >>>> It implements using the cache-management instructions from the  Zicbom-
> > > > > > > >>>> extension to handle cache flush, etc actions on platforms needing them.
> > > > > > > >>>>
> > > > > > > >>>> SoCs using cpu cores from T-Head like the Allwinne D1 implement a
> > > > > > > >>>> different set of cache instructions. But while they are different,
> > > > > > > >>>> instructions they provide the same functionality, so a variant can
> > > > > > > >>>> easly hook into the existing alternatives mechanism on those.
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>> Hello
> > > > > > > >>>
> > > > > > > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contain this serie.
> > > > > > > >>>
> > > > > > > >>> I am hitting a buffer corruption problem with DMA.
> > > > > > > >>> The sun8i-ce crypto driver fail self tests due to "device overran destination buffer".
> > > > > > > >>> In fact the buffer is not overran by device but by dma_map_single() operation.
> > > > > > > >>>
> > > > > > > >>> The following small code show the problem:
> > > > > > > >>>
> > > > > > > >>> dma_addr_t dma;
> > > > > > > >>> u8 *buf;
> > > > > > > >>> #define BSIZE 2048
> > > > > > > >>> #define DMASIZE 16
> > > > > > > >>>
> > > > > > > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> > > > > > > >>> for (i = 0; i < BSIZE; i++)
> > > > > > > >>>     buf[i] = 0xFE;
> > > > > > > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> > > > > > > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
> > > > > > > >>
> > > > > > > >> This function (through dma_direct_map_page()) ends up calling
> > > > > > > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
> > > > > > > >> cache. This is the same thing other architectures do (at least arm, arm64,
> > > > > > > >> openrisc, and powerpc). So this appears to be working as intended.
> > > > > > > >
> > > > > > > > > This behaviour is not present, at least on ARM and ARM64.
> > > > > > > > > The sample code I provided does not corrupt the buffer on them.
> > > > > > >
> > > > > > > That can be explained by the 0xFE bytes having been flushed to DRAM already in
> > > > > > > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in
> > > > > > > > a dirty cache line. The cache topology and implementation are totally different
> > > > > > > across the SoCs, so this is not too surprising.
> > > > > > >
> > > > > > > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a
> > > > > > > unidirectional DMA transfer from the device into that buffer. So the contents of
> > > > > > > the buffer are "undefined" until the DMA transfer completes. If you are also
> > > > > > > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Samuel
> > > > > >
> > > > > > +CC crypto mailing list + maintainer
> > > > > >
> > > > > > > My problem is that the crypto selftest, for each buffer where I need to do a cipher operation,
> > > > > > > appends a poison buffer to check that the device does not write beyond the buffer.
> > > > > > >
> > > > > > > But dma_map_sg(FROM_DEVICE) corrupts this poison buffer, and the crypto selftests fail, thinking my device did a buffer overrun.
> > > > > > >
> > > > > > > So you mean that on the D1 SoC, this crypto API check strategy is impossible?
> > > > >
> > > > > I think you could try to replace all CLEAN & INVAL ops with FLUSH ops
> > > > > for the testing. (All cache block-aligned data from the device for the
> > > > > > CPU should be invalidated.)
> > > > >
> > > >
> > > > With:
> > > > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> > > > index 2c124bcc1932..608483522e05 100644
> > > > --- a/arch/riscv/mm/dma-noncoherent.c
> > > > +++ b/arch/riscv/mm/dma-noncoherent.c
> > > > @@ -21,7 +21,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_dire
> > > >                 ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
> > > >                 break;
> > > >         case DMA_FROM_DEVICE:
> > > > -               ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
> > > > +               ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> > > >                 break;
> > > >         case DMA_BIDIRECTIONAL:
> > > >                 ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> > > >
> > > >
> > > > > The crypto self-test works and I get no more buffer corruption.
> > > > No, no ... it's not a solution. That means your driver has a problem.
> > > > From the device, INVAL alone is enough.
> > >
> >
> > > For me, my driver works fine; the problem comes from dma_map_sg(). I probably didn't explain it well, so let me start over.
> > >
> > > Example:
> > > The crypto self-test sends my driver an AES cipher operation of 16 bytes inside an SG, but the original buffer is larger (say 32 bytes for the example).
> > > So the first 16 bytes are used by the SG and the last 16 bytes are a poisoned buffer (with value 0xFE) to check that the driver does not write beyond the 16 bytes of the normal operation (and beyond the SG length).
> > >
> > > Doing the dma_map_sg(FROM_DEVICE) on the SG corrupts the whole buffer.
> 
> Doesn't the DMA_FROM_DEVICE indicate that there are no expected writes
> from the CPU to the buffer (and that any modifications to the
> underlying cache line can be dropped via an invalidation)?
> In other words: does the behavior change when mapping as
> DMA_BIDIRECTIONAL — and: should a map/unmap sequence be used where it
> is first mapped as DMA_TO_DEVICE when poisoning the buffer and later
> as DMA_FROM_DEVICE when in normal operation?
> 

There are no CPU writes after the dma_map(FROM_DEVICE).
The buffer is initialized by the crypto API before that.
Furthermore, the corrupted buffer is adjacent to the buffer being mapped.

I verified the size passed to dma_map_sg() with some debug output:
sun8i-ce 3040000.crypto: sun8i_ce_cipher_prepare ecb(aes) cryptlen=16
dma_direct_map_sg:483 SG0 len=16   <- dma_map TO_DEVICE
dma_direct_map_sg:483 SG0 len=16   <- dma_map FROM_DEVICE
need:a47ca9dd e0df4c86 a070af6e 91710dec 
have:a47ca9dd e0df4c86 a070af6e 91710dec
dump whole buffer:
over:a47ca9dd e0df4c86 a070af6e 91710dec
over:ec05e6f2 d542fb77 128b2059 5bf06986 < here we should have 0xFE
alg: skcipher: ecb-aes-sun8i-ce encryption overran dst buffer on test vector 1, cfg="random: use_finup src_divs=[<reimport>100.0%@+1604]"


Note that I tried the following patch:
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 4948201065cc..c5b945974441 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -19,6 +19,7 @@
 #include <crypto/aead.h>
 #include <crypto/hash.h>
 #include <crypto/skcipher.h>
+#include <linux/cacheflush.h>
 #include <linux/err.h>
 #include <linux/fips.h>
 #include <linux/module.h>
@@ -205,6 +206,7 @@ static void testmgr_free_buf(char *buf[XBUFSIZE])
 static inline void testmgr_poison(void *addr, size_t len)
 {
        memset(addr, TESTMGR_POISON_BYTE, len);
+       flush_icache_range(addr, addr + len);
 }
 
 /* Is the memory region still fully poisoned? */

This patch fixes the problem, but I am not sure this is the right way.
A DMA mapping operation that corrupts adjacent buffers does not seem right.

_______________________________________________
linux-riscv mailing list
linux-riscv@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-riscv

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 2/2] riscv: implement cache-management errata for T-Head SoCs
  2022-03-31  8:22       ` Heiko Stübner
@ 2022-04-20  0:18         ` Palmer Dabbelt
  -1 siblings, 0 replies; 50+ messages in thread
From: Palmer Dabbelt @ 2022-04-20  0:18 UTC (permalink / raw)
  To: heiko
  Cc: Paul Walmsley, linux-riscv, linux-kernel, wefu, guoren, atishp,
	anup, mick, samuel, cmuellner, philipp.tomsich

On Thu, 31 Mar 2022 01:22:29 PDT (-0700), heiko@sntech.de wrote:
> Hi Palmer,
>
> Am Donnerstag, 31. März 2022, 04:30:36 CEST schrieb Palmer Dabbelt:
>> On Mon, 07 Mar 2022 14:46:20 PST (-0800), heiko@sntech.de wrote:
>> > The T-Head C906 and C910 implement a scheme for handling
>> > cache operations different from the generic Zicbom extension.
>> >
>> > Add an errata for it next to the generic dma coherency ops.
>> >
>> > Signed-off-by: Heiko Stuebner <heiko@sntech.de>
>> > ---
>> >  arch/riscv/Kconfig.erratas           | 10 +++++++
>> >  arch/riscv/errata/thead/errata.c     |  5 ++++
>> >  arch/riscv/include/asm/errata_list.h | 45 ++++++++++++++++++++++++++--
>> >  3 files changed, 57 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/arch/riscv/Kconfig.erratas b/arch/riscv/Kconfig.erratas
>> > index de4002baa1d0..89a6dcb8ac2a 100644
>> > --- a/arch/riscv/Kconfig.erratas
>> > +++ b/arch/riscv/Kconfig.erratas
>> > @@ -50,4 +50,14 @@ config ERRATA_THEAD_PBMT
>> >
>> >  	  If you don't know what to do here, say "Y".
>> >
>> > +config ERRATA_THEAD_CMO
>> > +	bool "Apply T-Head cache management errata"
>> > +	depends on ERRATA_THEAD && RISCV_DMA_NONCOHERENT
>> > +	default y
>> > +	help
>> > +	  This will apply the cache management errata to handle the
>> > +	  non-standard handling on non-coherent operations on T-Head SoCs.
>> > +
>> > +	  If you don't know what to do here, say "Y".
>> > +
>> >  endmenu
>> > diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c
>> > index fd8e0538a3f0..11c26c37425f 100644
>> > --- a/arch/riscv/errata/thead/errata.c
>> > +++ b/arch/riscv/errata/thead/errata.c
>> > @@ -33,6 +33,11 @@ static const struct errata_info errata_list[ERRATA_THEAD_NUMBER] = {
>> >  		.stage = RISCV_ALTERNATIVES_EARLY_BOOT,
>> >  		.check_func = errata_mt_check_func
>> >  	},
>> > +	{
>> > +		.name = "cache-management",
>> > +		.stage = RISCV_ALTERNATIVES_BOOT,
>> > +		.check_func = errata_mt_check_func
>> > +	},
>> >  };
>> >
>> >  static u32 thead_errata_probe(unsigned int stage, unsigned long archid, unsigned long impid)
>> > diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
>> > index 7a2dd61af24d..f7c6805daeab 100644
>> > --- a/arch/riscv/include/asm/errata_list.h
>> > +++ b/arch/riscv/include/asm/errata_list.h
>> > @@ -16,7 +16,8 @@
>> >
>> >  #ifdef CONFIG_ERRATA_THEAD
>> >  #define	ERRATA_THEAD_PBMT 0
>> > -#define	ERRATA_THEAD_NUMBER 1
>> > +#define	ERRATA_THEAD_CMO 1
>> > +#define	ERRATA_THEAD_NUMBER 2
>> >  #endif
>> >
>> >  #define	CPUFEATURE_SVPBMT 0
>> > @@ -104,8 +105,37 @@ asm volatile(ALTERNATIVE(								\
>> >  #define CBO_CLEAN_A0	".long 0x25200F"
>> >  #define CBO_FLUSH_A0	".long 0x05200F"
>> >
>> > +/*
>> > + * dcache.ipa rs1 (invalidate, physical address)
>> > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
>> > + *   0000001    01010      rs1       000      00000  0001011
>> > + * dcache.iva rs1 (invalidate, virtual address)
>> > + *   0000001    00110      rs1       000      00000  0001011
>> > + *
>> > + * dcache.cpa rs1 (clean, physical address)
>> > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
>> > + *   0000001    01001      rs1       000      00000  0001011
>> > + * dcache.cva rs1 (clean, virtual address)
>> > + *   0000001    00100      rs1       000      00000  0001011
>> > + *
>> > + * dcache.cipa rs1 (clean then invalidate, physical address)
>> > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
>> > + *   0000001    01011      rs1       000      00000  0001011
>> > + * dcache.civa rs1 (... virtual address)
>> > + *   0000001    00111      rs1       000      00000  0001011
>> > + *
>> > + * sync.s (make sure all cache operations finished)
>> > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
>> > + *   0000000    11001     00000      000      00000  0001011
>> > + */
>> > +#define THEAD_INVAL_A0	".long 0x0265000b"
>> > +#define THEAD_CLEAN_A0	".long 0x0245000b"
>> > +#define THEAD_FLUSH_A0	".long 0x0275000b"
>> > +#define THEAD_SYNC_S	".long 0x0190000b"
>> 
>> IIRC this came up before, but these really need to get into the 
>> assembler as actual instructions.
>
> okay :-) .
>
> But just for my understanding which of the two ways going forward:
> - keep this in the waiting area _until_ a suitable binutils is released
> - use the coded instructions now and convert later once binutils is released
>
> The reason I ask is, that any chip with a t-head core like the Allwinner-D1
> will need this for things like basic networking, so with the binutils
> release schedule, I guess we'd be looking at autumn 2022 at the earliest.

I'm not the binutils release maintainer, so I can't really sign off on a 
release date, but given the history that sounds about right to me.  I get 
it's a headache to have to have a toolchain that supports the ISA, but 
if it was really that important it would have made one of the last two 
releases -- I very specifically remember talking to the folks at the 
RISC-V foundation about this the better part of a year ago, but they 
decided to play at politics instead of being constructive so now we have 
two messes to clean up.

I volunteered Patrick to send binutils patches for the T-Head cache 
control stuff (as I didn't have time to write it myself this weekend), 
it's only a dozen or so instructions and thus shouldn't take that long.  
At least that way we can get a rough consensus on how we're going to 
move forward with the toolchain support, which we really need before 
we're going to start depending on anything.

Sorry you got pulled into all this. 

> Thanks
> Heiko
>
>> > +
>> >  #define ALT_CMO_OP(_op, _start, _size)							\
>> > -asm volatile(ALTERNATIVE(								\
>> > +asm volatile(ALTERNATIVE_2(								\
>> > +	"nop\n\t"									\
>> >  	"nop\n\t"									\
>> >  	"nop\n\t"									\
>> >  	"nop\n\t"									\
>> > @@ -117,7 +147,16 @@ asm volatile(ALTERNATIVE(								\
>> >  	CBO_##_op##_A0 "\n\t"								\
>> >  	"addi a0, a0, %0\n\t"								\
>> >  	"2:\n\t"									\
>> > -	"bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT)		\
>> > +	"bltu a0, %2, 3b\n\t"								\
>> > +	"nop", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT,				\
>> > +	"mv a0, %1\n\t"									\
>> > +	"j 2f\n\t"									\
>> > +	"3:\n\t"									\
>> > +	THEAD_##_op##_A0 "\n\t"								\
>> > +	"addi a0, a0, %0\n\t"								\
>> > +	"2:\n\t"									\
>> > +	"bltu a0, %2, 3b\n\t"								\
>> > +	THEAD_SYNC_S, THEAD_VENDOR_ID, ERRATA_THEAD_CMO, CONFIG_ERRATA_THEAD_CMO)	\
>> >  	: : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)),			\
>> >  	    "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES)))
>> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2022-04-20  0:19 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-07 22:46 [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant Heiko Stuebner
2022-03-07 22:46 ` Heiko Stuebner
2022-03-07 22:46 ` [PATCH 1/2] riscv: Implement Zicbom-based cache management operations Heiko Stuebner
2022-03-07 22:46   ` Heiko Stuebner
2022-03-25 16:20   ` Anup Patel
2022-03-25 16:20     ` Anup Patel
2022-03-25 17:24     ` Philipp Tomsich
2022-03-25 17:24       ` Philipp Tomsich
     [not found]     ` <CAAeLtUAi+61Hk7oBW979QEKYaume3vqdt_KkS_mXpRAs+CzHnA@mail.gmail.com>
2022-03-25 17:37       ` Anup Patel
2022-03-25 17:37         ` Anup Patel
2022-03-31 10:07   ` Christoph Hellwig
2022-03-31 10:07     ` Christoph Hellwig
2022-03-07 22:46 ` [PATCH 2/2] riscv: implement cache-management errata for T-Head SoCs Heiko Stuebner
2022-03-07 22:46   ` Heiko Stuebner
2022-03-31  2:30   ` Palmer Dabbelt
2022-03-31  2:30     ` Palmer Dabbelt
2022-03-31  8:22     ` Heiko Stübner
2022-03-31  8:22       ` Heiko Stübner
2022-03-31  8:29       ` Philipp Tomsich
2022-03-31  8:29         ` Philipp Tomsich
2022-04-20  0:18       ` Palmer Dabbelt
2022-04-20  0:18         ` Palmer Dabbelt
2022-04-01  1:05   ` Samuel Holland
2022-04-01  1:05     ` Samuel Holland
2022-04-15 11:26 ` [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant Corentin Labbe
2022-04-15 11:26   ` Corentin Labbe
2022-04-16  2:19   ` Samuel Holland
2022-04-16  2:19     ` Samuel Holland
2022-04-16  7:35     ` Corentin Labbe
2022-04-16  7:35       ` Corentin Labbe
2022-04-16 17:47       ` Samuel Holland
2022-04-16 17:47         ` Samuel Holland
2022-04-16 19:32         ` Corentin Labbe
2022-04-16 19:32           ` Corentin Labbe
2022-04-17  2:17           ` Guo Ren
2022-04-17  2:17             ` Guo Ren
2022-04-17  8:45             ` Corentin Labbe
2022-04-17  8:45               ` Corentin Labbe
2022-04-17  8:49               ` Guo Ren
2022-04-17  8:49                 ` Guo Ren
2022-04-17 17:35                 ` Corentin Labbe
2022-04-17 17:35                   ` Corentin Labbe
2022-04-17 22:50                   ` Guo Ren
2022-04-17 22:50                     ` Guo Ren
2022-04-19  7:44                     ` Corentin Labbe
2022-04-19  7:44                       ` Corentin Labbe
2022-04-18 15:29                   ` Philipp Tomsich
2022-04-18 15:29                     ` Philipp Tomsich
2022-04-19  7:52                     ` Corentin Labbe
2022-04-19  7:52                       ` Corentin Labbe
