* [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
From: Heiko Stuebner @ 2022-03-07 22:46 UTC
To: palmer, paul.walmsley
Cc: linux-riscv, linux-kernel, wefu, guoren, atishp, anup, mick, samuel,
    cmuellner, philipp.tomsich, Heiko Stuebner

This series is based on the alternatives changes done in my svpbmt series
and thus also depends on Atish's isa-extension parsing series.

It uses the cache-management instructions from the Zicbom extension to
handle cache flush and related operations on platforms that need them.

SoCs using CPU cores from T-Head, like the Allwinner D1, implement a
different set of cache instructions. But while the instructions differ,
they provide the same functionality, so a variant can easily hook into
the existing alternatives mechanism on those.

Heiko Stuebner (2):
  riscv: Implement Zicbom-based cache management operations
  riscv: implement cache-management errata for T-Head SoCs

 arch/riscv/Kconfig                   |  8 +++
 arch/riscv/Kconfig.erratas           | 10 ++++
 arch/riscv/errata/thead/errata.c     |  5 ++
 arch/riscv/include/asm/errata_list.h | 78 +++++++++++++++++++++++++++-
 arch/riscv/include/asm/hwcap.h       |  1 +
 arch/riscv/kernel/cpu.c              |  1 +
 arch/riscv/kernel/cpufeature.c       | 17 ++++++
 arch/riscv/mm/Makefile               |  1 +
 arch/riscv/mm/dma-noncoherent.c      | 61 ++++++++++++++++++++++
 9 files changed, 180 insertions(+), 2 deletions(-)
 create mode 100644 arch/riscv/mm/dma-noncoherent.c

-- 
2.30.2
* [PATCH 1/2] riscv: Implement Zicbom-based cache management operations
From: Heiko Stuebner @ 2022-03-07 22:46 UTC
To: palmer, paul.walmsley
Cc: linux-riscv, linux-kernel, wefu, guoren, atishp, anup, mick, samuel,
    cmuellner, philipp.tomsich, Heiko Stuebner, Christoph Hellwig,
    Atish Patra

The Zicbom ISA-extension was ratified in November 2021
and introduces instructions for dcache invalidate, clean
and flush operations.

Implement cache management operations based on them.

Of course not all cores will support this, so implement an
alternative-based mechanism that replaces placeholder nop instructions
with a loop built around the Zicbom instructions.

We're using prebuilt instruction encodings for the Zicbom instructions
for now, to not require a bleeding-edge compiler (gcc-12)
for these somewhat simple instructions.

Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Atish Patra <atish.patra@wdc.com>
Cc: Guo Ren <guoren@kernel.org>
---
 arch/riscv/Kconfig                   |  8 ++++
 arch/riscv/include/asm/errata_list.h | 37 ++++++++++++++++-
 arch/riscv/include/asm/hwcap.h       |  1 +
 arch/riscv/kernel/cpu.c              |  1 +
 arch/riscv/kernel/cpufeature.c       | 17 ++++++++
 arch/riscv/mm/Makefile               |  1 +
 arch/riscv/mm/dma-noncoherent.c      | 61 ++++++++++++++++++++++++++++
 7 files changed, 125 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/mm/dma-noncoherent.c

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 5adcbd9b5e88..d3a1cd41c203 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -208,6 +208,14 @@ config PGTABLE_LEVELS
 config LOCKDEP_SUPPORT
 	def_bool y
 
+config RISCV_DMA_NONCOHERENT
+	bool "Support non-coherent dma operation"
+	select ARCH_HAS_DMA_PREP_COHERENT
+	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
+	select ARCH_HAS_SYNC_DMA_FOR_CPU
+	select ARCH_HAS_SETUP_DMA_OPS
+	select DMA_DIRECT_REMAP
+
 source "arch/riscv/Kconfig.socs"
 source "arch/riscv/Kconfig.erratas"
 
diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
index 4fac46b82c16..7a2dd61af24d 100644
--- a/arch/riscv/include/asm/errata_list.h
+++ b/arch/riscv/include/asm/errata_list.h
@@ -20,7 +20,8 @@
 #endif
 
 #define CPUFEATURE_SVPBMT 0
-#define CPUFEATURE_NUMBER 1
+#define CPUFEATURE_CMO 1
+#define CPUFEATURE_NUMBER 2
 
 #ifdef __ASSEMBLY__
 
@@ -86,6 +87,40 @@ asm volatile(ALTERNATIVE( \
 #define ALT_THEAD_PMA(_val)
 #endif
 
+/*
+ * cbo.clean rs1
+ * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ *    0...01     rs1       010      00000  0001111
+ *
+ * cbo.flush rs1
+ * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ *    0...10     rs1       010      00000  0001111
+ *
+ * cbo.inval rs1
+ * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
+ *    0...00     rs1       010      00000  0001111
+ */
+#define CBO_INVAL_A0	".long 0x15200F"
+#define CBO_CLEAN_A0	".long 0x25200F"
+#define CBO_FLUSH_A0	".long 0x05200F"
+
+#define ALT_CMO_OP(_op, _start, _size) \
+asm volatile(ALTERNATIVE( \
+	"nop\n\t" \
+	"nop\n\t" \
+	"nop\n\t" \
+	"nop\n\t" \
+	"nop", \
+	"mv a0, %1\n\t" \
+	"j 2f\n\t" \
+	"3:\n\t" \
+	CBO_##_op##_A0 "\n\t" \
+	"addi a0, a0, %0\n\t" \
+	"2:\n\t" \
+	"bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT) \
+	: : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)), \
+	    "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES)))
+
 #endif /* __ASSEMBLY__ */
 
 #endif
diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
index 656cd626eb1a..5943d5125a51 100644
--- a/arch/riscv/include/asm/hwcap.h
+++ b/arch/riscv/include/asm/hwcap.h
@@ -52,6 +52,7 @@ extern unsigned long elf_hwcap;
  */
 enum riscv_isa_ext_id {
 	RISCV_ISA_EXT_SVPBMT = RISCV_ISA_EXT_BASE,
+	RISCV_ISA_EXT_ZICBOM,
 	RISCV_ISA_EXT_ID_MAX = RISCV_ISA_EXT_MAX,
 };
 
diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
index c582d557e555..dfcf592273a7 100644
--- a/arch/riscv/kernel/cpu.c
+++ b/arch/riscv/kernel/cpu.c
@@ -72,6 +72,7 @@ int riscv_of_parent_hartid(struct device_node *node)
 
 static struct riscv_isa_ext_data isa_ext_arr[] = {
 	__RISCV_ISA_EXT_DATA("svpbmt", RISCV_ISA_EXT_SVPBMT),
+	__RISCV_ISA_EXT_DATA("zicbom", RISCV_ISA_EXT_ZICBOM),
 	__RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
 };
 
diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
index 5c5e6e7488ce..0e997fa5524a 100644
--- a/arch/riscv/kernel/cpufeature.c
+++ b/arch/riscv/kernel/cpufeature.c
@@ -200,6 +200,7 @@ void __init riscv_fill_hwcap(void)
 				set_bit(*ext - 'a', this_isa);
 			} else {
 				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
+				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
 			}
 #undef SET_ISA_EXT_MAP
 		}
@@ -267,11 +268,27 @@ static bool __init_or_module cpufeature_svpbmt_check_func(unsigned int stage)
 	return ret;
 }
 
+static bool cpufeature_cmo_check_func(unsigned int stage)
+{
+	switch (stage) {
+	case RISCV_ALTERNATIVES_EARLY_BOOT:
+		return false;
+	default:
+		return riscv_isa_extension_available(NULL, ZICBOM);
+	}
+
+	return false;
+}
+
 static const struct cpufeature_info __initdata_or_module cpufeature_list[CPUFEATURE_NUMBER] = {
 	{
 		.name = "svpbmt",
 		.check_func = cpufeature_svpbmt_check_func
 	},
+	{
+		.name = "cmo",
+		.check_func = cpufeature_cmo_check_func
+	},
 };
 
 static u32 __init_or_module cpufeature_probe(unsigned int stage)
diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
index ac7a25298a04..d76aabf4b94d 100644
--- a/arch/riscv/mm/Makefile
+++ b/arch/riscv/mm/Makefile
@@ -30,3 +30,4 @@ endif
 endif
 
 obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o
+obj-$(CONFIG_RISCV_DMA_NONCOHERENT) += dma-noncoherent.o
diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
new file mode 100644
index 000000000000..2c124bcc1932
--- /dev/null
+++ b/arch/riscv/mm/dma-noncoherent.c
@@ -0,0 +1,61 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * RISC-V specific functions to support DMA for non-coherent devices
+ *
+ * Copyright (c) 2021 Western Digital Corporation or its affiliates.
+ */
+
+#include <linux/dma-direct.h>
+#include <linux/dma-map-ops.h>
+#include <linux/init.h>
+#include <linux/io.h>
+#include <linux/libfdt.h>
+#include <linux/mm.h>
+#include <linux/of.h>
+#include <linux/of_fdt.h>
+
+void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	case DMA_FROM_DEVICE:
+		ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	case DMA_BIDIRECTIONAL:
+		ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	default:
+		break;
+	}
+}
+
+void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		break;
+	case DMA_FROM_DEVICE:
+	case DMA_BIDIRECTIONAL:
+		ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	default:
+		break;
+	}
+}
+
+void arch_dma_prep_coherent(struct page *page, size_t size)
+{
+	void *flush_addr = page_address(page);
+
+	memset(flush_addr, 0, size);
+	ALT_CMO_OP(FLUSH, (unsigned long)flush_addr, size);
+}
+
+void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
+		const struct iommu_ops *iommu, bool coherent)
+{
+	/* If a specific device is dma-coherent, set it here */
+	dev->dma_coherent = coherent;
+}
-- 
2.30.2
* Re: [PATCH 1/2] riscv: Implement Zicbom-based cache management operations
From: Anup Patel @ 2022-03-25 16:20 UTC
To: Heiko Stuebner
Cc: Palmer Dabbelt, Paul Walmsley, linux-riscv,
    linux-kernel@vger.kernel.org List, Wei Fu, Guo Ren, Atish Patra,
    Nick Kossifidis, Samuel Holland, Christoph Muellner, Philipp Tomsich,
    Christoph Hellwig, Atish Patra

On Tue, Mar 8, 2022 at 4:16 AM Heiko Stuebner <heiko@sntech.de> wrote:
>
> The Zicbom ISA-extension was ratified in November 2021
> and introduces instructions for dcache invalidate, clean
> and flush operations.
>
> Implement cache management operations based on them.
>
> Of course not all cores will support this, so implement an
> alternative-based mechanism that replaces placeholder nop instructions
> with a loop built around the Zicbom instructions.
>
> We're using prebuilt instruction encodings for the Zicbom instructions
> for now, to not require a bleeding-edge compiler (gcc-12)
> for these somewhat simple instructions.
>
> Signed-off-by: Heiko Stuebner <heiko@sntech.de>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Atish Patra <atish.patra@wdc.com>
> Cc: Guo Ren <guoren@kernel.org>
> ---
>  arch/riscv/Kconfig                   |  8 ++++
>  arch/riscv/include/asm/errata_list.h | 37 ++++++++++++++++-
>  arch/riscv/include/asm/hwcap.h       |  1 +
>  arch/riscv/kernel/cpu.c              |  1 +
>  arch/riscv/kernel/cpufeature.c       | 17 ++++++++
>  arch/riscv/mm/Makefile               |  1 +
>  arch/riscv/mm/dma-noncoherent.c      | 61 ++++++++++++++++++++++++++++
>  7 files changed, 125 insertions(+), 1 deletion(-)
>  create mode 100644 arch/riscv/mm/dma-noncoherent.c
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 5adcbd9b5e88..d3a1cd41c203 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -208,6 +208,14 @@ config PGTABLE_LEVELS
>  config LOCKDEP_SUPPORT
>  	def_bool y
>
> +config RISCV_DMA_NONCOHERENT
> +	bool "Support non-coherent dma operation"
> +	select ARCH_HAS_DMA_PREP_COHERENT
> +	select ARCH_HAS_SYNC_DMA_FOR_DEVICE
> +	select ARCH_HAS_SYNC_DMA_FOR_CPU
> +	select ARCH_HAS_SETUP_DMA_OPS
> +	select DMA_DIRECT_REMAP
> +
>  source "arch/riscv/Kconfig.socs"
>  source "arch/riscv/Kconfig.erratas"
>
> diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> index 4fac46b82c16..7a2dd61af24d 100644
> --- a/arch/riscv/include/asm/errata_list.h
> +++ b/arch/riscv/include/asm/errata_list.h
> @@ -20,7 +20,8 @@
>  #endif
>
>  #define CPUFEATURE_SVPBMT 0
> -#define CPUFEATURE_NUMBER 1
> +#define CPUFEATURE_CMO 1
> +#define CPUFEATURE_NUMBER 2
>
>  #ifdef __ASSEMBLY__
>
> @@ -86,6 +87,40 @@ asm volatile(ALTERNATIVE( \
>  #define ALT_THEAD_PMA(_val)
>  #endif
>
> +/*
> + * cbo.clean rs1
> + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + *    0...01     rs1       010      00000  0001111
> + *
> + * cbo.flush rs1
> + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + *    0...10     rs1       010      00000  0001111
> + *
> + * cbo.inval rs1
> + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + *    0...00     rs1       010      00000  0001111
> + */
> +#define CBO_INVAL_A0	".long 0x15200F"
> +#define CBO_CLEAN_A0	".long 0x25200F"
> +#define CBO_FLUSH_A0	".long 0x05200F"
> +
> +#define ALT_CMO_OP(_op, _start, _size) \
> +asm volatile(ALTERNATIVE( \
> +	"nop\n\t" \
> +	"nop\n\t" \
> +	"nop\n\t" \
> +	"nop\n\t" \
> +	"nop", \
> +	"mv a0, %1\n\t" \
> +	"j 2f\n\t" \
> +	"3:\n\t" \
> +	CBO_##_op##_A0 "\n\t" \
> +	"addi a0, a0, %0\n\t" \
> +	"2:\n\t" \
> +	"bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT) \
> +	: : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)), \
> +	    "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES)))

Why not use a global variable (e.g. riscv_cbom_block_size) representing
the exact cbom block size instead of L1_CACHE_BYTES?

The default value of riscv_cbom_block_size can be L1_CACHE_BYTES,
which can be overridden at boot time using an optional
"riscv,cbom-block-size" DT property.

The rationale here is that if the underlying RISC-V implementation has
a cbom block size smaller than L1_CACHE_BYTES, then the cbom range
operation will be incomplete. The riscv_cbom_block_size global variable
ensures that the right block size is used, at least for cbom operations.

Regards,
Anup

> +
>  #endif /* __ASSEMBLY__ */
>
>  #endif
> diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h
> index 656cd626eb1a..5943d5125a51 100644
> --- a/arch/riscv/include/asm/hwcap.h
> +++ b/arch/riscv/include/asm/hwcap.h
> @@ -52,6 +52,7 @@ extern unsigned long elf_hwcap;
>   */
>  enum riscv_isa_ext_id {
>  	RISCV_ISA_EXT_SVPBMT = RISCV_ISA_EXT_BASE,
> +	RISCV_ISA_EXT_ZICBOM,
>  	RISCV_ISA_EXT_ID_MAX = RISCV_ISA_EXT_MAX,
>  };
>
> diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c
> index c582d557e555..dfcf592273a7 100644
> --- a/arch/riscv/kernel/cpu.c
> +++ b/arch/riscv/kernel/cpu.c
> @@ -72,6 +72,7 @@ int riscv_of_parent_hartid(struct device_node *node)
>
>  static struct riscv_isa_ext_data isa_ext_arr[] = {
>  	__RISCV_ISA_EXT_DATA("svpbmt", RISCV_ISA_EXT_SVPBMT),
> +	__RISCV_ISA_EXT_DATA("zicbom", RISCV_ISA_EXT_ZICBOM),

Drop the quotes around zicbom because __RISCV_ISA_EXT_DATA() will
stringify the first parameter.

>  	__RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX),
>  };
>
> diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c
> index 5c5e6e7488ce..0e997fa5524a 100644
> --- a/arch/riscv/kernel/cpufeature.c
> +++ b/arch/riscv/kernel/cpufeature.c
> @@ -200,6 +200,7 @@ void __init riscv_fill_hwcap(void)
>  				set_bit(*ext - 'a', this_isa);
>  			} else {
>  				SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT);
> +				SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM);
>  			}
>  #undef SET_ISA_EXT_MAP
>  		}
> @@ -267,11 +268,27 @@ static bool __init_or_module cpufeature_svpbmt_check_func(unsigned int stage)
>  	return ret;
>  }
>
> +static bool cpufeature_cmo_check_func(unsigned int stage)
> +{
> +	switch (stage) {
> +	case RISCV_ALTERNATIVES_EARLY_BOOT:
> +		return false;
> +	default:
> +		return riscv_isa_extension_available(NULL, ZICBOM);
> +	}
> +
> +	return false;
> +}
> +
>  static const struct cpufeature_info __initdata_or_module cpufeature_list[CPUFEATURE_NUMBER] = {
>  	{
>  		.name = "svpbmt",
>  		.check_func = cpufeature_svpbmt_check_func
>  	},
> +	{
> +		.name = "cmo",
> +		.check_func = cpufeature_cmo_check_func
> +	},
>  };
>
>  static u32 __init_or_module cpufeature_probe(unsigned int stage)
> diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
> index ac7a25298a04..d76aabf4b94d 100644
> --- a/arch/riscv/mm/Makefile
> +++ b/arch/riscv/mm/Makefile
> @@ -30,3 +30,4 @@ endif
>  endif
>
>  obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o
> +obj-$(CONFIG_RISCV_DMA_NONCOHERENT) += dma-noncoherent.o
> diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> new file mode 100644
> index 000000000000..2c124bcc1932
> --- /dev/null
> +++ b/arch/riscv/mm/dma-noncoherent.c
> @@ -0,0 +1,61 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * RISC-V specific functions to support DMA for non-coherent devices
> + *
> + * Copyright (c) 2021 Western Digital Corporation or its affiliates.
> + */
> +
> +#include <linux/dma-direct.h>
> +#include <linux/dma-map-ops.h>
> +#include <linux/init.h>
> +#include <linux/io.h>
> +#include <linux/libfdt.h>
> +#include <linux/mm.h>
> +#include <linux/of.h>
> +#include <linux/of_fdt.h>
> +
> +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
> +{
> +	switch (dir) {
> +	case DMA_TO_DEVICE:
> +		ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
> +		break;
> +	case DMA_FROM_DEVICE:
> +		ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
> +		break;
> +	case DMA_BIDIRECTIONAL:
> +		ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> +		break;
> +	default:
> +		break;
> +	}
> +}
> +
> +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
> +{
> +	switch (dir) {
> +	case DMA_TO_DEVICE:
> +		break;
> +	case DMA_FROM_DEVICE:
> +	case DMA_BIDIRECTIONAL:
> +		ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
> +		break;
> +	default:
> +		break;
> +	}
> +}
> +
> +void arch_dma_prep_coherent(struct page *page, size_t size)
> +{
> +	void *flush_addr = page_address(page);
> +
> +	memset(flush_addr, 0, size);
> +	ALT_CMO_OP(FLUSH, (unsigned long)flush_addr, size);
> +}
> +
> +void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size,
> +		const struct iommu_ops *iommu, bool coherent)
> +{
> +	/* If a specific device is dma-coherent, set it here */
> +	dev->dma_coherent = coherent;
> +}
> --
> 2.30.2
>
+ void *flush_addr = page_address(page); > + > + memset(flush_addr, 0, size); > + ALT_CMO_OP(FLUSH, (unsigned long)flush_addr, size); > +} > + > +void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, > + const struct iommu_ops *iommu, bool coherent) > +{ > + /* If a specific device is dma-coherent, set it here */ > + dev->dma_coherent = coherent; > +} > -- > 2.30.2 > ^ permalink raw reply [flat|nested] 50+ messages in thread
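The direction handling in arch_sync_dma_for_device() and arch_sync_dma_for_cpu() reduces to a small table. Below is a minimal userspace sketch of that mapping, not kernel code. The enums and function names are hypothetical stand-ins for enum dma_data_direction and for the CLEAN/INVAL/FLUSH arguments passed to ALT_CMO_OP().

```c
#include <assert.h>

/* Hypothetical stand-ins for enum dma_data_direction. */
enum dma_dir { DIR_TO_DEVICE, DIR_FROM_DEVICE, DIR_BIDIRECTIONAL, DIR_NONE };

/* Cache maintenance a sync step performs; CMO_NONE means nothing is done. */
enum cmo_op { CMO_NONE, CMO_CLEAN, CMO_INVAL, CMO_FLUSH };

/*
 * Mirrors arch_sync_dma_for_device(): before the device accesses memory,
 * write back CPU data it will read (clean), drop lines it will overwrite so
 * that no dirty line is later written back over its data (inval), or do
 * both (flush) when the buffer goes both ways.
 */
static enum cmo_op cmo_for_device(enum dma_dir dir)
{
	switch (dir) {
	case DIR_TO_DEVICE:
		return CMO_CLEAN;
	case DIR_FROM_DEVICE:
		return CMO_INVAL;
	case DIR_BIDIRECTIONAL:
		return CMO_FLUSH;
	default:
		return CMO_NONE;
	}
}

/*
 * Mirrors arch_sync_dma_for_cpu(): after the device has written, invalidate
 * again so the CPU does not read lines that were (speculatively) refilled
 * while the transfer was in flight.
 */
static enum cmo_op cmo_for_cpu(enum dma_dir dir)
{
	switch (dir) {
	case DIR_FROM_DEVICE:
	case DIR_BIDIRECTIONAL:
		return CMO_INVAL;
	default:
		return CMO_NONE;
	}
}
```

A DMA_TO_DEVICE buffer therefore costs one clean and no CPU-side work, while DMA_FROM_DEVICE invalidates on both sides of the transfer.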
* Re: [PATCH 1/2] riscv: Implement Zicbom-based cache management operations 2022-03-25 16:20 ` Anup Patel @ 2022-03-25 17:24 ` Philipp Tomsich -1 siblings, 0 replies; 50+ messages in thread From: Philipp Tomsich @ 2022-03-25 17:24 UTC (permalink / raw) To: Anup Patel Cc: Heiko Stuebner, Palmer Dabbelt, Paul Walmsley, linux-riscv, linux-kernel@vger.kernel.org List, Wei Fu, Guo Ren, Atish Patra, Nick Kossifidis, Samuel Holland, Christoph Muellner, Christoph Hellwig, Atish Patra On Fri, 25 Mar 2022 at 17:20, Anup Patel <anup@brainfault.org> wrote: > > On Tue, Mar 8, 2022 at 4:16 AM Heiko Stuebner <heiko@sntech.de> wrote: > > > > The Zicbom ISA-extension was ratified in November 2021 > > and introduces instructions for dcache invalidate, clean > > and flush operations. > > > > Implement cache management operations based on them. > > > > Of course not all cores will support this, so implement an > > alternative-based mechanism that replaces empty instructions > > with ones done around Zicbom instructions. > > > > We're using prebuilt instructions for the Zicbom instructions > > for now, to not require a bleeding-edge compiler (gcc-12) > > for these somewhat simple instructions. 
> > > > Signed-off-by: Heiko Stuebner <heiko@sntech.de> > > Cc: Christoph Hellwig <hch@lst.de> > > Cc: Atish Patra <atish.patra@wdc.com> > > Cc: Guo Ren <guoren@kernel.org> > > --- > > arch/riscv/Kconfig | 8 ++++ > > arch/riscv/include/asm/errata_list.h | 37 ++++++++++++++++- > > arch/riscv/include/asm/hwcap.h | 1 + > > arch/riscv/kernel/cpu.c | 1 + > > arch/riscv/kernel/cpufeature.c | 17 ++++++++ > > arch/riscv/mm/Makefile | 1 + > > arch/riscv/mm/dma-noncoherent.c | 61 ++++++++++++++++++++++++++++ > > 7 files changed, 125 insertions(+), 1 deletion(-) > > create mode 100644 arch/riscv/mm/dma-noncoherent.c > > > > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig > > index 5adcbd9b5e88..d3a1cd41c203 100644 > > --- a/arch/riscv/Kconfig > > +++ b/arch/riscv/Kconfig > > @@ -208,6 +208,14 @@ config PGTABLE_LEVELS > > config LOCKDEP_SUPPORT > > def_bool y > > > > +config RISCV_DMA_NONCOHERENT > > + bool "Support non-coherent dma operation" > > + select ARCH_HAS_DMA_PREP_COHERENT > > + select ARCH_HAS_SYNC_DMA_FOR_DEVICE > > + select ARCH_HAS_SYNC_DMA_FOR_CPU > > + select ARCH_HAS_SETUP_DMA_OPS > > + select DMA_DIRECT_REMAP > > + > > source "arch/riscv/Kconfig.socs" > > source "arch/riscv/Kconfig.erratas" > > > > diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h > > index 4fac46b82c16..7a2dd61af24d 100644 > > --- a/arch/riscv/include/asm/errata_list.h > > +++ b/arch/riscv/include/asm/errata_list.h > > @@ -20,7 +20,8 @@ > > #endif > > > > #define CPUFEATURE_SVPBMT 0 > > -#define CPUFEATURE_NUMBER 1 > > +#define CPUFEATURE_CMO 1 > > +#define CPUFEATURE_NUMBER 2 > > > > #ifdef __ASSEMBLY__ > > > > @@ -86,6 +87,40 @@ asm volatile(ALTERNATIVE( \ > > #define ALT_THEAD_PMA(_val) > > #endif > > > > +/* > > + * cbo.clean rs1 > > + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | > > + * 0...01 rs1 010 00000 0001111 > > + * > > + * cbo.flush rs1 > > + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | > > + * 0...10 rs1 010 00000 0001111 
> > + * > > + * cbo.inval rs1 > > + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | > > + * 0...00 rs1 010 00000 0001111 > > + */ > > +#define CBO_INVAL_A0 ".long 0x15200F" > > +#define CBO_CLEAN_A0 ".long 0x25200F" > > +#define CBO_FLUSH_A0 ".long 0x05200F" > > + > > +#define ALT_CMO_OP(_op, _start, _size) \ > > +asm volatile(ALTERNATIVE( \ > > + "nop\n\t" \ > > + "nop\n\t" \ > > + "nop\n\t" \ > > + "nop\n\t" \ > > + "nop", \ > > + "mv a0, %1\n\t" \ > > + "j 2f\n\t" \ > > + "3:\n\t" \ > > + CBO_##_op##_A0 "\n\t" \ > > + "addi a0, a0, %0\n\t" \ > > + "2:\n\t" \ > > + "bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT) \ > > + : : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)), \ > > + "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES))) > > Why not use a global variable (e.g. riscv_cbom_block_size) representing > the exact cbom block size instead of L1_CACHE_BYTES? Didn't the discussions around platforms gravitate towards making a fixed cache-block operation size (note that this is orthogonal to the cache-line size) a requirement for Linux-capable platforms? Philipp. > The default value of riscv_cbom_block_size can be L1_CACHE_BYTES, > which can be overridden at boot time using an optional "riscv,cbom-block-size" > DT property. > > The rationale here is that if the underlying RISC-V implementation has a cbom > block size smaller than L1_CACHE_BYTES, the cbom range operation will be > incomplete, because the loop advances by L1_CACHE_BYTES and skips the blocks > in between. The riscv_cbom_block_size global variable ensures > that the right block size is used at least for cbom operations. 
> > Regards, > Anup > > > + > > #endif /* __ASSEMBLY__ */ > > > > #endif > > diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h > > index 656cd626eb1a..5943d5125a51 100644 > > --- a/arch/riscv/include/asm/hwcap.h > > +++ b/arch/riscv/include/asm/hwcap.h > > @@ -52,6 +52,7 @@ extern unsigned long elf_hwcap; > > */ > > enum riscv_isa_ext_id { > > RISCV_ISA_EXT_SVPBMT = RISCV_ISA_EXT_BASE, > > + RISCV_ISA_EXT_ZICBOM, > > RISCV_ISA_EXT_ID_MAX = RISCV_ISA_EXT_MAX, > > }; > > > > diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c > > index c582d557e555..dfcf592273a7 100644 > > --- a/arch/riscv/kernel/cpu.c > > +++ b/arch/riscv/kernel/cpu.c > > @@ -72,6 +72,7 @@ int riscv_of_parent_hartid(struct device_node *node) > > > > static struct riscv_isa_ext_data isa_ext_arr[] = { > > __RISCV_ISA_EXT_DATA("svpbmt", RISCV_ISA_EXT_SVPBMT), > > + __RISCV_ISA_EXT_DATA("zicbom", RISCV_ISA_EXT_ZICBOM), > > Drop the quotes around zicbom because __RISCV_ISA_EXT_DATA() will > stringify the first parameter. 
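The quotes have to go because of how the preprocessor's # operator works. Stringifying an argument that is already a string literal keeps its quotes as escaped characters. The sketch below assumes, as the review states, that __RISCV_ISA_EXT_DATA() applies # to its first parameter. EXT_NAME() is a hypothetical stand-in, not the kernel macro.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the stringifying half of __RISCV_ISA_EXT_DATA(). */
#define EXT_NAME(uprop) #uprop

/* Expands to "zicbom": six characters, as intended. */
static const char ext_bare[] = EXT_NAME(zicbom);

/* Expands to "\"zicbom\"": eight characters. The quote characters
 * become part of the extension name itself. */
static const char ext_quoted[] = EXT_NAME("zicbom");
```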
> > > __RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX), > > }; > > > > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c > > index 5c5e6e7488ce..0e997fa5524a 100644 > > --- a/arch/riscv/kernel/cpufeature.c > > +++ b/arch/riscv/kernel/cpufeature.c > > @@ -200,6 +200,7 @@ void __init riscv_fill_hwcap(void) > > set_bit(*ext - 'a', this_isa); > > } else { > > SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT); > > + SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM); > > } > > #undef SET_ISA_EXT_MAP > > } > > @@ -267,11 +268,27 @@ static bool __init_or_module cpufeature_svpbmt_check_func(unsigned int stage) > > return ret; > > } > > > > +static bool cpufeature_cmo_check_func(unsigned int stage) > > +{ > > + switch (stage) { > > + case RISCV_ALTERNATIVES_EARLY_BOOT: > > + return false; > > + default: > > + return riscv_isa_extension_available(NULL, ZICBOM); > > + } > > + > > + return false; > > +} > > + > > static const struct cpufeature_info __initdata_or_module cpufeature_list[CPUFEATURE_NUMBER] = { > > { > > .name = "svpbmt", > > .check_func = cpufeature_svpbmt_check_func > > }, > > + { > > + .name = "cmo", > > + .check_func = cpufeature_cmo_check_func > > + }, > > }; > > > > static u32 __init_or_module cpufeature_probe(unsigned int stage) > > diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile > > index ac7a25298a04..d76aabf4b94d 100644 > > --- a/arch/riscv/mm/Makefile > > +++ b/arch/riscv/mm/Makefile > > @@ -30,3 +30,4 @@ endif > > endif > > > > obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o > > +obj-$(CONFIG_RISCV_DMA_NONCOHERENT) += dma-noncoherent.o > > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c > > new file mode 100644 > > index 000000000000..2c124bcc1932 > > --- /dev/null > > +++ b/arch/riscv/mm/dma-noncoherent.c > > @@ -0,0 +1,61 @@ > > +// SPDX-License-Identifier: GPL-2.0-only > > +/* > > + * RISC-V specific functions to support DMA for non-coherent devices > > + * > > + * Copyright (c) 2021 Western 
Digital Corporation or its affiliates. > > + */ > > + > > +#include <linux/dma-direct.h> > > +#include <linux/dma-map-ops.h> > > +#include <linux/init.h> > > +#include <linux/io.h> > > +#include <linux/libfdt.h> > > +#include <linux/mm.h> > > +#include <linux/of.h> > > +#include <linux/of_fdt.h> > > + > > +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir) > > +{ > > + switch (dir) { > > + case DMA_TO_DEVICE: > > + ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size); > > + break; > > + case DMA_FROM_DEVICE: > > + ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size); > > + break; > > + case DMA_BIDIRECTIONAL: > > + ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size); > > + break; > > + default: > > + break; > > + } > > +} > > + > > +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir) > > +{ > > + switch (dir) { > > + case DMA_TO_DEVICE: > > + break; > > + case DMA_FROM_DEVICE: > > + case DMA_BIDIRECTIONAL: > > + ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size); > > + break; > > + default: > > + break; > > + } > > +} > > + > > +void arch_dma_prep_coherent(struct page *page, size_t size) > > +{ > > + void *flush_addr = page_address(page); > > + > > + memset(flush_addr, 0, size); > > + ALT_CMO_OP(FLUSH, (unsigned long)flush_addr, size); > > +} > > + > > +void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, > > + const struct iommu_ops *iommu, bool coherent) > > +{ > > + /* If a specific device is dma-coherent, set it here */ > > + dev->dma_coherent = coherent; > > +} > > -- > > 2.30.2 > > ^ permalink raw reply [flat|nested] 50+ messages in thread
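The hand-assembled constants can be cross-checked against the bit layout in the comment above them. This sketch assumes rs1 = a0 = x10 and takes the field values from that comment: imm 0 for inval, 1 for clean, 2 for flush; funct3 010; opcode 0001111. cbo_encode() is a hypothetical helper, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Field values taken from the comment above the CBO_*_A0 macros. */
#define MISC_MEM_OPCODE 0x0Fu	/* bits 6-0:   0001111 */
#define CBO_FUNCT3      0x2u	/* bits 14-12: 010     */

enum cbo_imm { CBO_IMM_INVAL = 0, CBO_IMM_CLEAN = 1, CBO_IMM_FLUSH = 2 };

/* Assemble cbo.<op> (rs1): imm[31:20] | rs1[19:15] | funct3[14:12] | rd=0 | opcode */
static uint32_t cbo_encode(enum cbo_imm imm, unsigned int rs1)
{
	assert(rs1 < 32);
	return ((uint32_t)imm << 20) | ((uint32_t)rs1 << 15) |
	       (CBO_FUNCT3 << 12) | MISC_MEM_OPCODE;
}
```

Evaluated this way, cbo.inval (a0), cbo.clean (a0) and cbo.flush (a0) come out as 0x0005200F, 0x0015200F and 0x0025200F. The patch instead defines CBO_INVAL_A0 as 0x15200F, CBO_CLEAN_A0 as 0x25200F and CBO_FLUSH_A0 as 0x05200F. If the layout in the comment is correct, the three constants therefore appear to be attached to the wrong names.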
* Re: [PATCH 1/2] riscv: Implement Zicbom-based cache management operations [not found] ` <CAAeLtUAi+61Hk7oBW979QEKYaume3vqdt_KkS_mXpRAs+CzHnA@mail.gmail.com> @ 2022-03-25 17:37 ` Anup Patel 0 siblings, 0 replies; 50+ messages in thread From: Anup Patel @ 2022-03-25 17:37 UTC (permalink / raw) To: Philipp Tomsich Cc: Heiko Stuebner, Palmer Dabbelt, Paul Walmsley, linux-riscv, linux-kernel@vger.kernel.org List, Wei Fu, Guo Ren, Atish Patra, Nick Kossifidis, Samuel Holland, Christoph Muellner, Christoph Hellwig, Atish Patra On Fri, Mar 25, 2022 at 10:52 PM Philipp Tomsich <philipp.tomsich@vrull.eu> wrote: > > Anup, > > On Fri, 25 Mar 2022 at 17:20, Anup Patel <anup@brainfault.org> wrote: >> >> On Tue, Mar 8, 2022 at 4:16 AM Heiko Stuebner <heiko@sntech.de> wrote: >> > >> > The Zicbom ISA-extension was ratified in November 2021 >> > and introduces instructions for dcache invalidate, clean >> > and flush operations. >> > >> > Implement cache management operations based on them. >> > >> > Of course not all cores will support this, so implement an >> > alternative-based mechanism that replaces empty instructions >> > with ones done around Zicbom instructions. >> > >> > We're using prebuilt instructions for the Zicbom instructions >> > for now, to not require a bleeding-edge compiler (gcc-12) >> > for these somewhat simple instructions. 
>> > >> > Signed-off-by: Heiko Stuebner <heiko@sntech.de> >> > Cc: Christoph Hellwig <hch@lst.de> >> > Cc: Atish Patra <atish.patra@wdc.com> >> > Cc: Guo Ren <guoren@kernel.org> >> > --- >> > arch/riscv/Kconfig | 8 ++++ >> > arch/riscv/include/asm/errata_list.h | 37 ++++++++++++++++- >> > arch/riscv/include/asm/hwcap.h | 1 + >> > arch/riscv/kernel/cpu.c | 1 + >> > arch/riscv/kernel/cpufeature.c | 17 ++++++++ >> > arch/riscv/mm/Makefile | 1 + >> > arch/riscv/mm/dma-noncoherent.c | 61 ++++++++++++++++++++++++++++ >> > 7 files changed, 125 insertions(+), 1 deletion(-) >> > create mode 100644 arch/riscv/mm/dma-noncoherent.c >> > >> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig >> > index 5adcbd9b5e88..d3a1cd41c203 100644 >> > --- a/arch/riscv/Kconfig >> > +++ b/arch/riscv/Kconfig >> > @@ -208,6 +208,14 @@ config PGTABLE_LEVELS >> > config LOCKDEP_SUPPORT >> > def_bool y >> > >> > +config RISCV_DMA_NONCOHERENT >> > + bool "Support non-coherent dma operation" >> > + select ARCH_HAS_DMA_PREP_COHERENT >> > + select ARCH_HAS_SYNC_DMA_FOR_DEVICE >> > + select ARCH_HAS_SYNC_DMA_FOR_CPU >> > + select ARCH_HAS_SETUP_DMA_OPS >> > + select DMA_DIRECT_REMAP >> > + >> > source "arch/riscv/Kconfig.socs" >> > source "arch/riscv/Kconfig.erratas" >> > >> > diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h >> > index 4fac46b82c16..7a2dd61af24d 100644 >> > --- a/arch/riscv/include/asm/errata_list.h >> > +++ b/arch/riscv/include/asm/errata_list.h >> > @@ -20,7 +20,8 @@ >> > #endif >> > >> > #define CPUFEATURE_SVPBMT 0 >> > -#define CPUFEATURE_NUMBER 1 >> > +#define CPUFEATURE_CMO 1 >> > +#define CPUFEATURE_NUMBER 2 >> > >> > #ifdef __ASSEMBLY__ >> > >> > @@ -86,6 +87,40 @@ asm volatile(ALTERNATIVE( \ >> > #define ALT_THEAD_PMA(_val) >> > #endif >> > >> > +/* >> > + * cbo.clean rs1 >> > + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | >> > + * 0...01 rs1 010 00000 0001111 >> > + * >> > + * cbo.flush rs1 >> > + * | 31 - 20 | 19 - 15 | 14 - 
12 | 11 - 7 | 6 - 0 | >> > + * 0...10 rs1 010 00000 0001111 >> > + * >> > + * cbo.inval rs1 >> > + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | >> > + * 0...00 rs1 010 00000 0001111 >> > + */ >> > +#define CBO_INVAL_A0 ".long 0x15200F" >> > +#define CBO_CLEAN_A0 ".long 0x25200F" >> > +#define CBO_FLUSH_A0 ".long 0x05200F" >> > + >> > +#define ALT_CMO_OP(_op, _start, _size) \ >> > +asm volatile(ALTERNATIVE( \ >> > + "nop\n\t" \ >> > + "nop\n\t" \ >> > + "nop\n\t" \ >> > + "nop\n\t" \ >> > + "nop", \ >> > + "mv a0, %1\n\t" \ >> > + "j 2f\n\t" \ >> > + "3:\n\t" \ >> > + CBO_##_op##_A0 "\n\t" \ >> > + "addi a0, a0, %0\n\t" \ >> > + "2:\n\t" \ >> > + "bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT) \ >> > + : : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)), \ >> > + "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES))) >> >> Why not use a global variable (e.g. riscv_cbom_block_size) representing >> the exact cbom block size instead of L1_CACHE_BYTES? > > > Didn't the discussions around platforms gravitate towards making a fixed cache-block > operation size (note that this is orthogonal to the cache-line size) a > requirement for Linux-capable platforms? I recall past platform discussions. Implementations compliant with the platform spec (whenever that is ratified) will converge on a fixed cache-block size, but Linux CMO support should still work on implementations with a different cache-block size. For example, the ARM64 cache operations (arch/arm64/mm/cache.S) determine the cache line size by reading the ctr_el0 register. Regards, Anup > > Philipp. > >> The default value of riscv_cbom_block_size can be L1_CACHE_BYTES, >> which can be overridden at boot time using an optional "riscv,cbom-block-size" >> DT property. >> >> The rationale here is that if the underlying RISC-V implementation has a cbom >> block size smaller than L1_CACHE_BYTES, the cbom range operation will be >> incomplete, because the loop advances by L1_CACHE_BYTES and skips the blocks >> in between. 
The riscv_cbom_block_size global variable ensures >> that the right block size is used at least for cbom operations. >> >> Regards, >> Anup >> >> > + >> > #endif /* __ASSEMBLY__ */ >> > >> > #endif >> > diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h >> > index 656cd626eb1a..5943d5125a51 100644 >> > --- a/arch/riscv/include/asm/hwcap.h >> > +++ b/arch/riscv/include/asm/hwcap.h >> > @@ -52,6 +52,7 @@ extern unsigned long elf_hwcap; >> > */ >> > enum riscv_isa_ext_id { >> > RISCV_ISA_EXT_SVPBMT = RISCV_ISA_EXT_BASE, >> > + RISCV_ISA_EXT_ZICBOM, >> > RISCV_ISA_EXT_ID_MAX = RISCV_ISA_EXT_MAX, >> > }; >> > >> > diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c >> > index c582d557e555..dfcf592273a7 100644 >> > --- a/arch/riscv/kernel/cpu.c >> > +++ b/arch/riscv/kernel/cpu.c >> > @@ -72,6 +72,7 @@ int riscv_of_parent_hartid(struct device_node *node) >> > >> > static struct riscv_isa_ext_data isa_ext_arr[] = { >> > __RISCV_ISA_EXT_DATA("svpbmt", RISCV_ISA_EXT_SVPBMT), >> > + __RISCV_ISA_EXT_DATA("zicbom", RISCV_ISA_EXT_ZICBOM), >> >> Drop the quotes around zicbom because __RISCV_ISA_EXT_DATA() will >> stringify the first parameter. 
>> >> > __RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX), >> > }; >> > >> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c >> > index 5c5e6e7488ce..0e997fa5524a 100644 >> > --- a/arch/riscv/kernel/cpufeature.c >> > +++ b/arch/riscv/kernel/cpufeature.c >> > @@ -200,6 +200,7 @@ void __init riscv_fill_hwcap(void) >> > set_bit(*ext - 'a', this_isa); >> > } else { >> > SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT); >> > + SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM); >> > } >> > #undef SET_ISA_EXT_MAP >> > } >> > @@ -267,11 +268,27 @@ static bool __init_or_module cpufeature_svpbmt_check_func(unsigned int stage) >> > return ret; >> > } >> > >> > +static bool cpufeature_cmo_check_func(unsigned int stage) >> > +{ >> > + switch (stage) { >> > + case RISCV_ALTERNATIVES_EARLY_BOOT: >> > + return false; >> > + default: >> > + return riscv_isa_extension_available(NULL, ZICBOM); >> > + } >> > + >> > + return false; >> > +} >> > + >> > static const struct cpufeature_info __initdata_or_module cpufeature_list[CPUFEATURE_NUMBER] = { >> > { >> > .name = "svpbmt", >> > .check_func = cpufeature_svpbmt_check_func >> > }, >> > + { >> > + .name = "cmo", >> > + .check_func = cpufeature_cmo_check_func >> > + }, >> > }; >> > >> > static u32 __init_or_module cpufeature_probe(unsigned int stage) >> > diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile >> > index ac7a25298a04..d76aabf4b94d 100644 >> > --- a/arch/riscv/mm/Makefile >> > +++ b/arch/riscv/mm/Makefile >> > @@ -30,3 +30,4 @@ endif >> > endif >> > >> > obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o >> > +obj-$(CONFIG_RISCV_DMA_NONCOHERENT) += dma-noncoherent.o >> > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c >> > new file mode 100644 >> > index 000000000000..2c124bcc1932 >> > --- /dev/null >> > +++ b/arch/riscv/mm/dma-noncoherent.c >> > @@ -0,0 +1,61 @@ >> > +// SPDX-License-Identifier: GPL-2.0-only >> > +/* >> > + * RISC-V specific functions to support DMA for 
non-coherent devices >> > + * >> > + * Copyright (c) 2021 Western Digital Corporation or its affiliates. >> > + */ >> > + >> > +#include <linux/dma-direct.h> >> > +#include <linux/dma-map-ops.h> >> > +#include <linux/init.h> >> > +#include <linux/io.h> >> > +#include <linux/libfdt.h> >> > +#include <linux/mm.h> >> > +#include <linux/of.h> >> > +#include <linux/of_fdt.h> >> > + >> > +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir) >> > +{ >> > + switch (dir) { >> > + case DMA_TO_DEVICE: >> > + ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size); >> > + break; >> > + case DMA_FROM_DEVICE: >> > + ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size); >> > + break; >> > + case DMA_BIDIRECTIONAL: >> > + ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size); >> > + break; >> > + default: >> > + break; >> > + } >> > +} >> > + >> > +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir) >> > +{ >> > + switch (dir) { >> > + case DMA_TO_DEVICE: >> > + break; >> > + case DMA_FROM_DEVICE: >> > + case DMA_BIDIRECTIONAL: >> > + ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size); >> > + break; >> > + default: >> > + break; >> > + } >> > +} >> > + >> > +void arch_dma_prep_coherent(struct page *page, size_t size) >> > +{ >> > + void *flush_addr = page_address(page); >> > + >> > + memset(flush_addr, 0, size); >> > + ALT_CMO_OP(FLUSH, (unsigned long)flush_addr, size); >> > +} >> > + >> > +void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, >> > + const struct iommu_ops *iommu, bool coherent) >> > +{ >> > + /* If a specific device is dma-coherent, set it here */ >> > + dev->dma_coherent = coherent; >> > +} >> > -- >> > 2.30.2 >> > >> >> ^ permalink raw reply [flat|nested] 50+ messages in thread
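For comparison, the arm64 lookup mentioned above reads the DminLine field, bits [19:16] of ctr_el0. That field holds log2 of the number of 4-byte words in the smallest data cache line, and arm64's dcache_line_size assembler macro turns it into a byte stride. The sketch below covers only the decoding step. It takes the register value as a parameter instead of executing mrs, and the function name is illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* CTR_EL0.DminLine occupies bits [19:16]: log2(words per smallest D-cache line). */
#define CTR_DMINLINE_SHIFT 16
#define CTR_DMINLINE_MASK  0xFu

/* Smallest data cache line size in bytes: 4 bytes per word << DminLine. */
static unsigned int dcache_line_bytes(uint64_t ctr_el0)
{
	unsigned int dminline = (ctr_el0 >> CTR_DMINLINE_SHIFT) & CTR_DMINLINE_MASK;

	return 4u << dminline;
}
```

A DminLine of 4 decodes to a 64-byte stride. The RISC-V proposal in this thread would take the stride from the proposed riscv_cbom_block_size variable instead.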
* Re: [PATCH 1/2] riscv: Implement Zicbom-based cache management operations @ 2022-03-25 17:37 ` Anup Patel 0 siblings, 0 replies; 50+ messages in thread From: Anup Patel @ 2022-03-25 17:37 UTC (permalink / raw) To: Philipp Tomsich Cc: Heiko Stuebner, Palmer Dabbelt, Paul Walmsley, linux-riscv, linux-kernel@vger.kernel.org List, Wei Fu, Guo Ren, Atish Patra, Nick Kossifidis, Samuel Holland, Christoph Muellner, Christoph Hellwig, Atish Patra On Fri, Mar 25, 2022 at 10:52 PM Philipp Tomsich <philipp.tomsich@vrull.eu> wrote: > > Anup, > > On Fri, 25 Mar 2022 at 17:20, Anup Patel <anup@brainfault.org> wrote: >> >> On Tue, Mar 8, 2022 at 4:16 AM Heiko Stuebner <heiko@sntech.de> wrote: >> > >> > The Zicbom ISA-extension was ratified in November 2021 >> > and introduces instructions for dcache invalidate, clean >> > and flush operations. >> > >> > Implement cache management operations based on them. >> > >> > Of course not all cores will support this, so implement an >> > alternative-based mechanism that replaces empty instructions >> > with ones done around Zicbom instructions. >> > >> > We're using prebuilt instructions for the Zicbom instructions >> > for now, to not require a bleeding-edge compiler (gcc-12) >> > for these somewhat simple instructions.
>> > >> > Signed-off-by: Heiko Stuebner <heiko@sntech.de> >> > Cc: Christoph Hellwig <hch@lst.de> >> > Cc: Atish Patra <atish.patra@wdc.com> >> > Cc: Guo Ren <guoren@kernel.org> >> > --- >> > arch/riscv/Kconfig | 8 ++++ >> > arch/riscv/include/asm/errata_list.h | 37 ++++++++++++++++- >> > arch/riscv/include/asm/hwcap.h | 1 + >> > arch/riscv/kernel/cpu.c | 1 + >> > arch/riscv/kernel/cpufeature.c | 17 ++++++++ >> > arch/riscv/mm/Makefile | 1 + >> > arch/riscv/mm/dma-noncoherent.c | 61 ++++++++++++++++++++++++++++ >> > 7 files changed, 125 insertions(+), 1 deletion(-) >> > create mode 100644 arch/riscv/mm/dma-noncoherent.c >> > >> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig >> > index 5adcbd9b5e88..d3a1cd41c203 100644 >> > --- a/arch/riscv/Kconfig >> > +++ b/arch/riscv/Kconfig >> > @@ -208,6 +208,14 @@ config PGTABLE_LEVELS >> > config LOCKDEP_SUPPORT >> > def_bool y >> > >> > +config RISCV_DMA_NONCOHERENT >> > + bool "Support non-coherent dma operation" >> > + select ARCH_HAS_DMA_PREP_COHERENT >> > + select ARCH_HAS_SYNC_DMA_FOR_DEVICE >> > + select ARCH_HAS_SYNC_DMA_FOR_CPU >> > + select ARCH_HAS_SETUP_DMA_OPS >> > + select DMA_DIRECT_REMAP >> > + >> > source "arch/riscv/Kconfig.socs" >> > source "arch/riscv/Kconfig.erratas" >> > >> > diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h >> > index 4fac46b82c16..7a2dd61af24d 100644 >> > --- a/arch/riscv/include/asm/errata_list.h >> > +++ b/arch/riscv/include/asm/errata_list.h >> > @@ -20,7 +20,8 @@ >> > #endif >> > >> > #define CPUFEATURE_SVPBMT 0 >> > -#define CPUFEATURE_NUMBER 1 >> > +#define CPUFEATURE_CMO 1 >> > +#define CPUFEATURE_NUMBER 2 >> > >> > #ifdef __ASSEMBLY__ >> > >> > @@ -86,6 +87,40 @@ asm volatile(ALTERNATIVE( \ >> > #define ALT_THEAD_PMA(_val) >> > #endif >> > >> > +/* >> > + * cbo.clean rs1 >> > + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | >> > + * 0...01 rs1 010 00000 0001111 >> > + * >> > + * cbo.flush rs1 >> > + * | 31 - 20 | 19 - 15 | 14 - 
12 | 11 - 7 | 6 - 0 | >> > + * 0...10 rs1 010 00000 0001111 >> > + * >> > + * cbo.inval rs1 >> > + * | 31 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | >> > + * 0...00 rs1 010 00000 0001111 >> > + */ >> > +#define CBO_INVAL_A0 ".long 0x15200F" >> > +#define CBO_CLEAN_A0 ".long 0x25200F" >> > +#define CBO_FLUSH_A0 ".long 0x05200F" >> > + >> > +#define ALT_CMO_OP(_op, _start, _size) \ >> > +asm volatile(ALTERNATIVE( \ >> > + "nop\n\t" \ >> > + "nop\n\t" \ >> > + "nop\n\t" \ >> > + "nop\n\t" \ >> > + "nop", \ >> > + "mv a0, %1\n\t" \ >> > + "j 2f\n\t" \ >> > + "3:\n\t" \ >> > + CBO_##_op##_A0 "\n\t" \ >> > + "addi a0, a0, %0\n\t" \ >> > + "2:\n\t" \ >> > + "bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT) \ >> > + : : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)), \ >> > + "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES))) >> >> Why not use a global variable (e.g. riscv_cbom_block_size) representing >> exact cbom block size instead of L1_CACHE_BYTES ? > > > Didn't the discussions around platforms gravitate towards a fixed cache-block > operations size (note that this is orthogonal from the cache-line size) a > requirement for Linux-capable platforms? I recall past platform discussions. Implementations compliant with the platform spec (whenever that is ratified) will converge on a fixed cache-block size, but this does not mean Linux CMO support should not work for implementations with a different cache-block size. For example, the ARM64 cache operations (arch/arm64/mm/cache.S) determine the cache line size by reading the ctr_el0 register. Regards, Anup > > Philipp. > >> The default value of riscv_cbom_block_size can be L1_CACHE_BYTES >> which can be overridden at boot-time using optional "riscv,cbom-block-size" >> DT property. >> >> The rationale here is that if underlying RISC-V implementation has cbom >> block size smaller than L1_CACHE_BYTES then it will result in incomplete >> cbom range operation.
The riscv_cbom_block_size global variable ensures >> that the right block size is used at least for cbom operations. >> >> Regards, >> Anup >> >> > + >> > #endif /* __ASSEMBLY__ */ >> > >> > #endif >> > diff --git a/arch/riscv/include/asm/hwcap.h b/arch/riscv/include/asm/hwcap.h >> > index 656cd626eb1a..5943d5125a51 100644 >> > --- a/arch/riscv/include/asm/hwcap.h >> > +++ b/arch/riscv/include/asm/hwcap.h >> > @@ -52,6 +52,7 @@ extern unsigned long elf_hwcap; >> > */ >> > enum riscv_isa_ext_id { >> > RISCV_ISA_EXT_SVPBMT = RISCV_ISA_EXT_BASE, >> > + RISCV_ISA_EXT_ZICBOM, >> > RISCV_ISA_EXT_ID_MAX = RISCV_ISA_EXT_MAX, >> > }; >> > >> > diff --git a/arch/riscv/kernel/cpu.c b/arch/riscv/kernel/cpu.c >> > index c582d557e555..dfcf592273a7 100644 >> > --- a/arch/riscv/kernel/cpu.c >> > +++ b/arch/riscv/kernel/cpu.c >> > @@ -72,6 +72,7 @@ int riscv_of_parent_hartid(struct device_node *node) >> > >> > static struct riscv_isa_ext_data isa_ext_arr[] = { >> > __RISCV_ISA_EXT_DATA("svpbmt", RISCV_ISA_EXT_SVPBMT), >> > + __RISCV_ISA_EXT_DATA("zicbom", RISCV_ISA_EXT_ZICBOM), >> >> Drop the quotes around zicbom because __RISCV_ISA_EXT_DATA() will >> stringify the first parameter. 
>> >> > __RISCV_ISA_EXT_DATA("", RISCV_ISA_EXT_MAX), >> > }; >> > >> > diff --git a/arch/riscv/kernel/cpufeature.c b/arch/riscv/kernel/cpufeature.c >> > index 5c5e6e7488ce..0e997fa5524a 100644 >> > --- a/arch/riscv/kernel/cpufeature.c >> > +++ b/arch/riscv/kernel/cpufeature.c >> > @@ -200,6 +200,7 @@ void __init riscv_fill_hwcap(void) >> > set_bit(*ext - 'a', this_isa); >> > } else { >> > SET_ISA_EXT_MAP("svpbmt", RISCV_ISA_EXT_SVPBMT); >> > + SET_ISA_EXT_MAP("zicbom", RISCV_ISA_EXT_ZICBOM); >> > } >> > #undef SET_ISA_EXT_MAP >> > } >> > @@ -267,11 +268,27 @@ static bool __init_or_module cpufeature_svpbmt_check_func(unsigned int stage) >> > return ret; >> > } >> > >> > +static bool cpufeature_cmo_check_func(unsigned int stage) >> > +{ >> > + switch (stage) { >> > + case RISCV_ALTERNATIVES_EARLY_BOOT: >> > + return false; >> > + default: >> > + return riscv_isa_extension_available(NULL, ZICBOM); >> > + } >> > + >> > + return false; >> > +} >> > + >> > static const struct cpufeature_info __initdata_or_module cpufeature_list[CPUFEATURE_NUMBER] = { >> > { >> > .name = "svpbmt", >> > .check_func = cpufeature_svpbmt_check_func >> > }, >> > + { >> > + .name = "cmo", >> > + .check_func = cpufeature_cmo_check_func >> > + }, >> > }; >> > >> > static u32 __init_or_module cpufeature_probe(unsigned int stage) >> > diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile >> > index ac7a25298a04..d76aabf4b94d 100644 >> > --- a/arch/riscv/mm/Makefile >> > +++ b/arch/riscv/mm/Makefile >> > @@ -30,3 +30,4 @@ endif >> > endif >> > >> > obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o >> > +obj-$(CONFIG_RISCV_DMA_NONCOHERENT) += dma-noncoherent.o >> > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c >> > new file mode 100644 >> > index 000000000000..2c124bcc1932 >> > --- /dev/null >> > +++ b/arch/riscv/mm/dma-noncoherent.c >> > @@ -0,0 +1,61 @@ >> > +// SPDX-License-Identifier: GPL-2.0-only >> > +/* >> > + * RISC-V specific functions to support DMA for 
non-coherent devices >> > + * >> > + * Copyright (c) 2021 Western Digital Corporation or its affiliates. >> > + */ >> > + >> > +#include <linux/dma-direct.h> >> > +#include <linux/dma-map-ops.h> >> > +#include <linux/init.h> >> > +#include <linux/io.h> >> > +#include <linux/libfdt.h> >> > +#include <linux/mm.h> >> > +#include <linux/of.h> >> > +#include <linux/of_fdt.h> >> > + >> > +void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir) >> > +{ >> > + switch (dir) { >> > + case DMA_TO_DEVICE: >> > + ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size); >> > + break; >> > + case DMA_FROM_DEVICE: >> > + ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size); >> > + break; >> > + case DMA_BIDIRECTIONAL: >> > + ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size); >> > + break; >> > + default: >> > + break; >> > + } >> > +} >> > + >> > +void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir) >> > +{ >> > + switch (dir) { >> > + case DMA_TO_DEVICE: >> > + break; >> > + case DMA_FROM_DEVICE: >> > + case DMA_BIDIRECTIONAL: >> > + ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size); >> > + break; >> > + default: >> > + break; >> > + } >> > +} >> > + >> > +void arch_dma_prep_coherent(struct page *page, size_t size) >> > +{ >> > + void *flush_addr = page_address(page); >> > + >> > + memset(flush_addr, 0, size); >> > + ALT_CMO_OP(FLUSH, (unsigned long)flush_addr, size); >> > +} >> > + >> > +void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, >> > + const struct iommu_ops *iommu, bool coherent) >> > +{ >> > + /* If a specific device is dma-coherent, set it here */ >> > + dev->dma_coherent = coherent; >> > +} >> > -- >> > 2.30.2 >> > >> >> ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 1/2] riscv: Implement Zicbom-based cache management operations 2022-03-07 22:46 ` Heiko Stuebner @ 2022-03-31 10:07 ` Christoph Hellwig -1 siblings, 0 replies; 50+ messages in thread From: Christoph Hellwig @ 2022-03-31 10:07 UTC (permalink / raw) To: Heiko Stuebner Cc: palmer, paul.walmsley, linux-riscv, linux-kernel, wefu, guoren, atishp, anup, mick, samuel, cmuellner, philipp.tomsich, Christoph Hellwig, Atish Patra I somehow only got patch 1 out of 2, which makes reviewing this very hard. ^ permalink raw reply [flat|nested] 50+ messages in thread
* [PATCH 2/2] riscv: implement cache-management errata for T-Head SoCs 2022-03-07 22:46 ` Heiko Stuebner @ 2022-03-07 22:46 ` Heiko Stuebner -1 siblings, 0 replies; 50+ messages in thread From: Heiko Stuebner @ 2022-03-07 22:46 UTC (permalink / raw) To: palmer, paul.walmsley Cc: linux-riscv, linux-kernel, wefu, guoren, atishp, anup, mick, samuel, cmuellner, philipp.tomsich, Heiko Stuebner The T-Head C906 and C910 implement a scheme for handling cache operations different from the generic Zicbom extension. Add an errata for it next to the generic dma coherency ops. Signed-off-by: Heiko Stuebner <heiko@sntech.de> --- arch/riscv/Kconfig.erratas | 10 +++++++ arch/riscv/errata/thead/errata.c | 5 ++++ arch/riscv/include/asm/errata_list.h | 45 ++++++++++++++++++++++++++-- 3 files changed, 57 insertions(+), 3 deletions(-) diff --git a/arch/riscv/Kconfig.erratas b/arch/riscv/Kconfig.erratas index de4002baa1d0..89a6dcb8ac2a 100644 --- a/arch/riscv/Kconfig.erratas +++ b/arch/riscv/Kconfig.erratas @@ -50,4 +50,14 @@ config ERRATA_THEAD_PBMT If you don't know what to do here, say "Y". +config ERRATA_THEAD_CMO + bool "Apply T-Head cache management errata" + depends on ERRATA_THEAD && RISCV_DMA_NONCOHERENT + default y + help + This will apply the cache management errata to handle the + non-standard handling on non-coherent operations on T-Head SoCs. + + If you don't know what to do here, say "Y". 
+ endmenu diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c index fd8e0538a3f0..11c26c37425f 100644 --- a/arch/riscv/errata/thead/errata.c +++ b/arch/riscv/errata/thead/errata.c @@ -33,6 +33,11 @@ static const struct errata_info errata_list[ERRATA_THEAD_NUMBER] = { .stage = RISCV_ALTERNATIVES_EARLY_BOOT, .check_func = errata_mt_check_func }, + { + .name = "cache-management", + .stage = RISCV_ALTERNATIVES_BOOT, + .check_func = errata_mt_check_func + }, }; static u32 thead_errata_probe(unsigned int stage, unsigned long archid, unsigned long impid) diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h index 7a2dd61af24d..f7c6805daeab 100644 --- a/arch/riscv/include/asm/errata_list.h +++ b/arch/riscv/include/asm/errata_list.h @@ -16,7 +16,8 @@ #ifdef CONFIG_ERRATA_THEAD #define ERRATA_THEAD_PBMT 0 -#define ERRATA_THEAD_NUMBER 1 +#define ERRATA_THEAD_CMO 1 +#define ERRATA_THEAD_NUMBER 2 #endif #define CPUFEATURE_SVPBMT 0 @@ -104,8 +105,37 @@ asm volatile(ALTERNATIVE( \ #define CBO_CLEAN_A0 ".long 0x25200F" #define CBO_FLUSH_A0 ".long 0x05200F" +/* + * dcache.ipa rs1 (invalidate, physical address) + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | + * 0000001 01010 rs1 000 00000 0001011 + * dache.iva rs1 (invalida, virtual address) + * 0000001 00110 rs1 000 00000 0001011 + * + * dcache.cpa rs1 (clean, physical address) + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | + * 0000001 01001 rs1 000 00000 0001011 + * dcache.cva rs1 (clean, virtual address) + * 0000001 00100 rs1 000 00000 0001011 + * + * dcache.cipa rs1 (clean then invalidate, physical address) + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | + * 0000001 01011 rs1 000 00000 0001011 + * dcache.civa rs1 (... 
virtual address) + * 0000001 00111 rs1 000 00000 0001011 + * + * sync.s (make sure all cache operations finished) + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | + * 0000000 11001 00000 000 00000 0001011 + */ +#define THEAD_INVAL_A0 ".long 0x0265000b" +#define THEAD_CLEAN_A0 ".long 0x0245000b" +#define THEAD_FLUSH_A0 ".long 0x0275000b" +#define THEAD_SYNC_S ".long 0x0190000b" + #define ALT_CMO_OP(_op, _start, _size) \ -asm volatile(ALTERNATIVE( \ +asm volatile(ALTERNATIVE_2( \ + "nop\n\t" \ "nop\n\t" \ "nop\n\t" \ "nop\n\t" \ @@ -117,7 +147,16 @@ asm volatile(ALTERNATIVE( \ CBO_##_op##_A0 "\n\t" \ "addi a0, a0, %0\n\t" \ "2:\n\t" \ - "bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT) \ + "bltu a0, %2, 3b\n\t" \ + "nop", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT, \ + "mv a0, %1\n\t" \ + "j 2f\n\t" \ + "3:\n\t" \ + THEAD_##_op##_A0 "\n\t" \ + "addi a0, a0, %0\n\t" \ + "2:\n\t" \ + "bltu a0, %2, 3b\n\t" \ + THEAD_SYNC_S, THEAD_VENDOR_ID, ERRATA_THEAD_CMO, CONFIG_ERRATA_THEAD_CMO) \ : : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)), \ "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES))) -- 2.30.2 ^ permalink raw reply related [flat|nested] 50+ messages in thread
* Re: [PATCH 2/2] riscv: implement cache-management errata for T-Head SoCs 2022-03-07 22:46 ` Heiko Stuebner @ 2022-03-31 2:30 ` Palmer Dabbelt -1 siblings, 0 replies; 50+ messages in thread From: Palmer Dabbelt @ 2022-03-31 2:30 UTC (permalink / raw) To: heiko Cc: Paul Walmsley, linux-riscv, linux-kernel, wefu, guoren, atishp, anup, mick, samuel, cmuellner, philipp.tomsich, heiko On Mon, 07 Mar 2022 14:46:20 PST (-0800), heiko@sntech.de wrote: > The T-Head C906 and C910 implement a scheme for handling > cache operations different from the generic Zicbom extension. > > Add an errata for it next to the generic dma coherency ops. > > Signed-off-by: Heiko Stuebner <heiko@sntech.de> > --- > arch/riscv/Kconfig.erratas | 10 +++++++ > arch/riscv/errata/thead/errata.c | 5 ++++ > arch/riscv/include/asm/errata_list.h | 45 ++++++++++++++++++++++++++-- > 3 files changed, 57 insertions(+), 3 deletions(-) > > diff --git a/arch/riscv/Kconfig.erratas b/arch/riscv/Kconfig.erratas > index de4002baa1d0..89a6dcb8ac2a 100644 > --- a/arch/riscv/Kconfig.erratas > +++ b/arch/riscv/Kconfig.erratas > @@ -50,4 +50,14 @@ config ERRATA_THEAD_PBMT > > If you don't know what to do here, say "Y". > > +config ERRATA_THEAD_CMO > + bool "Apply T-Head cache management errata" > + depends on ERRATA_THEAD && RISCV_DMA_NONCOHERENT > + default y > + help > + This will apply the cache management errata to handle the > + non-standard handling on non-coherent operations on T-Head SoCs. > + > + If you don't know what to do here, say "Y". 
> + > endmenu > diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c > index fd8e0538a3f0..11c26c37425f 100644 > --- a/arch/riscv/errata/thead/errata.c > +++ b/arch/riscv/errata/thead/errata.c > @@ -33,6 +33,11 @@ static const struct errata_info errata_list[ERRATA_THEAD_NUMBER] = { > .stage = RISCV_ALTERNATIVES_EARLY_BOOT, > .check_func = errata_mt_check_func > }, > + { > + .name = "cache-management", > + .stage = RISCV_ALTERNATIVES_BOOT, > + .check_func = errata_mt_check_func > + }, > }; > > static u32 thead_errata_probe(unsigned int stage, unsigned long archid, unsigned long impid) > diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h > index 7a2dd61af24d..f7c6805daeab 100644 > --- a/arch/riscv/include/asm/errata_list.h > +++ b/arch/riscv/include/asm/errata_list.h > @@ -16,7 +16,8 @@ > > #ifdef CONFIG_ERRATA_THEAD > #define ERRATA_THEAD_PBMT 0 > -#define ERRATA_THEAD_NUMBER 1 > +#define ERRATA_THEAD_CMO 1 > +#define ERRATA_THEAD_NUMBER 2 > #endif > > #define CPUFEATURE_SVPBMT 0 > @@ -104,8 +105,37 @@ asm volatile(ALTERNATIVE( \ > #define CBO_CLEAN_A0 ".long 0x25200F" > #define CBO_FLUSH_A0 ".long 0x05200F" > > +/* > + * dcache.ipa rs1 (invalidate, physical address) > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | > + * 0000001 01010 rs1 000 00000 0001011 > + * dache.iva rs1 (invalida, virtual address) > + * 0000001 00110 rs1 000 00000 0001011 > + * > + * dcache.cpa rs1 (clean, physical address) > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | > + * 0000001 01001 rs1 000 00000 0001011 > + * dcache.cva rs1 (clean, virtual address) > + * 0000001 00100 rs1 000 00000 0001011 > + * > + * dcache.cipa rs1 (clean then invalidate, physical address) > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | > + * 0000001 01011 rs1 000 00000 0001011 > + * dcache.civa rs1 (... 
virtual address) > + * 0000001 00111 rs1 000 00000 0001011 > + * > + * sync.s (make sure all cache operations finished) > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | > + * 0000000 11001 00000 000 00000 0001011 > + */ > +#define THEAD_INVAL_A0 ".long 0x0265000b" > +#define THEAD_CLEAN_A0 ".long 0x0245000b" > +#define THEAD_FLUSH_A0 ".long 0x0275000b" > +#define THEAD_SYNC_S ".long 0x0190000b" IIRC this came up before, but these really need to get into the assembler as actual instructions. > + > #define ALT_CMO_OP(_op, _start, _size) \ > -asm volatile(ALTERNATIVE( \ > +asm volatile(ALTERNATIVE_2( \ > + "nop\n\t" \ > "nop\n\t" \ > "nop\n\t" \ > "nop\n\t" \ > @@ -117,7 +147,16 @@ asm volatile(ALTERNATIVE( \ > CBO_##_op##_A0 "\n\t" \ > "addi a0, a0, %0\n\t" \ > "2:\n\t" \ > - "bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT) \ > + "bltu a0, %2, 3b\n\t" \ > + "nop", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT, \ > + "mv a0, %1\n\t" \ > + "j 2f\n\t" \ > + "3:\n\t" \ > + THEAD_##_op##_A0 "\n\t" \ > + "addi a0, a0, %0\n\t" \ > + "2:\n\t" \ > + "bltu a0, %2, 3b\n\t" \ > + THEAD_SYNC_S, THEAD_VENDOR_ID, ERRATA_THEAD_CMO, CONFIG_ERRATA_THEAD_CMO) \ > : : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)), \ > "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES))) ^ permalink raw reply [flat|nested] 50+ messages in thread
* Re: [PATCH 2/2] riscv: implement cache-management errata for T-Head SoCs 2022-03-31 2:30 ` Palmer Dabbelt @ 2022-03-31 8:22 ` Heiko Stübner -1 siblings, 0 replies; 50+ messages in thread From: Heiko Stübner @ 2022-03-31 8:22 UTC (permalink / raw) To: Palmer Dabbelt Cc: Paul Walmsley, linux-riscv, linux-kernel, wefu, guoren, atishp, anup, mick, samuel, cmuellner, philipp.tomsich Hi Palmer, Am Donnerstag, 31. März 2022, 04:30:36 CEST schrieb Palmer Dabbelt: > On Mon, 07 Mar 2022 14:46:20 PST (-0800), heiko@sntech.de wrote: > > The T-Head C906 and C910 implement a scheme for handling > > cache operations different from the generic Zicbom extension. > > > > Add an errata for it next to the generic dma coherency ops. > > > > Signed-off-by: Heiko Stuebner <heiko@sntech.de> > > --- > > arch/riscv/Kconfig.erratas | 10 +++++++ > > arch/riscv/errata/thead/errata.c | 5 ++++ > > arch/riscv/include/asm/errata_list.h | 45 ++++++++++++++++++++++++++-- > > 3 files changed, 57 insertions(+), 3 deletions(-) > > > > diff --git a/arch/riscv/Kconfig.erratas b/arch/riscv/Kconfig.erratas > > index de4002baa1d0..89a6dcb8ac2a 100644 > > --- a/arch/riscv/Kconfig.erratas > > +++ b/arch/riscv/Kconfig.erratas > > @@ -50,4 +50,14 @@ config ERRATA_THEAD_PBMT > > > > If you don't know what to do here, say "Y". > > > > +config ERRATA_THEAD_CMO > > + bool "Apply T-Head cache management errata" > > + depends on ERRATA_THEAD && RISCV_DMA_NONCOHERENT > > + default y > > + help > > + This will apply the cache management errata to handle the > > + non-standard handling on non-coherent operations on T-Head SoCs. > > + > > + If you don't know what to do here, say "Y". 
> > + > > endmenu > > diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c > > index fd8e0538a3f0..11c26c37425f 100644 > > --- a/arch/riscv/errata/thead/errata.c > > +++ b/arch/riscv/errata/thead/errata.c > > @@ -33,6 +33,11 @@ static const struct errata_info errata_list[ERRATA_THEAD_NUMBER] = { > > .stage = RISCV_ALTERNATIVES_EARLY_BOOT, > > .check_func = errata_mt_check_func > > }, > > + { > > + .name = "cache-management", > > + .stage = RISCV_ALTERNATIVES_BOOT, > > + .check_func = errata_mt_check_func > > + }, > > }; > > > > static u32 thead_errata_probe(unsigned int stage, unsigned long archid, unsigned long impid) > > diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h > > index 7a2dd61af24d..f7c6805daeab 100644 > > --- a/arch/riscv/include/asm/errata_list.h > > +++ b/arch/riscv/include/asm/errata_list.h > > @@ -16,7 +16,8 @@ > > > > #ifdef CONFIG_ERRATA_THEAD > > #define ERRATA_THEAD_PBMT 0 > > -#define ERRATA_THEAD_NUMBER 1 > > +#define ERRATA_THEAD_CMO 1 > > +#define ERRATA_THEAD_NUMBER 2 > > #endif > > > > #define CPUFEATURE_SVPBMT 0 > > @@ -104,8 +105,37 @@ asm volatile(ALTERNATIVE( \ > > #define CBO_CLEAN_A0 ".long 0x25200F" > > #define CBO_FLUSH_A0 ".long 0x05200F" > > > > +/* > > + * dcache.ipa rs1 (invalidate, physical address) > > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | > > + * 0000001 01010 rs1 000 00000 0001011 > > + * dache.iva rs1 (invalida, virtual address) > > + * 0000001 00110 rs1 000 00000 0001011 > > + * > > + * dcache.cpa rs1 (clean, physical address) > > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | > > + * 0000001 01001 rs1 000 00000 0001011 > > + * dcache.cva rs1 (clean, virtual address) > > + * 0000001 00100 rs1 000 00000 0001011 > > + * > > + * dcache.cipa rs1 (clean then invalidate, physical address) > > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | > > + * 0000001 01011 rs1 000 00000 0001011 > > + * dcache.civa 
rs1 (... virtual address) > > + * 0000001 00111 rs1 000 00000 0001011 > > + * > > + * sync.s (make sure all cache operations finished) > > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | > > + * 0000000 11001 00000 000 00000 0001011 > > + */ > > +#define THEAD_INVAL_A0 ".long 0x0265000b" > > +#define THEAD_CLEAN_A0 ".long 0x0245000b" > > +#define THEAD_FLUSH_A0 ".long 0x0275000b" > > +#define THEAD_SYNC_S ".long 0x0190000b" > > IIRC this came up before, but these really need to get into the > assembler as actual instructions. Okay :-). But just for my understanding, which of these two ways is the way forward: - keep this in the waiting area _until_ a suitable binutils is released, or - use the hand-encoded instructions now and convert them later, once binutils is released? The reason I ask is that any chip with a T-Head core, like the Allwinner D1, will need this for things like basic networking, so given the binutils release schedule, I guess we'd be looking at autumn 2022 at the earliest. Thanks Heiko > > + > > #define ALT_CMO_OP(_op, _start, _size) \ > > -asm volatile(ALTERNATIVE( \ > > +asm volatile(ALTERNATIVE_2( \ > > + "nop\n\t" \ > > "nop\n\t" \ > > "nop\n\t" \ > > "nop\n\t" \ > > @@ -117,7 +147,16 @@ asm volatile(ALTERNATIVE( \ > > CBO_##_op##_A0 "\n\t" \ > > "addi a0, a0, %0\n\t" \ > > "2:\n\t" \ > > - "bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT) \ > > + "bltu a0, %2, 3b\n\t" \ > > + "nop", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT, \ > > + "mv a0, %1\n\t" \ > > + "j 2f\n\t" \ > > + "3:\n\t" \ > > + THEAD_##_op##_A0 "\n\t" \ > > + "addi a0, a0, %0\n\t" \ > > + "2:\n\t" \ > > + "bltu a0, %2, 3b\n\t" \ > > + THEAD_SYNC_S, THEAD_VENDOR_ID, ERRATA_THEAD_CMO, CONFIG_ERRATA_THEAD_CMO) \ > > : : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)), \ > > "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES))) >
* Re: [PATCH 2/2] riscv: implement cache-management errata for T-Head SoCs 2022-03-31 8:22 ` Heiko Stübner @ 2022-03-31 8:29 ` Philipp Tomsich -1 siblings, 0 replies; 50+ messages in thread From: Philipp Tomsich @ 2022-03-31 8:29 UTC (permalink / raw) To: Heiko Stübner Cc: Palmer Dabbelt, Paul Walmsley, linux-riscv, linux-kernel, wefu, guoren, atishp, anup, mick, samuel, cmuellner Palmer, Could you confirm that I understood your requirement correctly: is it that a patch has been posted to the binutils list? Philipp. On Thu, 31 Mar 2022 at 10:22, Heiko Stübner <heiko@sntech.de> wrote: > > Hi Palmer, > > On Thursday, 31 March 2022, 04:30:36 CEST, Palmer Dabbelt wrote: > > On Mon, 07 Mar 2022 14:46:20 PST (-0800), heiko@sntech.de wrote: > > > The T-Head C906 and C910 implement a scheme for handling > > > cache operations different from the generic Zicbom extension. > > > > > > Add an errata for it next to the generic dma coherency ops. > > > > > > Signed-off-by: Heiko Stuebner <heiko@sntech.de> > > > --- > > > arch/riscv/Kconfig.erratas | 10 +++++++ > > > arch/riscv/errata/thead/errata.c | 5 ++++ > > > arch/riscv/include/asm/errata_list.h | 45 ++++++++++++++++++++++++++-- > > > 3 files changed, 57 insertions(+), 3 deletions(-) > > > > > > diff --git a/arch/riscv/Kconfig.erratas b/arch/riscv/Kconfig.erratas > > > index de4002baa1d0..89a6dcb8ac2a 100644 > > > --- a/arch/riscv/Kconfig.erratas > > > +++ b/arch/riscv/Kconfig.erratas > > > @@ -50,4 +50,14 @@ config ERRATA_THEAD_PBMT > > > > > > If you don't know what to do here, say "Y". > > > > > > +config ERRATA_THEAD_CMO > > > + bool "Apply T-Head cache management errata" > > > + depends on ERRATA_THEAD && RISCV_DMA_NONCOHERENT > > > + default y > > > + help > > > + This will apply the cache management errata to handle the > > > + non-standard handling on non-coherent operations on T-Head SoCs. > > > + > > > + If you don't know what to do here, say "Y".
> > > + > > > endmenu > > > diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c > > > index fd8e0538a3f0..11c26c37425f 100644 > > > --- a/arch/riscv/errata/thead/errata.c > > > +++ b/arch/riscv/errata/thead/errata.c > > > @@ -33,6 +33,11 @@ static const struct errata_info errata_list[ERRATA_THEAD_NUMBER] = { > > > .stage = RISCV_ALTERNATIVES_EARLY_BOOT, > > > .check_func = errata_mt_check_func > > > }, > > > + { > > > + .name = "cache-management", > > > + .stage = RISCV_ALTERNATIVES_BOOT, > > > + .check_func = errata_mt_check_func > > > + }, > > > }; > > > > > > static u32 thead_errata_probe(unsigned int stage, unsigned long archid, unsigned long impid) > > > diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h > > > index 7a2dd61af24d..f7c6805daeab 100644 > > > --- a/arch/riscv/include/asm/errata_list.h > > > +++ b/arch/riscv/include/asm/errata_list.h > > > @@ -16,7 +16,8 @@ > > > > > > #ifdef CONFIG_ERRATA_THEAD > > > #define ERRATA_THEAD_PBMT 0 > > > -#define ERRATA_THEAD_NUMBER 1 > > > +#define ERRATA_THEAD_CMO 1 > > > +#define ERRATA_THEAD_NUMBER 2 > > > #endif > > > > > > #define CPUFEATURE_SVPBMT 0 > > > @@ -104,8 +105,37 @@ asm volatile(ALTERNATIVE( \ > > > #define CBO_CLEAN_A0 ".long 0x25200F" > > > #define CBO_FLUSH_A0 ".long 0x05200F" > > > > > > +/* > > > + * dcache.ipa rs1 (invalidate, physical address) > > > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | > > > + * 0000001 01010 rs1 000 00000 0001011 > > > + * dache.iva rs1 (invalida, virtual address) > > > + * 0000001 00110 rs1 000 00000 0001011 > > > + * > > > + * dcache.cpa rs1 (clean, physical address) > > > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | > > > + * 0000001 01001 rs1 000 00000 0001011 > > > + * dcache.cva rs1 (clean, virtual address) > > > + * 0000001 00100 rs1 000 00000 0001011 > > > + * > > > + * dcache.cipa rs1 (clean then invalidate, physical address) > > > + * | 31 - 25 | 24 - 20 | 
19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | > > > + * 0000001 01011 rs1 000 00000 0001011 > > > + * dcache.civa rs1 (... virtual address) > > > + * 0000001 00111 rs1 000 00000 0001011 > > > + * > > > + * sync.s (make sure all cache operations finished) > > > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | > > > + * 0000000 11001 00000 000 00000 0001011 > > > + */ > > > +#define THEAD_INVAL_A0 ".long 0x0265000b" > > > +#define THEAD_CLEAN_A0 ".long 0x0245000b" > > > +#define THEAD_FLUSH_A0 ".long 0x0275000b" > > > +#define THEAD_SYNC_S ".long 0x0190000b" > > > > IIRC this came up before, but these really need to get into the > > assembler as actual instructions. > > okay :-) . > > But just for my understanding which of the two ways going forward: > - keep this in the waiting area _until_ a suitable binutils is released > - use the coded instructions now and convert later once binutils is released > > The reason I ask is, that any chip with a t-head core like the Allwinner-D1 > will need this for things like basic networking, so with the binutils > release schedule, I guess we'd be looking at autumn 2022 at the earliest. 
> > > Thanks > Heiko > > > > + > > > #define ALT_CMO_OP(_op, _start, _size) \ > > > -asm volatile(ALTERNATIVE( \ > > > +asm volatile(ALTERNATIVE_2( \ > > > + "nop\n\t" \ > > > "nop\n\t" \ > > > "nop\n\t" \ > > > "nop\n\t" \ > > > @@ -117,7 +147,16 @@ asm volatile(ALTERNATIVE( \ > > > CBO_##_op##_A0 "\n\t" \ > > > "addi a0, a0, %0\n\t" \ > > > "2:\n\t" \ > > > - "bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT) \ > > > + "bltu a0, %2, 3b\n\t" \ > > > + "nop", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT, \ > > > + "mv a0, %1\n\t" \ > > > + "j 2f\n\t" \ > > > + "3:\n\t" \ > > > + THEAD_##_op##_A0 "\n\t" \ > > > + "addi a0, a0, %0\n\t" \ > > > + "2:\n\t" \ > > > + "bltu a0, %2, 3b\n\t" \ > > > + THEAD_SYNC_S, THEAD_VENDOR_ID, ERRATA_THEAD_CMO, CONFIG_ERRATA_THEAD_CMO) \ > > > : : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)), \ > > > "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES))) > > > > > >
* Re: [PATCH 2/2] riscv: implement cache-management errata for T-Head SoCs 2022-03-31 8:22 ` Heiko Stübner @ 2022-04-20 0:18 ` Palmer Dabbelt -1 siblings, 0 replies; 50+ messages in thread From: Palmer Dabbelt @ 2022-04-20 0:18 UTC (permalink / raw) To: heiko Cc: Paul Walmsley, linux-riscv, linux-kernel, wefu, guoren, atishp, anup, mick, samuel, cmuellner, philipp.tomsich On Thu, 31 Mar 2022 01:22:29 PDT (-0700), heiko@sntech.de wrote: > Hi Palmer, > > On Thursday, 31 March 2022, 04:30:36 CEST, Palmer Dabbelt wrote: >> On Mon, 07 Mar 2022 14:46:20 PST (-0800), heiko@sntech.de wrote: >> > The T-Head C906 and C910 implement a scheme for handling >> > cache operations different from the generic Zicbom extension. >> > >> > Add an errata for it next to the generic dma coherency ops. >> > >> > Signed-off-by: Heiko Stuebner <heiko@sntech.de> >> > --- >> > arch/riscv/Kconfig.erratas | 10 +++++++ >> > arch/riscv/errata/thead/errata.c | 5 ++++ >> > arch/riscv/include/asm/errata_list.h | 45 ++++++++++++++++++++++++++-- >> > 3 files changed, 57 insertions(+), 3 deletions(-) >> > >> > diff --git a/arch/riscv/Kconfig.erratas b/arch/riscv/Kconfig.erratas >> > index de4002baa1d0..89a6dcb8ac2a 100644 >> > --- a/arch/riscv/Kconfig.erratas >> > +++ b/arch/riscv/Kconfig.erratas >> > @@ -50,4 +50,14 @@ config ERRATA_THEAD_PBMT >> > >> > If you don't know what to do here, say "Y". >> > >> > +config ERRATA_THEAD_CMO >> > + bool "Apply T-Head cache management errata" >> > + depends on ERRATA_THEAD && RISCV_DMA_NONCOHERENT >> > + default y >> > + help >> > + This will apply the cache management errata to handle the >> > + non-standard handling on non-coherent operations on T-Head SoCs. >> > + >> > + If you don't know what to do here, say "Y".
>> > + >> > endmenu >> > diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c >> > index fd8e0538a3f0..11c26c37425f 100644 >> > --- a/arch/riscv/errata/thead/errata.c >> > +++ b/arch/riscv/errata/thead/errata.c >> > @@ -33,6 +33,11 @@ static const struct errata_info errata_list[ERRATA_THEAD_NUMBER] = { >> > .stage = RISCV_ALTERNATIVES_EARLY_BOOT, >> > .check_func = errata_mt_check_func >> > }, >> > + { >> > + .name = "cache-management", >> > + .stage = RISCV_ALTERNATIVES_BOOT, >> > + .check_func = errata_mt_check_func >> > + }, >> > }; >> > >> > static u32 thead_errata_probe(unsigned int stage, unsigned long archid, unsigned long impid) >> > diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h >> > index 7a2dd61af24d..f7c6805daeab 100644 >> > --- a/arch/riscv/include/asm/errata_list.h >> > +++ b/arch/riscv/include/asm/errata_list.h >> > @@ -16,7 +16,8 @@ >> > >> > #ifdef CONFIG_ERRATA_THEAD >> > #define ERRATA_THEAD_PBMT 0 >> > -#define ERRATA_THEAD_NUMBER 1 >> > +#define ERRATA_THEAD_CMO 1 >> > +#define ERRATA_THEAD_NUMBER 2 >> > #endif >> > >> > #define CPUFEATURE_SVPBMT 0 >> > @@ -104,8 +105,37 @@ asm volatile(ALTERNATIVE( \ >> > #define CBO_CLEAN_A0 ".long 0x25200F" >> > #define CBO_FLUSH_A0 ".long 0x05200F" >> > >> > +/* >> > + * dcache.ipa rs1 (invalidate, physical address) >> > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | >> > + * 0000001 01010 rs1 000 00000 0001011 >> > + * dache.iva rs1 (invalida, virtual address) >> > + * 0000001 00110 rs1 000 00000 0001011 >> > + * >> > + * dcache.cpa rs1 (clean, physical address) >> > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | >> > + * 0000001 01001 rs1 000 00000 0001011 >> > + * dcache.cva rs1 (clean, virtual address) >> > + * 0000001 00100 rs1 000 00000 0001011 >> > + * >> > + * dcache.cipa rs1 (clean then invalidate, physical address) >> > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | >> > + * 
0000001 01011 rs1 000 00000 0001011 >> > + * dcache.civa rs1 (... virtual address) >> > + * 0000001 00111 rs1 000 00000 0001011 >> > + * >> > + * sync.s (make sure all cache operations finished) >> > + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 | >> > + * 0000000 11001 00000 000 00000 0001011 >> > + */ >> > +#define THEAD_INVAL_A0 ".long 0x0265000b" >> > +#define THEAD_CLEAN_A0 ".long 0x0245000b" >> > +#define THEAD_FLUSH_A0 ".long 0x0275000b" >> > +#define THEAD_SYNC_S ".long 0x0190000b" >> >> IIRC this came up before, but these really need to get into the >> assembler as actual instructions. > > okay :-) . > > But just for my understanding which of the two ways going forward: > - keep this in the waiting area _until_ a suitable binutils is released > - use the coded instructions now and convert later once binutils is released > > The reason I ask is, that any chip with a t-head core like the Allwinner-D1 > will need this for things like basic networking, so with the binutils > release schedule, I guess we'd be looking at autumn 2022 at the earliest. I'm not the binutils release maintainer, so I can't really sign off on a release date, but given the history, that sounds about right to me. I get that it's a headache to have to have a toolchain that supports the ISA, but if it was really that important it would have made one of the last two releases -- I very specifically remember talking to the folks at the RISC-V foundation about this the better part of a year ago, but they decided to play at politics instead of being constructive, so now we have two messes to clean up. I volunteered Patrick to send binutils patches for the T-Head cache control stuff (as I didn't have time to write it myself this weekend); it's only a dozen or so instructions and thus shouldn't take that long. At least that way we can get a rough consensus on how we're going to move forward with the toolchain support, which we really need before we're going to start depending on anything.
Sorry you got pulled into all this. > Thanks > Heiko > >> > + >> > #define ALT_CMO_OP(_op, _start, _size) \ >> > -asm volatile(ALTERNATIVE( \ >> > +asm volatile(ALTERNATIVE_2( \ >> > + "nop\n\t" \ >> > "nop\n\t" \ >> > "nop\n\t" \ >> > "nop\n\t" \ >> > @@ -117,7 +147,16 @@ asm volatile(ALTERNATIVE( \ >> > CBO_##_op##_A0 "\n\t" \ >> > "addi a0, a0, %0\n\t" \ >> > "2:\n\t" \ >> > - "bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT) \ >> > + "bltu a0, %2, 3b\n\t" \ >> > + "nop", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT, \ >> > + "mv a0, %1\n\t" \ >> > + "j 2f\n\t" \ >> > + "3:\n\t" \ >> > + THEAD_##_op##_A0 "\n\t" \ >> > + "addi a0, a0, %0\n\t" \ >> > + "2:\n\t" \ >> > + "bltu a0, %2, 3b\n\t" \ >> > + THEAD_SYNC_S, THEAD_VENDOR_ID, ERRATA_THEAD_CMO, CONFIG_ERRATA_THEAD_CMO) \ >> > : : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)), \ >> > "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES))) >>
* Re: [PATCH 2/2] riscv: implement cache-management errata for T-Head SoCs
From: Samuel Holland @ 2022-04-01 1:05 UTC
To: Heiko Stuebner, palmer, paul.walmsley
Cc: linux-riscv, linux-kernel, wefu, guoren, atishp, anup, mick, cmuellner, philipp.tomsich

On 3/7/22 4:46 PM, Heiko Stuebner wrote:
> The T-Head C906 and C910 implement a scheme for handling
> cache operations different from the generic Zicbom extension.
>
> Add an errata for it next to the generic dma coherency ops.
>
> Signed-off-by: Heiko Stuebner <heiko@sntech.de>

Tested-by: Samuel Holland <samuel@sholland.org>

With this option disabled, MMC and USB are broken on D1 boards:

[ 3.021326] Waiting for root device /dev/mmcblk0p1...
[ 3.219727] usb 4-1: new full-speed USB device number 2 using ohci-platform
[ 18.703736] usb 4-1: device descriptor read/64, error -110

With the option enabled, MMC, USB, and Ethernet all work fine.

> ---
>  arch/riscv/Kconfig.erratas | 10 +++++++
>  arch/riscv/errata/thead/errata.c | 5 ++++
>  arch/riscv/include/asm/errata_list.h | 45 ++++++++++++++++++++++++++--
>  3 files changed, 57 insertions(+), 3 deletions(-)
>
> diff --git a/arch/riscv/Kconfig.erratas b/arch/riscv/Kconfig.erratas
> index de4002baa1d0..89a6dcb8ac2a 100644
> --- a/arch/riscv/Kconfig.erratas
> +++ b/arch/riscv/Kconfig.erratas
> @@ -50,4 +50,14 @@ config ERRATA_THEAD_PBMT
>
>  If you don't know what to do here, say "Y".
>
> +config ERRATA_THEAD_CMO
> + bool "Apply T-Head cache management errata"
> + depends on ERRATA_THEAD && RISCV_DMA_NONCOHERENT
> + default y
> + help
> + This will apply the cache management errata to handle the
> + non-standard handling on non-coherent operations on T-Head SoCs.
> +
> + If you don't know what to do here, say "Y".
> +
>  endmenu
> diff --git a/arch/riscv/errata/thead/errata.c b/arch/riscv/errata/thead/errata.c
> index fd8e0538a3f0..11c26c37425f 100644
> --- a/arch/riscv/errata/thead/errata.c
> +++ b/arch/riscv/errata/thead/errata.c
> @@ -33,6 +33,11 @@ static const struct errata_info errata_list[ERRATA_THEAD_NUMBER] = {
>  .stage = RISCV_ALTERNATIVES_EARLY_BOOT,
>  .check_func = errata_mt_check_func
>  },
> + {
> + .name = "cache-management",
> + .stage = RISCV_ALTERNATIVES_BOOT,
> + .check_func = errata_mt_check_func
> + },
>  };
>
>  static u32 thead_errata_probe(unsigned int stage, unsigned long archid, unsigned long impid)
> diff --git a/arch/riscv/include/asm/errata_list.h b/arch/riscv/include/asm/errata_list.h
> index 7a2dd61af24d..f7c6805daeab 100644
> --- a/arch/riscv/include/asm/errata_list.h
> +++ b/arch/riscv/include/asm/errata_list.h
> @@ -16,7 +16,8 @@
>
>  #ifdef CONFIG_ERRATA_THEAD
>  #define ERRATA_THEAD_PBMT 0
> -#define ERRATA_THEAD_NUMBER 1
> +#define ERRATA_THEAD_CMO 1
> +#define ERRATA_THEAD_NUMBER 2
>  #endif
>
>  #define CPUFEATURE_SVPBMT 0
> @@ -104,8 +105,37 @@ asm volatile(ALTERNATIVE( \
>  #define CBO_CLEAN_A0 ".long 0x25200F"
>  #define CBO_FLUSH_A0 ".long 0x05200F"
>
> +/*
> + * dcache.ipa rs1 (invalidate, physical address)
> + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + * 0000001 01010 rs1 000 00000 0001011
> + * dache.iva rs1 (invalida, virtual address)
> + * 0000001 00110 rs1 000 00000 0001011
> + *
> + * dcache.cpa rs1 (clean, physical address)
> + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + * 0000001 01001 rs1 000 00000 0001011
> + * dcache.cva rs1 (clean, virtual address)
> + * 0000001 00100 rs1 000 00000 0001011
> + *
> + * dcache.cipa rs1 (clean then invalidate, physical address)
> + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + * 0000001 01011 rs1 000 00000 0001011
> + * dcache.civa rs1 (... virtual address)
> + * 0000001 00111 rs1 000 00000 0001011
> + *
> + * sync.s (make sure all cache operations finished)
> + * | 31 - 25 | 24 - 20 | 19 - 15 | 14 - 12 | 11 - 7 | 6 - 0 |
> + * 0000000 11001 00000 000 00000 0001011
> + */
> +#define THEAD_INVAL_A0 ".long 0x0265000b"
> +#define THEAD_CLEAN_A0 ".long 0x0245000b"
> +#define THEAD_FLUSH_A0 ".long 0x0275000b"
> +#define THEAD_SYNC_S ".long 0x0190000b"
> +
>  #define ALT_CMO_OP(_op, _start, _size) \
> -asm volatile(ALTERNATIVE( \
> +asm volatile(ALTERNATIVE_2( \
> + "nop\n\t" \
>  "nop\n\t" \
>  "nop\n\t" \
>  "nop\n\t" \
> @@ -117,7 +147,16 @@ asm volatile(ALTERNATIVE( \
>  CBO_##_op##_A0 "\n\t" \
>  "addi a0, a0, %0\n\t" \
>  "2:\n\t" \
> - "bltu a0, %2, 3b\n\t", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT) \
> + "bltu a0, %2, 3b\n\t" \
> + "nop", 0, CPUFEATURE_CMO, CONFIG_RISCV_DMA_NONCOHERENT, \
> + "mv a0, %1\n\t" \
> + "j 2f\n\t" \
> + "3:\n\t" \
> + THEAD_##_op##_A0 "\n\t" \
> + "addi a0, a0, %0\n\t" \
> + "2:\n\t" \
> + "bltu a0, %2, 3b\n\t" \
> + THEAD_SYNC_S, THEAD_VENDOR_ID, ERRATA_THEAD_CMO, CONFIG_ERRATA_THEAD_CMO) \
>  : : "I"(L1_CACHE_BYTES), "r"((_start) & ~(L1_CACHE_BYTES - 1)), \
>  "r"(ALIGN((_start) + (_size), L1_CACHE_BYTES)))
>
>
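The THEAD_* macros above are hand-assembled .long words rather than assembler mnemonics. As a cross-check on those words, the sketch below packs the fields from the patch comment (funct7 in bits 31-25, the 5-bit operation field in bits 24-20, rs1 in bits 19-15, funct3 = 000, rd = 00000, and opcode 0001011, the RISC-V custom-0 major opcode) into one 32-bit word. With rs1 = a0 (x10) it reproduces the four constants, which indicates they are the virtual-address forms (dcache.iva, dcache.cva, dcache.civa) plus sync.s. The function, enum, and macro names are hypothetical illustrations, not kernel identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define RV_OPCODE_CUSTOM0 0x0bu /* 0001011, custom-0 major opcode */
#define RV_REG_A0 10u           /* a0 is x10 */

/* Bits 24-20 as listed in the patch comment (hypothetical names). */
enum thead_dcache_op {
	THEAD_OP_CVA    = 0x04, /* 00100: dcache.cva,  clean, VA */
	THEAD_OP_IVA    = 0x06, /* 00110: dcache.iva,  invalidate, VA */
	THEAD_OP_CIVA   = 0x07, /* 00111: dcache.civa, clean + invalidate, VA */
	THEAD_OP_CPA    = 0x09, /* 01001: dcache.cpa,  clean, PA */
	THEAD_OP_IPA    = 0x0a, /* 01010: dcache.ipa,  invalidate, PA */
	THEAD_OP_CIPA   = 0x0b, /* 01011: dcache.cipa, clean + invalidate, PA */
	THEAD_OP_SYNC_S = 0x19, /* 11001: sync.s, used with funct7 = 0, rs1 = 0 */
};

/* Pack funct7 | op | rs1 | funct3=000 | rd=00000 | opcode into one word. */
static uint32_t thead_encode(uint32_t funct7, uint32_t op, uint32_t rs1)
{
	/* reject values that would spill into neighbouring fields */
	assert(funct7 < (1u << 7) && op < (1u << 5) && rs1 < (1u << 5));
	return (funct7 << 25) | (op << 20) | (rs1 << 15) | RV_OPCODE_CUSTOM0;
}
```

For example, thead_encode(1, THEAD_OP_IVA, RV_REG_A0) yields 0x0265000b, the value of THEAD_INVAL_A0, and thead_encode(0, THEAD_OP_SYNC_S, 0) yields 0x0190000b, the value of THEAD_SYNC_S.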
* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
From: Corentin Labbe @ 2022-04-15 11:26 UTC
To: Heiko Stuebner
Cc: palmer, paul.walmsley, linux-riscv, linux-kernel, wefu, guoren, atishp, anup, mick, samuel, cmuellner, philipp.tomsich

On Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner wrote:
> This series is based on the alternatives changes done in my svpbmt series
> and thus also depends on Atish's isa-extension parsing series.
>
> It implements using the cache-management instructions from the Zicbom-
> extension to handle cache flush, etc actions on platforms needing them.
>
> SoCs using cpu cores from T-Head like the Allwinne D1 implement a
> different set of cache instructions. But while they are different,
> instructions they provide the same functionality, so a variant can
> easly hook into the existing alternatives mechanism on those.
>
>

Hello

I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip, which contains this series.

I am hitting a buffer corruption problem with DMA.
The sun8i-ce crypto driver fails its self-tests with "device overran destination buffer".
In fact, the buffer is not overrun by the device but by the dma_map_single() operation.

The following small snippet shows the problem:

dma_addr_t dma;
u8 *buf;
int i;
#define BSIZE 2048
#define DMASIZE 16

buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
for (i = 0; i < BSIZE; i++)
	buf[i] = 0xFE;
print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
dma_unmap_single(ce->dev, dma, DMASIZE, DMA_FROM_DEVICE);
print_hex_dump(KERN_INFO, "DMATEST3:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);

This leads to:
[ 2.960040] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[ 2.965354] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[ 2.970709] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[ 2.976069] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[ 2.981440] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[ 2.986814] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[ 2.992188] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[ 2.997560] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[ 3.002934] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[ 3.008307] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[ 3.013680] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[ 3.019054] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[ 3.024427] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[ 3.029802] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[ 3.035175] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[ 3.040546] DMATEST1:fefefefe fefefefe fefefefe fefefefe
[ 3.401647] DMATEST3:a9c3a9c3 a9c3a9c3 a9c3a9c3 a9c3a9c3
[ 3.406982] DMATEST3:a9c3a9c3 a9c3a9c3 a9c3a9c3 a9c3a9c3
[ 3.412350] DMATEST3:a9c3a9c3 a9c3a9c3 a9c3a9c3 a9c3a9c3
[ 3.417720] DMATEST3:a9c3a9c3 a9c3a9c3 a9c3a9c3 a9c3a9c3
[ 3.423094] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[ 3.428468] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[ 3.433841] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[ 3.439213] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[ 3.444588] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[ 3.449962] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[ 3.455334] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[ 3.460707] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[ 3.466081] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[ 3.471454] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[ 3.476828] DMATEST3:fefefefe fefefefe fefefefe fefefefe
[ 3.482200] DMATEST3:fefefefe fefefefe fefefefe fefefefe

Even with no DMA action, the buffer is corrupted.

Regards
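In the dump above, exactly four 16-byte rows (64 bytes) change even though only DMASIZE = 16 bytes were mapped. The ALT_CMO_OP operands round the start down ((_start) & ~(L1_CACHE_BYTES - 1)) and the end up (ALIGN((_start) + (_size), L1_CACHE_BYTES)), so a cache operation always covers whole lines. The sketch below mirrors that arithmetic and the asm loop in plain C. It assumes a power-of-two line size and, for the example, a 64-byte line, which is what the dump is consistent with; check L1_CACHE_BYTES for your configuration. It is an illustration, not the kernel code, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round down/up to a power-of-two line size, like the ALT_CMO_OP operands. */
static uintptr_t line_align_down(uintptr_t addr, uintptr_t line)
{
	return addr & ~(line - 1);
}

static uintptr_t line_align_up(uintptr_t addr, uintptr_t line)
{
	return (addr + line - 1) & ~(line - 1);
}

/*
 * Count the per-line operations the loop would issue for [start, start + size).
 * "mv a0, %1; j 2f; 3: <op>; addi a0, a0, %0; 2: bltu a0, %2, 3b" is a while
 * loop that tests the bound before the first operation.
 */
static size_t cmo_line_count(uintptr_t start, size_t size, uintptr_t line)
{
	assert(line != 0 && (line & (line - 1)) == 0); /* power of two */

	uintptr_t a = line_align_down(start, line);
	uintptr_t end = line_align_up(start + size, line);
	size_t n = 0;

	while (a < end) {
		n++;       /* one CBO_* or THEAD_* instruction per line */
		a += line; /* "addi a0, a0, %0" */
	}
	return n;
}

/* Bytes whose cached copy is affected: always whole lines. */
static size_t cmo_bytes_affected(uintptr_t start, size_t size, uintptr_t line)
{
	return cmo_line_count(start, size, line) * line;
}
```

With a line-aligned buffer, cmo_bytes_affected(start, 16, 64) is 64, matching the four a9c3 rows; an unaligned 32-byte range such as start = 0x1030 straddles a line boundary and touches two lines.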
* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
From: Samuel Holland @ 2022-04-16 2:19 UTC
To: Corentin Labbe, Heiko Stuebner
Cc: palmer, paul.walmsley, linux-riscv, linux-kernel, wefu, guoren, atishp, anup, mick, cmuellner, philipp.tomsich

On 4/15/22 6:26 AM, Corentin Labbe wrote:
> On Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner wrote:
>> This series is based on the alternatives changes done in my svpbmt series
>> and thus also depends on Atish's isa-extension parsing series.
>>
>> It implements using the cache-management instructions from the Zicbom-
>> extension to handle cache flush, etc actions on platforms needing them.
>>
>> SoCs using cpu cores from T-Head like the Allwinne D1 implement a
>> different set of cache instructions. But while they are different,
>> instructions they provide the same functionality, so a variant can
>> easly hook into the existing alternatives mechanism on those.
>>
>>
>
> Hello
>
> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contain this serie.
>
> I am hitting a buffer corruption problem with DMA.
> The sun8i-ce crypto driver fail self tests due to "device overran destination buffer".
> In fact the buffer is not overran by device but by dma_map_single() operation.
>
> The following small code show the problem:
>
> dma_addr_t dma;
> u8 *buf;
> #define BSIZE 2048
> #define DMASIZE 16
>
> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> for (i = 0; i < BSIZE; i++)
> buf[i] = 0xFE;
> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);

This function (through dma_direct_map_page()) ends up calling
arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
cache. This is the same thing other architectures do (at least arm, arm64,
openrisc, and powerpc). So this appears to be working as intended.

Regards,
Samuel

> dma_unmap_single(ce->dev, dma, DMASIZE, DMA_FROM_DEVICE);
> print_hex_dump(KERN_INFO, "DMATEST3:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
>
> Will lead to:
> [ 2.960040] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [ 2.965354] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [ 2.970709] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [ 2.976069] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [ 2.981440] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [ 2.986814] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [ 2.992188] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [ 2.997560] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [ 3.002934] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [ 3.008307] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [ 3.013680] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [ 3.019054] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [ 3.024427] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [ 3.029802] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [ 3.035175] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [ 3.040546] DMATEST1:fefefefe fefefefe fefefefe fefefefe
> [ 3.401647] DMATEST3:a9c3a9c3 a9c3a9c3 a9c3a9c3 a9c3a9c3
> [ 3.406982] DMATEST3:a9c3a9c3 a9c3a9c3 a9c3a9c3 a9c3a9c3
> [ 3.412350] DMATEST3:a9c3a9c3 a9c3a9c3 a9c3a9c3 a9c3a9c3
> [ 3.417720] DMATEST3:a9c3a9c3 a9c3a9c3 a9c3a9c3 a9c3a9c3
> [ 3.423094] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [ 3.428468] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [ 3.433841] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [ 3.439213] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [ 3.444588] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [ 3.449962] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [ 3.455334] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [ 3.460707] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [ 3.466081] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [ 3.471454] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [ 3.476828] DMATEST3:fefefefe fefefefe fefefefe fefefefe
> [ 3.482200] DMATEST3:fefefefe fefefefe fefefefe fefefefe
>
> Even with no DMA action, the buffer is corrupted.
>
> Regards
>
* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
From: Corentin Labbe @ 2022-04-16 7:35 UTC
To: Samuel Holland
Cc: Heiko Stuebner, palmer, paul.walmsley, linux-riscv, linux-kernel, wefu, guoren, atishp, anup, mick, cmuellner, philipp.tomsich

On Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland wrote:
> On 4/15/22 6:26 AM, Corentin Labbe wrote:
> > On Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner wrote:
> >> This series is based on the alternatives changes done in my svpbmt series
> >> and thus also depends on Atish's isa-extension parsing series.
> >>
> >> It implements using the cache-management instructions from the Zicbom-
> >> extension to handle cache flush, etc actions on platforms needing them.
> >>
> >> SoCs using cpu cores from T-Head like the Allwinne D1 implement a
> >> different set of cache instructions. But while they are different,
> >> instructions they provide the same functionality, so a variant can
> >> easly hook into the existing alternatives mechanism on those.
> >>
> >>
> >
> > Hello
> >
> > I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contain this serie.
> >
> > I am hitting a buffer corruption problem with DMA.
> > The sun8i-ce crypto driver fail self tests due to "device overran destination buffer".
> > In fact the buffer is not overran by device but by dma_map_single() operation.
> >
> > The following small code show the problem:
> >
> > dma_addr_t dma;
> > u8 *buf;
> > #define BSIZE 2048
> > #define DMASIZE 16
> >
> > buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> > for (i = 0; i < BSIZE; i++)
> > buf[i] = 0xFE;
> > print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> > dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
>
> This function (through dma_direct_map_page()) ends up calling
> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
> cache. This is the same thing other architectures do (at least arm, arm64,
> openrisc, and powerpc). So this appears to be working as intended.
>
> Regards,
> Samuel
>

This behaviour is not present, at least on ARM and ARM64: the sample code I
provided does not corrupt the buffer on them.

Regards
* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
From: Samuel Holland @ 2022-04-16 17:47 UTC
To: Corentin Labbe
Cc: Heiko Stuebner, palmer, paul.walmsley, linux-riscv, linux-kernel, wefu, guoren, atishp, anup, mick, cmuellner, philipp.tomsich

On 4/16/22 2:35 AM, Corentin Labbe wrote:
> On Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland wrote:
>> On 4/15/22 6:26 AM, Corentin Labbe wrote:
>>> On Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner wrote:
>>>> This series is based on the alternatives changes done in my svpbmt series
>>>> and thus also depends on Atish's isa-extension parsing series.
>>>>
>>>> It uses the cache-management instructions from the Zicbom
>>>> extension to handle cache flush, etc. actions on platforms needing them.
>>>>
>>>> SoCs using cpu cores from T-Head like the Allwinner D1 implement a
>>>> different set of cache instructions. But while they are different
>>>> instructions, they provide the same functionality, so a variant can
>>>> easily hook into the existing alternatives mechanism on those.
>>>>
>>>>
>>>
>>> Hello
>>>
>>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contains this series.
>>>
>>> I am hitting a buffer corruption problem with DMA.
>>> The sun8i-ce crypto driver fails its self-tests due to "device overran destination buffer".
>>> In fact, the buffer is not overrun by the device but by the dma_map_single() operation.
>>>
>>> The following small piece of code shows the problem:
>>>
>>> dma_addr_t dma;
>>> u8 *buf;
>>> #define BSIZE 2048
>>> #define DMASIZE 16
>>>
>>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
>>> for (i = 0; i < BSIZE; i++)
>>> 	buf[i] = 0xFE;
>>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
>>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
>>
>> This function (through dma_direct_map_page()) ends up calling
>> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
>> cache. This is the same thing other architectures do (at least arm, arm64,
>> openrisc, and powerpc). So this appears to be working as intended.
>
> This behavior is not present, at least on ARM and ARM64.
> The sample code I provided does not corrupt the buffer on them.

That can be explained by the 0xFE bytes having been flushed to DRAM already in
your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in
a dirty cache line. The cache topology and implementation are totally different
across the SoCs, so this is not too surprising.

Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a
unidirectional DMA transfer from the device into that buffer. So the contents of
the buffer are "undefined" until the DMA transfer completes. If you are also
writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.

Regards,
Samuel
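The explanation above, that identical CPU code behaves differently depending on whether the 0xFE bytes still sit in a dirty cache line when the invalidate happens, can be illustrated with a toy model of one write-back cache line in front of DRAM. This is a minimal sketch under simplifying assumptions (a single 64-byte line, DRAM initially zero, an explicit clean standing in for an earlier eviction or write-back). It does not model any particular SoC, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define LINE 64 /* assumed cache-line size, illustration only */

/* Toy model: one write-back cache line in front of DRAM (hypothetical names). */
struct toy_line {
	unsigned char dram[LINE];  /* what memory, and hence the device, sees */
	unsigned char cache[LINE]; /* the CPU's cached copy */
	bool valid;                /* cached copy present */
	bool dirty;                /* cached copy newer than DRAM */
};

/* CPU stores v to the whole line; with write-back caching it stays in the cache. */
static void cpu_fill(struct toy_line *l, unsigned char v)
{
	memset(l->cache, v, LINE);
	l->valid = true;
	l->dirty = true;
}

/* Clean: write dirty data back to DRAM and keep the line. Stands in for an
 * earlier eviction or write-back that happened before the DMA mapping. */
static void clean(struct toy_line *l)
{
	if (l->valid && l->dirty) {
		memcpy(l->dram, l->cache, LINE);
		l->dirty = false;
	}
}

/* Invalidate: drop the cached copy; dirty data is discarded, not written back. */
static void inval(struct toy_line *l)
{
	l->valid = false;
	l->dirty = false;
}

/* CPU load: served from the cache on a hit, otherwise from DRAM. */
static unsigned char cpu_read(const struct toy_line *l, int i)
{
	assert(i >= 0 && i < LINE);
	return l->valid ? l->cache[i] : l->dram[i];
}
```

Calling cpu_fill(), then inval(), makes cpu_read() return the stale DRAM value (zero in this model); calling cpu_fill(), clean(), then inval() returns 0xFE. The second sequence corresponds to the hypothesis that the data had already reached DRAM in the ARM/ARM64 runs.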
* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
From: Corentin Labbe @ 2022-04-16 19:32 UTC
To: Samuel Holland
Cc: Heiko Stuebner, palmer, paul.walmsley, linux-riscv, linux-kernel, wefu, guoren, atishp, anup, mick, cmuellner, philipp.tomsich, herbert, linux-crypto

On Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland wrote:
> On 4/16/22 2:35 AM, Corentin Labbe wrote:
> > On Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland wrote:
> >> On 4/15/22 6:26 AM, Corentin Labbe wrote:
> >>> On Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner wrote:
> >>>> This series is based on the alternatives changes done in my svpbmt series
> >>>> and thus also depends on Atish's isa-extension parsing series.
> >>>>
> >>>> It uses the cache-management instructions from the Zicbom
> >>>> extension to handle cache flush, etc. actions on platforms needing them.
> >>>>
> >>>> SoCs using cpu cores from T-Head like the Allwinner D1 implement a
> >>>> different set of cache instructions. But while they are different
> >>>> instructions, they provide the same functionality, so a variant can
> >>>> easily hook into the existing alternatives mechanism on those.
> >>>>
> >>>>
> >>>
> >>> Hello
> >>>
> >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contains this series.
> >>>
> >>> I am hitting a buffer corruption problem with DMA.
> >>> The sun8i-ce crypto driver fails its self-tests due to "device overran destination buffer".
> >>> In fact, the buffer is not overrun by the device but by the dma_map_single() operation.
> >>>
> >>> The following small piece of code shows the problem:
> >>>
> >>> dma_addr_t dma;
> >>> u8 *buf;
> >>> #define BSIZE 2048
> >>> #define DMASIZE 16
> >>>
> >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> >>> for (i = 0; i < BSIZE; i++)
> >>> 	buf[i] = 0xFE;
> >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
> >>
> >> This function (through dma_direct_map_page()) ends up calling
> >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
> >> cache. This is the same thing other architectures do (at least arm, arm64,
> >> openrisc, and powerpc). So this appears to be working as intended.
> >
> > This behavior is not present, at least on ARM and ARM64.
> > The sample code I provided does not corrupt the buffer on them.
>
> That can be explained by the 0xFE bytes having been flushed to DRAM already in
> your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in
> a dirty cache line. The cache topology and implementation are totally different
> across the SoCs, so this is not too surprising.
>
> Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a
> unidirectional DMA transfer from the device into that buffer. So the contents of
> the buffer are "undefined" until the DMA transfer completes. If you are also
> writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.
>
> Regards,
> Samuel

+CC crypto mailing list + maintainer

My problem is that the crypto selftest, for each buffer on which I need to do a cipher operation,
appends a poison buffer to check that the device does not write beyond the buffer.

But dma_map_sg(FROM_DEVICE) corrupts this poison buffer, and the crypto selftests fail, thinking my device did a buffer overrun.

So you mean that on the D1 SoC, this crypto API check strategy is impossible?
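The selftest strategy described here can be sketched in user space: reserve a guard area right after the output region, fill it with a known byte, and check it afterwards. If the guard bytes share a cache block with the mapped output and that block is invalidated while still dirty, the check fails even though the device never touched them. This is a hypothetical illustration; POISON_BYTE, POISON_LEN and the helper names are assumptions and are not the crypto selftest's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical guard parameters, chosen for illustration only. */
#define POISON_BYTE 0xFE
#define POISON_LEN  16

/* Fill the guard area that directly follows an output region of out_len bytes.
 * buf must have room for out_len + POISON_LEN bytes. */
static void poison_tail(unsigned char *buf, size_t out_len)
{
	assert(buf != NULL);
	memset(buf + out_len, POISON_BYTE, POISON_LEN);
}

/* True if every guard byte still holds POISON_BYTE, i.e. nothing was written
 * past out_len (or lost from the cache) since poison_tail() ran. */
static bool tail_intact(const unsigned char *buf, size_t out_len)
{
	for (size_t i = 0; i < POISON_LEN; i++)
		if (buf[out_len + i] != POISON_BYTE)
			return false;
	return true;
}
```

A well-behaved "device" that writes only out_len bytes leaves tail_intact() true; any change to the first guard byte, whether from a real overrun or from a discarded dirty cache line, makes it false.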
* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
From: Guo Ren @ 2022-04-17 2:17 UTC
To: Corentin Labbe
Cc: Samuel Holland, Heiko Stuebner, Palmer Dabbelt, Paul Walmsley, linux-riscv, Linux Kernel Mailing List, Wei Fu, Atish Patra, Anup Patel, Nick Kossifidis, Christoph Muellner, Philipp Tomsich, Herbert Xu, linux-crypto

On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe
<clabbe.montjoie@gmail.com> wrote:
>
> On Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland wrote:
> > On 4/16/22 2:35 AM, Corentin Labbe wrote:
> > > On Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland wrote:
> > >> On 4/15/22 6:26 AM, Corentin Labbe wrote:
> > >>> On Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner wrote:
> > >>>> This series is based on the alternatives changes done in my svpbmt series
> > >>>> and thus also depends on Atish's isa-extension parsing series.
> > >>>>
> > >>>> It uses the cache-management instructions from the Zicbom
> > >>>> extension to handle cache flush, etc. actions on platforms needing them.
> > >>>>
> > >>>> SoCs using cpu cores from T-Head like the Allwinner D1 implement a
> > >>>> different set of cache instructions. But while they are different
> > >>>> instructions, they provide the same functionality, so a variant can
> > >>>> easily hook into the existing alternatives mechanism on those.
> > >>>>
> > >>>>
> > >>>
> > >>> Hello
> > >>>
> > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contains this series.
> > >>>
> > >>> I am hitting a buffer corruption problem with DMA.
> > >>> The sun8i-ce crypto driver fails its self-tests due to "device overran destination buffer".
> > >>> In fact, the buffer is not overrun by the device but by the dma_map_single() operation.
> > >>>
> > >>> The following small piece of code shows the problem:
> > >>>
> > >>> dma_addr_t dma;
> > >>> u8 *buf;
> > >>> #define BSIZE 2048
> > >>> #define DMASIZE 16
> > >>>
> > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> > >>> for (i = 0; i < BSIZE; i++)
> > >>> 	buf[i] = 0xFE;
> > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
> > >>
> > >> This function (through dma_direct_map_page()) ends up calling
> > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
> > >> cache. This is the same thing other architectures do (at least arm, arm64,
> > >> openrisc, and powerpc). So this appears to be working as intended.
> > >
> > > This behavior is not present, at least on ARM and ARM64.
> > > The sample code I provided does not corrupt the buffer on them.
> >
> > That can be explained by the 0xFE bytes having been flushed to DRAM already in
> > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in
> > a dirty cache line. The cache topology and implementation are totally different
> > across the SoCs, so this is not too surprising.
> >
> > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a
> > unidirectional DMA transfer from the device into that buffer. So the contents of
> > the buffer are "undefined" until the DMA transfer completes. If you are also
> > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.
> >
> > Regards,
> > Samuel
>
> +CC crypto mailing list + maintainer
>
> My problem is that the crypto selftest, for each buffer on which I need to do a cipher operation,
> appends a poison buffer to check that the device does not write beyond the buffer.
>
> But dma_map_sg(FROM_DEVICE) corrupts this poison buffer, and the crypto selftests fail, thinking my device did a buffer overrun.
>
> So you mean that on the D1 SoC, this crypto API check strategy is impossible?
I think you could try to replace all CLEAN & INVAL ops with FLUSH ops
for the testing. (All cache block-aligned data from the device for the
CPU should be invalidated.)

+void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	case DMA_FROM_DEVICE:
+		ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	case DMA_BIDIRECTIONAL:
+		ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	default:
+		break;
+	}
+}
+
+void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size, enum dma_data_direction dir)
+{
+	switch (dir) {
+	case DMA_TO_DEVICE:
+		break;
+	case DMA_FROM_DEVICE:
+	case DMA_BIDIRECTIONAL:
+		ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
+		break;
+	default:
+		break;
+	}
+}

-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/
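The direction-to-operation mapping in the two functions quoted above can be restated as a small user-space table. Written this way, it is easy to see that DMA_FROM_DEVICE is the only direction that invalidates at map time without first writing dirty lines back. The enum names below are local, hypothetical stand-ins, not the kernel's enum dma_data_direction or the operation names used by ALT_CMO_OP().

```c
#include <assert.h>

/* Hypothetical stand-ins for the DMA directions and cache operations. */
enum dir { TO_DEVICE, FROM_DEVICE, BIDIRECTIONAL };
enum cmo { CMO_NONE, CMO_CLEAN, CMO_INVAL, CMO_FLUSH };

/* Operation before the device accesses the buffer,
 * mirroring the arch_sync_dma_for_device() quoted above. */
static enum cmo op_for_device(enum dir d)
{
	switch (d) {
	case TO_DEVICE:
		return CMO_CLEAN; /* push CPU writes out to memory */
	case FROM_DEVICE:
		return CMO_INVAL; /* drop lines; dirty data is discarded */
	case BIDIRECTIONAL:
		return CMO_FLUSH; /* write back, then invalidate */
	}
	return CMO_NONE;
}

/* Operation before the CPU reads the buffer again,
 * mirroring the arch_sync_dma_for_cpu() quoted above. */
static enum cmo op_for_cpu(enum dir d)
{
	switch (d) {
	case TO_DEVICE:
		return CMO_NONE;
	case FROM_DEVICE:
	case BIDIRECTIONAL:
		return CMO_INVAL; /* e.g. drop lines the CPU may have refetched during DMA */
	}
	return CMO_NONE;
}
```

The suggestion above amounts to changing the FROM_DEVICE row of op_for_device() from CMO_INVAL to CMO_FLUSH for testing.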
* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
From: Corentin Labbe @ 2022-04-17 8:45 UTC
To: Guo Ren
Cc: Samuel Holland, Heiko Stuebner, Palmer Dabbelt, Paul Walmsley, linux-riscv, Linux Kernel Mailing List, Wei Fu, Atish Patra, Anup Patel, Nick Kossifidis, Christoph Muellner, Philipp Tomsich, Herbert Xu, linux-crypto

On Sun, Apr 17, 2022 at 10:17:34AM +0800, Guo Ren wrote:
> On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe
> <clabbe.montjoie@gmail.com> wrote:
> >
> > On Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland wrote:
> > > On 4/16/22 2:35 AM, Corentin Labbe wrote:
> > > > On Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland wrote:
> > > >> On 4/15/22 6:26 AM, Corentin Labbe wrote:
> > > >>> On Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner wrote:
> > > >>>> This series is based on the alternatives changes done in my svpbmt series
> > > >>>> and thus also depends on Atish's isa-extension parsing series.
> > > >>>>
> > > >>>> It uses the cache-management instructions from the Zicbom
> > > >>>> extension to handle cache flush, etc. actions on platforms needing them.
> > > >>>>
> > > >>>> SoCs using cpu cores from T-Head like the Allwinner D1 implement a
> > > >>>> different set of cache instructions. But while they are different
> > > >>>> instructions, they provide the same functionality, so a variant can
> > > >>>> easily hook into the existing alternatives mechanism on those.
> > > >>>>
> > > >>>>
> > > >>>
> > > >>> Hello
> > > >>>
> > > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contains this series.
> > > >>>
> > > >>> I am hitting a buffer corruption problem with DMA.
> > > >>> The sun8i-ce crypto driver fails its self-tests due to "device overran destination buffer".
> > > >>> In fact, the buffer is not overrun by the device but by the dma_map_single() operation.
> > > >>>
> > > >>> The following small piece of code shows the problem:
> > > >>>
> > > >>> dma_addr_t dma;
> > > >>> u8 *buf;
> > > >>> #define BSIZE 2048
> > > >>> #define DMASIZE 16
> > > >>>
> > > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> > > >>> for (i = 0; i < BSIZE; i++)
> > > >>> 	buf[i] = 0xFE;
> > > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> > > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
> > > >>
> > > >> This function (through dma_direct_map_page()) ends up calling
> > > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
> > > >> cache. This is the same thing other architectures do (at least arm, arm64,
> > > >> openrisc, and powerpc). So this appears to be working as intended.
> > > >
> > > > This behavior is not present, at least on ARM and ARM64.
> > > > The sample code I provided does not corrupt the buffer on them.
> > >
> > > That can be explained by the 0xFE bytes having been flushed to DRAM already in
> > > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in
> > > a dirty cache line. The cache topology and implementation are totally different
> > > across the SoCs, so this is not too surprising.
> > >
> > > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a
> > > unidirectional DMA transfer from the device into that buffer. So the contents of
> > > the buffer are "undefined" until the DMA transfer completes. If you are also
> > > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.
> > >
> > > Regards,
> > > Samuel
> >
> > +CC crypto mailing list + maintainer
> >
> > My problem is that the crypto selftest, for each buffer on which I need to do a cipher operation,
> > appends a poison buffer to check that the device does not write beyond the buffer.
> >
> > But dma_map_sg(FROM_DEVICE) corrupts this poison buffer, and the crypto selftests fail, thinking my device did a buffer overrun.
> >
> > So you mean that on the D1 SoC, this crypto API check strategy is impossible?
>
> I think you could try to replace all CLEAN & INVAL ops with FLUSH ops
> for the testing. (All cache block-aligned data from the device for the
> CPU should be invalidated.)
>

With:
diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
index 2c124bcc1932..608483522e05 100644
--- a/arch/riscv/mm/dma-noncoherent.c
+++ b/arch/riscv/mm/dma-noncoherent.c
@@ -21,7 +21,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_dire
 		ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
 		break;
 	case DMA_FROM_DEVICE:
-		ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
+		ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
 		break;
 	case DMA_BIDIRECTIONAL:
 		ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);

the crypto self-test works and there is no more buffer corruption.

Thanks
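The effect of this one-line change can be reproduced with a toy model: one write-back cache line that holds both the 16 bytes the device writes and the trailing poison. With INVAL at map time, the CPU's dirty 0xFE bytes are dropped and stale DRAM contents show through. With FLUSH, they are written back first and survive. This is a sketch under stated assumptions (64-byte line, DRAM initially zero, no evictions during the transfer) with hypothetical names; it is not the kernel's implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define LINE    64 /* assumed cache-line size, illustration only */
#define DMASIZE 16 /* bytes the device writes, as in the test program above */

/* One write-back cache line in front of DRAM (toy model, hypothetical names). */
struct line {
	unsigned char dram[LINE];
	unsigned char cache[LINE];
	bool valid, dirty;
};

/* INVAL: drop the line; dirty data is discarded. */
static void cmo_inval(struct line *l)
{
	l->valid = false;
	l->dirty = false;
}

/* FLUSH: write dirty data back to DRAM, then drop the line. */
static void cmo_flush(struct line *l)
{
	if (l->valid && l->dirty)
		memcpy(l->dram, l->cache, LINE);
	cmo_inval(l);
}

/*
 * Simulate one DMA_FROM_DEVICE transfer and return the byte the CPU reads at
 * index i afterwards. use_flush selects FLUSH (as in the patch tested above)
 * instead of INVAL for the map-time operation.
 */
static unsigned char run(bool use_flush, int i)
{
	struct line l;

	assert(i >= 0 && i < LINE);
	memset(&l, 0, sizeof(l));      /* DRAM starts out as zeroes, line invalid */

	memset(l.cache, 0xFE, LINE);   /* CPU writes output area + poison: dirty line */
	l.valid = true;
	l.dirty = true;

	if (use_flush)                 /* map time: arch_sync_dma_for_device() */
		cmo_flush(&l);
	else
		cmo_inval(&l);

	memset(l.dram, 0x11, DMASIZE); /* device writes only the mapped bytes */
	cmo_inval(&l);                 /* unmap time: arch_sync_dma_for_cpu() */

	return l.valid ? l.cache[i] : l.dram[i];
}
```

In this model, byte 20 (part of the poison) reads back as 0x00 with INVAL and as 0xFE with FLUSH, while the first 16 bytes hold the device's data in both cases.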
* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant @ 2022-04-17 8:45 ` Corentin Labbe 0 siblings, 0 replies; 50+ messages in thread From: Corentin Labbe @ 2022-04-17 8:45 UTC (permalink / raw) To: Guo Ren Cc: Samuel Holland, Heiko Stuebner, Palmer Dabbelt, Paul Walmsley, linux-riscv, Linux Kernel Mailing List, Wei Fu, Atish Patra, Anup Patel, Nick Kossifidis, Christoph Muellner, Philipp Tomsich, Herbert Xu, linux-crypto Le Sun, Apr 17, 2022 at 10:17:34AM +0800, Guo Ren a écrit : > On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe > <clabbe.montjoie@gmail.com> wrote: > > > > Le Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland a écrit : > > > On 4/16/22 2:35 AM, Corentin Labbe wrote: > > > > Le Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland a écrit : > > > >> On 4/15/22 6:26 AM, Corentin Labbe wrote: > > > >>> Le Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner a écrit : > > > >>>> This series is based on the alternatives changes done in my svpbmt series > > > >>>> and thus also depends on Atish's isa-extension parsing series. > > > >>>> > > > >>>> It implements using the cache-management instructions from the Zicbom- > > > >>>> extension to handle cache flush, etc actions on platforms needing them. > > > >>>> > > > >>>> SoCs using cpu cores from T-Head like the Allwinne D1 implement a > > > >>>> different set of cache instructions. But while they are different, > > > >>>> instructions they provide the same functionality, so a variant can > > > >>>> easly hook into the existing alternatives mechanism on those. > > > >>>> > > > >>>> > > > >>> > > > >>> Hello > > > >>> > > > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contain this serie. > > > >>> > > > >>> I am hitting a buffer corruption problem with DMA. > > > >>> The sun8i-ce crypto driver fail self tests due to "device overran destination buffer". 
> > > >>> In fact the buffer is not overran by device but by dma_map_single() operation. > > > >>> > > > >>> The following small code show the problem: > > > >>> > > > >>> dma_addr_t dma; > > > >>> u8 *buf; > > > >>> #define BSIZE 2048 > > > >>> #define DMASIZE 16 > > > >>> > > > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA); > > > >>> for (i = 0; i < BSIZE; i++) > > > >>> buf[i] = 0xFE; > > > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false); > > > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE); > > > >> > > > >> This function (through dma_direct_map_page()) ends up calling > > > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's > > > >> cache. This is the same thing other architectures do (at least arm, arm64, > > > >> openrisc, and powerpc). So this appears to be working as intended. > > > > > > > > This behavour is not present at least on ARM and ARM64. > > > > The sample code I provided does not corrupt the buffer on them. > > > > > > That can be explained by the 0xFE bytes having been flushed to DRAM already in > > > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in > > > a dirty cache line. The cache topology and implementation is totally different > > > across the SoCs, so this is not too surprising. > > > > > > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a > > > unidirectional DMA transfer from the device into that buffer. So the contents of > > > the buffer are "undefined" until the DMA transfer completes. If you are also > > > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL. > > > > > > Regards, > > > Samuel > > > > +CC crypto mailing list + maintainer > > > > My problem is that crypto selftest, for each buffer where I need to do a cipher operation, > > concat a poison buffer to check that device does write beyond buffer. 
> > > > But dma_map_sg(FROM_DEVICE) corrupts this poison buffer, and the crypto self-tests fail, thinking my device did a buffer overrun. > > > > So you mean that on the D1 SoC, this crypto API check strategy is impossible? > > I think you could try replacing all CLEAN & INVAL ops with FLUSH ops > for testing. (All cache-block-aligned data from the device for the > CPU should be invalidated.) > With: diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c index 2c124bcc1932..608483522e05 100644 --- a/arch/riscv/mm/dma-noncoherent.c +++ b/arch/riscv/mm/dma-noncoherent.c @@ -21,7 +21,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_dire ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size); break; case DMA_FROM_DEVICE: - ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size); + ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size); break; case DMA_BIDIRECTIONAL: ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size); The crypto self-test works and there is no more buffer corruption. Thanks
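A minimal, illustrative sketch of why a 16-byte DMA_FROM_DEVICE mapping can affect bytes that were never handed to the device: cache-maintenance instructions such as the Zicbom cbo.inval/cbo.flush act on whole cache blocks, so a per-line loop over [addr, addr + size) has to round the range out to block boundaries. The plain C below assumes a 64-byte block size (the D1's actual block size is not stated here); cmo_line_span() and cmo_collateral_bytes() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Whole cache lines covered by a maintenance operation on [addr, addr + size). */
struct line_span {
	uintptr_t first; /* address of the first line touched */
	uintptr_t end;   /* one past the last byte of the last line touched */
};

/* Hypothetical helper: round the byte range out to line boundaries.
 * line must be a non-zero power of two and size must be non-zero. */
static struct line_span cmo_line_span(uintptr_t addr, size_t size, size_t line)
{
	uintptr_t mask = (uintptr_t)line - 1;
	struct line_span s;

	assert(line != 0 && ((uintptr_t)line & mask) == 0);
	assert(size != 0);
	s.first = addr & ~mask;
	s.end = (addr + size + mask) & ~mask;
	return s;
}

/* Bytes outside [addr, addr + size) that still receive the operation. */
static size_t cmo_collateral_bytes(uintptr_t addr, size_t size, size_t line)
{
	struct line_span s = cmo_line_span(addr, size, line);

	return (size_t)(s.end - s.first) - size;
}
```

Under these assumptions, the 16-byte mapping in the sample code (DMASIZE 16 at the start of a line-aligned kmalloc buffer) covers one full 64-byte line, so the 48 bytes that follow it receive the same INVAL.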
* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant 2022-04-17 8:45 ` Corentin Labbe @ 2022-04-17 8:49 ` Guo Ren From: Guo Ren @ 2022-04-17 8:49 UTC To: Corentin Labbe Cc: Samuel Holland, Heiko Stuebner, Palmer Dabbelt, Paul Walmsley, linux-riscv, Linux Kernel Mailing List, Wei Fu, Atish Patra, Anup Patel, Nick Kossifidis, Christoph Muellner, Philipp Tomsich, Herbert Xu, linux-crypto On Sun, Apr 17, 2022 at 4:45 PM Corentin Labbe <clabbe.montjoie@gmail.com> wrote: > > On Sun, Apr 17, 2022 at 10:17:34AM +0800, Guo Ren wrote: > > On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe > > <clabbe.montjoie@gmail.com> wrote: > > > > > > On Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland wrote: > > > > On 4/16/22 2:35 AM, Corentin Labbe wrote: > > > > > On Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland wrote: > > > > >> On 4/15/22 6:26 AM, Corentin Labbe wrote: > > > > >>> On Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner wrote: > > > > >>>> This series is based on the alternatives changes done in my svpbmt series > > > > >>>> and thus also depends on Atish's isa-extension parsing series. > > > > >>>> > > > > >>>> It implements the use of the cache-management instructions from the Zicbom > > > > >>>> extension to handle cache flush, etc. actions on platforms needing them. > > > > >>>> > > > > >>>> SoCs using CPU cores from T-Head, like the Allwinner D1, implement a > > > > >>>> different set of cache instructions. But while the instructions are different, > > > > >>>> they provide the same functionality, so a variant can > > > > >>>> easily hook into the existing alternatives mechanism on those. > > > > >>>> > > > > >>>> > > > > >>> > > > > >>> Hello > > > > >>> > > > > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contains this series. > > > > >>> > > > > >>> I am hitting a buffer corruption problem with DMA.
> > > > >>> The sun8i-ce crypto driver fails its self-tests due to "device overran destination buffer". > > > > >>> In fact, the buffer is not overrun by the device but by the dma_map_single() operation. > > > > >>> > > > > >>> The following small snippet shows the problem: > > > > >>> > > > > >>> dma_addr_t dma; > > > > >>> u8 *buf; > > > > >>> #define BSIZE 2048 > > > > >>> #define DMASIZE 16 > > > > >>> > > > > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA); > > > > >>> for (i = 0; i < BSIZE; i++) > > > > >>> buf[i] = 0xFE; > > > > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false); > > > > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE); > > > > >> > > > > >> This function (through dma_direct_map_page()) ends up calling > > > > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's > > > > >> cache. This is the same thing other architectures do (at least arm, arm64, > > > > >> openrisc, and powerpc). So this appears to be working as intended. > > > > > > > > > > This behaviour is not present, at least on ARM and ARM64. > > > > > The sample code I provided does not corrupt the buffer on them. > > > > > > > > That can be explained by the 0xFE bytes having been flushed to DRAM already in > > > > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in > > > > a dirty cache line. The cache topology and implementation are totally different > > > > across the SoCs, so this is not too surprising. > > > > > > > > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a > > > > unidirectional DMA transfer from the device into that buffer. So the contents of > > > > the buffer are "undefined" until the DMA transfer completes. If you are also > > > > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.
> > > > > > > > Regards, > > > > Samuel > > > > > > +CC crypto mailing list + maintainer > > > > > > My problem is that the crypto self-test, for each buffer on which I need to do a cipher operation, > > > appends a poison buffer to check that the device does not write beyond the buffer. > > > > > > But dma_map_sg(FROM_DEVICE) corrupts this poison buffer, and the crypto self-tests fail, thinking my device did a buffer overrun. > > > > > > So you mean that on the D1 SoC, this crypto API check strategy is impossible? > > > > I think you could try replacing all CLEAN & INVAL ops with FLUSH ops > > for testing. (All cache-block-aligned data from the device for the > > CPU should be invalidated.) > > > > With: > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c > index 2c124bcc1932..608483522e05 100644 > --- a/arch/riscv/mm/dma-noncoherent.c > +++ b/arch/riscv/mm/dma-noncoherent.c > @@ -21,7 +21,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_dire > ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size); > break; > case DMA_FROM_DEVICE: > - ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size); > + ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size); > break; > case DMA_BIDIRECTIONAL: > ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size); > > > The crypto self-test works and there is no more buffer corruption. No, no... that's not a solution. It means your driver has a problem. For data coming from the device, INVAL alone is enough. > > Thanks -- Best Regards Guo Ren ML: https://lore.kernel.org/linux-csky/
* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant 2022-04-17 8:49 ` Guo Ren @ 2022-04-17 17:35 ` Corentin Labbe From: Corentin Labbe @ 2022-04-17 17:35 UTC To: Guo Ren Cc: Samuel Holland, Heiko Stuebner, Palmer Dabbelt, Paul Walmsley, linux-riscv, Linux Kernel Mailing List, Wei Fu, Atish Patra, Anup Patel, Nick Kossifidis, Christoph Muellner, Philipp Tomsich, Herbert Xu, linux-crypto On Sun, Apr 17, 2022 at 04:49:34PM +0800, Guo Ren wrote: > On Sun, Apr 17, 2022 at 4:45 PM Corentin Labbe > <clabbe.montjoie@gmail.com> wrote: > > > > On Sun, Apr 17, 2022 at 10:17:34AM +0800, Guo Ren wrote: > > > On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe > > > <clabbe.montjoie@gmail.com> wrote: > > > > > > > > On Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland wrote: > > > > > On 4/16/22 2:35 AM, Corentin Labbe wrote: > > > > > > On Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland wrote: > > > > > >> On 4/15/22 6:26 AM, Corentin Labbe wrote: > > > > > >>> On Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner wrote: > > > > > >>>> This series is based on the alternatives changes done in my svpbmt series > > > > > >>>> and thus also depends on Atish's isa-extension parsing series. > > > > > >>>> > > > > > >>>> It implements the use of the cache-management instructions from the Zicbom > > > > > >>>> extension to handle cache flush, etc. actions on platforms needing them. > > > > > >>>> > > > > > >>>> SoCs using CPU cores from T-Head, like the Allwinner D1, implement a > > > > > >>>> different set of cache instructions. But while the instructions are different, > > > > > >>>> they provide the same functionality, so a variant can > > > > > >>>> easily hook into the existing alternatives mechanism on those.
> > > > > >>>> > > > > > >>>> > > > > > >>> > > > > > >>> Hello > > > > > >>> > > > > > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contains this series. > > > > > >>> > > > > > >>> I am hitting a buffer corruption problem with DMA. > > > > > >>> The sun8i-ce crypto driver fails its self-tests due to "device overran destination buffer". > > > > > >>> In fact, the buffer is not overrun by the device but by the dma_map_single() operation. > > > > > >>> > > > > > >>> The following small snippet shows the problem: > > > > > >>> > > > > > >>> dma_addr_t dma; > > > > > >>> u8 *buf; > > > > > >>> #define BSIZE 2048 > > > > > >>> #define DMASIZE 16 > > > > > >>> > > > > > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA); > > > > > >>> for (i = 0; i < BSIZE; i++) > > > > > >>> buf[i] = 0xFE; > > > > > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false); > > > > > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE); > > > > > >> > > > > > >> This function (through dma_direct_map_page()) ends up calling > > > > > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's > > > > > >> cache. This is the same thing other architectures do (at least arm, arm64, > > > > > >> openrisc, and powerpc). So this appears to be working as intended. > > > > > > > > > > > > This behaviour is not present, at least on ARM and ARM64. > > > > > > The sample code I provided does not corrupt the buffer on them. > > > > > > > > > > That can be explained by the 0xFE bytes having been flushed to DRAM already in > > > > > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in > > > > > a dirty cache line. The cache topology and implementation are totally different > > > > > across the SoCs, so this is not too surprising. > > > > > > > > > > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a > > > > > unidirectional DMA transfer from the device into that buffer.
So the contents of > > > > > the buffer are "undefined" until the DMA transfer completes. If you are also > > > > > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL. > > > > > > > > > > Regards, > > > > > Samuel > > > > > > > > +CC crypto mailing list + maintainer > > > > > > > > My problem is that the crypto self-test, for each buffer on which I need to do a cipher operation, > > > > appends a poison buffer to check that the device does not write beyond the buffer. > > > > > > > > But dma_map_sg(FROM_DEVICE) corrupts this poison buffer, and the crypto self-tests fail, thinking my device did a buffer overrun. > > > > > > > > So you mean that on the D1 SoC, this crypto API check strategy is impossible? > > > > > > I think you could try replacing all CLEAN & INVAL ops with FLUSH ops > > > for testing. (All cache-block-aligned data from the device for the > > > CPU should be invalidated.) > > > > > > > With: > > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c > > index 2c124bcc1932..608483522e05 100644 > > --- a/arch/riscv/mm/dma-noncoherent.c > > +++ b/arch/riscv/mm/dma-noncoherent.c > > @@ -21,7 +21,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_dire > > ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size); > > break; > > case DMA_FROM_DEVICE: > > - ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size); > > + ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size); > > break; > > case DMA_BIDIRECTIONAL: > > ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size); > > > > > > The crypto self-test works and there is no more buffer corruption. > No, no... that's not a solution. It means your driver has a problem. > For data coming from the device, INVAL alone is enough. > As far as I can tell, my driver works fine; the problem comes from dma_map_sg(). I probably didn't explain it well, so let me start again.
Example: The crypto self-test sends my driver a 16-byte AES cipher operation inside an SG, but the original buffer is larger (say 32 bytes for this example). So the first 16 bytes are used by the SG, and the last 16 bytes are a poison buffer (filled with 0xFE) used to check that the driver does not write beyond the 16 bytes of the operation (and beyond the SG length). Calling dma_map_sg(FROM_DEVICE) on the SG corrupts the whole buffer. My driver then writes the first 16 bytes via DMA as expected. The crypto API checks the last bytes, finds they are no longer 0xFE, and fails, believing my driver wrote beyond the first 16 bytes. But even if I disable the hardware operation, the buffer is still corrupted (see my sample code, which just does dma_map/dma_unmap). So the problem is that dma_map(FROM_DEVICE) changes the buffer contents. So if this behaviour is normal on the D1 SoC, how should the crypto self-tests be fixed?
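The poison scenario described above can be modelled with a toy write-back cache line in plain C. This is a simplified sketch under stated assumptions, not the D1's real cache: a single 64-byte line, the 32-byte buffer starting at the beginning of that line, DRAM holding stale zeros, and the CPU's 0xFE poison still dirty in the cache when the buffer is mapped. op_inval() stands in for INVAL (dirty data is discarded) and op_flush() for FLUSH (written back, then discarded); all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LINE 64 /* assumed cache-line size */

/* Toy model of a single write-back cache line in front of DRAM. */
struct toy_line {
	unsigned char dram[LINE];  /* contents of memory */
	unsigned char cache[LINE]; /* contents the CPU sees */
	int dirty;                 /* cache holds data not yet in DRAM */
};

/* CPU store: updates the cached copy only and marks the line dirty. */
static void cpu_store(struct toy_line *l, size_t off, size_t len, unsigned char v)
{
	assert(off + len <= LINE);
	memset(l->cache + off, v, len);
	l->dirty = 1;
}

/* INVAL: drop the line; dirty data is lost, the next read refetches DRAM. */
static void op_inval(struct toy_line *l)
{
	memcpy(l->cache, l->dram, LINE);
	l->dirty = 0;
}

/* FLUSH (clean + invalidate): write dirty data back first, then drop. */
static void op_flush(struct toy_line *l)
{
	if (l->dirty)
		memcpy(l->dram, l->cache, LINE);
	memcpy(l->cache, l->dram, LINE);
	l->dirty = 0;
}

/* Self-test scenario: 16 bytes mapped for the device, 16 poison bytes after
 * them in the same line. Returns 1 if the poison is intact afterwards. */
static int poison_survives(void (*map_op)(struct toy_line *))
{
	struct toy_line l;
	size_t i;

	memset(l.dram, 0x00, LINE);    /* stale memory contents (assumed) */
	memcpy(l.cache, l.dram, LINE);
	l.dirty = 0;

	cpu_store(&l, 0, 32, 0xFE);    /* poison written by the CPU, still cached */
	map_op(&l);                    /* map bytes 0..15 FROM_DEVICE: whole line */
	memset(l.dram, 0xAB, 16);      /* device DMA writes its 16 bytes to DRAM */
	memcpy(l.cache, l.dram, LINE); /* line no longer cached: CPU read refetches */

	for (i = 16; i < 32; i++)
		if (l.cache[i] != 0xFE)
			return 0;
	return 1;
}
```

With op_inval the poison bytes come back as the stale DRAM contents even though the device only wrote bytes 0..15; with op_flush they survive, which matches the effect of the FLUSH diff tested earlier in the thread.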
* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant 2022-04-17 17:35 ` Corentin Labbe @ 2022-04-17 22:50 ` Guo Ren From: Guo Ren @ 2022-04-17 22:50 UTC To: Corentin Labbe Cc: Samuel Holland, Heiko Stuebner, Palmer Dabbelt, Paul Walmsley, linux-riscv, Linux Kernel Mailing List, Wei Fu, Atish Patra, Anup Patel, Nick Kossifidis, Christoph Muellner, Philipp Tomsich, Herbert Xu, linux-crypto On Mon, Apr 18, 2022 at 1:35 AM Corentin Labbe <clabbe.montjoie@gmail.com> wrote: > > On Sun, Apr 17, 2022 at 04:49:34PM +0800, Guo Ren wrote: > > On Sun, Apr 17, 2022 at 4:45 PM Corentin Labbe > > <clabbe.montjoie@gmail.com> wrote: > > > > > > On Sun, Apr 17, 2022 at 10:17:34AM +0800, Guo Ren wrote: > > > > On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe > > > > <clabbe.montjoie@gmail.com> wrote: > > > > > > > > > > On Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland wrote: > > > > > > On 4/16/22 2:35 AM, Corentin Labbe wrote: > > > > > > > On Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland wrote: > > > > > > >> On 4/15/22 6:26 AM, Corentin Labbe wrote: > > > > > > >>> On Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner wrote: > > > > > > >>>> This series is based on the alternatives changes done in my svpbmt series > > > > > > >>>> and thus also depends on Atish's isa-extension parsing series. > > > > > > >>>> > > > > > > >>>> It implements the use of the cache-management instructions from the Zicbom > > > > > > >>>> extension to handle cache flush, etc. actions on platforms needing them. > > > > > > >>>> > > > > > > >>>> SoCs using CPU cores from T-Head, like the Allwinner D1, implement a > > > > > > >>>> different set of cache instructions. But while the instructions are different, > > > > > > >>>> they provide the same functionality, so a variant can > > > > > > >>>> easily hook into the existing alternatives mechanism on those.
> > > > > > >>>> > > > > > > >>>> > > > > > > >>> > > > > > > >>> Hello > > > > > > >>> > > > > > > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contains this series. > > > > > > >>> > > > > > > >>> I am hitting a buffer corruption problem with DMA. > > > > > > >>> The sun8i-ce crypto driver fails its self-tests due to "device overran destination buffer". > > > > > > >>> In fact, the buffer is not overrun by the device but by the dma_map_single() operation. > > > > > > >>> > > > > > > >>> The following small snippet shows the problem: > > > > > > >>> > > > > > > >>> dma_addr_t dma; > > > > > > >>> u8 *buf; > > > > > > >>> #define BSIZE 2048 > > > > > > >>> #define DMASIZE 16 > > > > > > >>> > > > > > > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA); > > > > > > >>> for (i = 0; i < BSIZE; i++) > > > > > > >>> buf[i] = 0xFE; > > > > > > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false); > > > > > > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE); > > > > > > >> > > > > > > >> This function (through dma_direct_map_page()) ends up calling > > > > > > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's > > > > > > >> cache. This is the same thing other architectures do (at least arm, arm64, > > > > > > >> openrisc, and powerpc). So this appears to be working as intended. > > > > > > > > > > > > > > This behaviour is not present, at least on ARM and ARM64. > > > > > > > The sample code I provided does not corrupt the buffer on them. > > > > > > > > > > > > That can be explained by the 0xFE bytes having been flushed to DRAM already in > > > > > > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in > > > > > > a dirty cache line. The cache topology and implementation are totally different > > > > > > across the SoCs, so this is not too surprising.
> > > > > > > > > > > > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a > > > > > > unidirectional DMA transfer from the device into that buffer. So the contents of > > > > > > the buffer are "undefined" until the DMA transfer completes. If you are also > > > > > > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL. > > > > > > > > > > > > Regards, > > > > > > Samuel > > > > > > > > > > +CC crypto mailing list + maintainer > > > > > > > > > > My problem is that the crypto self-test, for each buffer on which I need to do a cipher operation, > > > > > appends a poison buffer to check that the device does not write beyond the buffer. > > > > > > > > > > But dma_map_sg(FROM_DEVICE) corrupts this poison buffer, and the crypto self-tests fail, thinking my device did a buffer overrun. > > > > > > > > > > So you mean that on the D1 SoC, this crypto API check strategy is impossible? > > > > > > > > I think you could try replacing all CLEAN & INVAL ops with FLUSH ops > > > > for testing. (All cache-block-aligned data from the device for the > > > > CPU should be invalidated.) > > > > > > > > > > With: > > > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c > > > index 2c124bcc1932..608483522e05 100644 > > > --- a/arch/riscv/mm/dma-noncoherent.c > > > +++ b/arch/riscv/mm/dma-noncoherent.c > > > @@ -21,7 +21,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_dire > > > ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size); > > > break; > > > case DMA_FROM_DEVICE: > > > - ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size); > > > + ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size); > > > break; > > > case DMA_BIDIRECTIONAL: > > > ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size); > > > > > > > > > The crypto self-test works and there is no more buffer corruption. > > No, no... that's not a solution. It means your driver has a problem.
> > For data coming from the device, INVAL alone is enough. > > > > As far as I can tell, my driver works fine; the problem comes from dma_map_sg(). I probably didn't explain it well, so let me start again. > > Example: > The crypto self-test sends my driver a 16-byte AES cipher operation inside an SG, but the original buffer is larger (say 32 bytes for this example). > So the first 16 bytes are used by the SG, and the last 16 bytes are a poison buffer (filled with 0xFE) used to check that the driver does not write beyond the 16 bytes of the operation (and beyond the SG length). > > Calling dma_map_sg(FROM_DEVICE) on the SG corrupts the whole buffer. > My driver then writes the first 16 bytes via DMA as expected. > The crypto API checks the last bytes, finds they are no longer 0xFE, and fails, believing my driver wrote beyond the first 16 bytes. > > But even if I disable the hardware operation, the buffer is still corrupted (see my sample code, which just does dma_map/dma_unmap). > > So the problem is that dma_map(FROM_DEVICE) changes the buffer contents. > > So if this behaviour is normal on the D1 SoC, how should the crypto self-tests be fixed? Actually, FLUSH is safe in all cases, but more expensive. Can you tell me which ARM SoC you are using? And which version of Linux is running on it? -- Best Regards Guo Ren ML: https://lore.kernel.org/linux-csky/
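The trade-off between "FLUSH is safe but more expensive" and "INVAL is enough" suggests a middle ground: only the cache lines a buffer shares with other data need their dirty contents written back. Below is a hypothetical planner in plain C, assuming a power-of-two line size, that assigns FLUSH to a partially covered first or last line and INVAL to every line lying entirely inside the buffer. As far as we understand, arm64's invalidate-by-range routine (dcache_inval_poc in recent kernels) treats unaligned boundary lines in a similar way, which may be one reason the same test behaves differently there; that is an inference, not something established in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum cmo_op { CMO_INVAL, CMO_FLUSH };

/* Hypothetical planner for a DMA_FROM_DEVICE sync of [addr, addr + size).
 * A line only partly covered by the buffer may also hold unrelated dirty
 * data, so it gets FLUSH (write back, then invalidate); a line entirely
 * inside the buffer gets the cheaper INVAL. Stores one op per line in
 * ops[] (at most max_ops entries) and returns the number stored. */
static size_t plan_from_device(uintptr_t addr, size_t size, size_t line,
			       enum cmo_op *ops, size_t max_ops)
{
	uintptr_t mask = (uintptr_t)line - 1;
	uintptr_t end = addr + size;
	uintptr_t cur = addr & ~mask;
	size_t n = 0;

	assert(line != 0 && ((uintptr_t)line & mask) == 0);
	for (; cur < end && n < max_ops; cur += line, n++) {
		/* Partial if the line starts before the buffer or ends after it. */
		int partial = cur < addr || cur + line > end;

		ops[n] = partial ? CMO_FLUSH : CMO_INVAL;
	}
	return n;
}
```

For a 16-byte buffer at the start of a 64-byte line this yields a single FLUSH, while a line-aligned 128-byte buffer gets two INVALs. Note the limit of the approach: it only keeps neighbouring bytes from being discarded at map time; it does not make it safe for the CPU to keep writing to a shared line while the device's DMA is in flight.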
* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
  2022-04-17 22:50 ` Guo Ren
@ 2022-04-19 7:44 ` Corentin Labbe
  -1 siblings, 0 replies; 50+ messages in thread
From: Corentin Labbe @ 2022-04-19 7:44 UTC
To: Guo Ren
Cc: Samuel Holland, Heiko Stuebner, Palmer Dabbelt, Paul Walmsley,
	linux-riscv, Linux Kernel Mailing List, Wei Fu, Atish Patra,
	Anup Patel, Nick Kossifidis, Christoph Muellner, Philipp Tomsich,
	Herbert Xu, linux-crypto

On Mon, Apr 18, 2022 at 06:50:57AM +0800, Guo Ren wrote:
> On Mon, Apr 18, 2022 at 1:35 AM Corentin Labbe
> <clabbe.montjoie@gmail.com> wrote:
> >
> > On Sun, Apr 17, 2022 at 04:49:34PM +0800, Guo Ren wrote:
> > > On Sun, Apr 17, 2022 at 4:45 PM Corentin Labbe
> > > <clabbe.montjoie@gmail.com> wrote:
> > > >
> > > > On Sun, Apr 17, 2022 at 10:17:34AM +0800, Guo Ren wrote:
> > > > > On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe
> > > > > <clabbe.montjoie@gmail.com> wrote:
> > > > > >
> > > > > > On Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland wrote:
> > > > > > > On 4/16/22 2:35 AM, Corentin Labbe wrote:
> > > > > > > > On Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland wrote:
> > > > > > > >> On 4/15/22 6:26 AM, Corentin Labbe wrote:
> > > > > > > >>> On Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner wrote:
> > > > > > > >>>> This series is based on the alternatives changes done in my svpbmt series
> > > > > > > >>>> and thus also depends on Atish's isa-extension parsing series.
> > > > > > > >>>>
> > > > > > > >>>> It implements using the cache-management instructions from the Zicbom-
> > > > > > > >>>> extension to handle cache flush, etc actions on platforms needing them.
> > > > > > > >>>>
> > > > > > > >>>> SoCs using cpu cores from T-Head like the Allwinner D1 implement a
> > > > > > > >>>> different set of cache instructions. But while the instructions are
> > > > > > > >>>> different, they provide the same functionality, so a variant can
> > > > > > > >>>> easily hook into the existing alternatives mechanism on those.
> > > > > > > >>>>
> > > > > > > >>>>
> > > > > > > >>>
> > > > > > > >>> Hello
> > > > > > > >>>
> > > > > > > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contains this series.
> > > > > > > >>>
> > > > > > > >>> I am hitting a buffer corruption problem with DMA.
> > > > > > > >>> The sun8i-ce crypto driver fails its self-tests due to "device overran destination buffer".
> > > > > > > >>> In fact, the buffer is not overrun by the device but by the dma_map_single() operation.
> > > > > > > >>>
> > > > > > > >>> The following small snippet shows the problem:
> > > > > > > >>>
> > > > > > > >>> dma_addr_t dma;
> > > > > > > >>> u8 *buf;
> > > > > > > >>> #define BSIZE 2048
> > > > > > > >>> #define DMASIZE 16
> > > > > > > >>>
> > > > > > > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> > > > > > > >>> for (i = 0; i < BSIZE; i++)
> > > > > > > >>> 	buf[i] = 0xFE;
> > > > > > > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> > > > > > > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
> > > > > > > >>
> > > > > > > >> This function (through dma_direct_map_page()) ends up calling
> > > > > > > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
> > > > > > > >> cache. This is the same thing other architectures do (at least arm, arm64,
> > > > > > > >> openrisc, and powerpc). So this appears to be working as intended.
> > > > > > > >
> > > > > > > > This behaviour is not present, at least on ARM and ARM64.
> > > > > > > > The sample code I provided does not corrupt the buffer on them.
> > > > > > >
> > > > > > > That can be explained by the 0xFE bytes having been flushed to DRAM already in
> > > > > > > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in
> > > > > > > a dirty cache line. The cache topology and implementation are totally different
> > > > > > > across the SoCs, so this is not too surprising.
> > > > > > >
> > > > > > > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a
> > > > > > > unidirectional DMA transfer from the device into that buffer. So the contents of
> > > > > > > the buffer are "undefined" until the DMA transfer completes. If you are also
> > > > > > > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.
> > > > > > >
> > > > > > > Regards,
> > > > > > > Samuel
> > > > > >
> > > > > > +CC crypto mailing list + maintainer
> > > > > >
> > > > > > My problem is that the crypto self-test, for each buffer where I need to do a cipher operation,
> > > > > > concatenates a poison buffer to check that the device does not write beyond the buffer.
> > > > > >
> > > > > > But the dma_map_sg(FROM_DEVICE) corrupts this poison buffer and the crypto self-tests fail, thinking my device did a buffer overrun.
> > > > > >
> > > > > > So do you mean that on the D1 SoC, this crypto API check strategy is impossible?
> > > > >
> > > > > I think you could try to replace all CLEAN & INVAL ops with FLUSH ops
> > > > > for the testing. (All cache block-aligned data from the device for the
> > > > > CPU should be invalidated.)
> > > > >
> > > >
> > > > With:
> > > > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> > > > index 2c124bcc1932..608483522e05 100644
> > > > --- a/arch/riscv/mm/dma-noncoherent.c
> > > > +++ b/arch/riscv/mm/dma-noncoherent.c
> > > > @@ -21,7 +21,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_dire
> > > >  		ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
> > > >  		break;
> > > >  	case DMA_FROM_DEVICE:
> > > > -		ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
> > > > +		ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> > > >  		break;
> > > >  	case DMA_BIDIRECTIONAL:
> > > >  		ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> > > >
> > > >
> > > > The crypto self test works and I got no more buffer corruption.
> > > No, no... it's not a solution. That means your driver has a problem.
> > > From the device, INVAL alone is enough.
> > >
> >
> > For me, my driver works fine; the problem comes from dma_map_sg(). Probably I didn't explain it right, so let me start again.
> >
> > Example:
> > The crypto self-test sends my driver an AES cipher operation of 16 bytes inside an SG, but the original buffer is larger (say 32 bytes for the example).
> > So the first 16 bytes are used by the SG and the last 16 bytes are a poison buffer (with value 0xFE) to check that the driver does not write beyond the normal operation of 16 bytes (and beyond the SG length).
> >
> > Doing the dma_map_sg(FROM_DEVICE) on the SG corrupts the whole buffer.
> > My driver normally writes the first 16 bytes via DMA.
> > The crypto API checks the last bytes, finds they are no longer 0xFE, and fails, believing my driver wrote beyond the first 16 bytes.
> >
> > But even if I disable my hardware operation, the buffer is still corrupted. (See my sample code, which just does dma_map/dma_unmap.)
> >
> > So the problem is the dma_map(FROM_DEVICE), which changes the buffer content.
> >
> > So if this behaviour is normal on the D1 SoC, how do we fix the crypto self-tests?
> Actually, FLUSH is safe for all, but more expensive. Can you tell me
> which ARM SoC you are using? And which version of Linux is running on
> your ARM SoC?
>

The SoC is the Allwinner D1 (RISC-V).
I am testing Linux from https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip
* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant
  2022-04-17 17:35 ` Corentin Labbe
@ 2022-04-18 15:29 ` Philipp Tomsich
  -1 siblings, 0 replies; 50+ messages in thread
From: Philipp Tomsich @ 2022-04-18 15:29 UTC
To: Corentin Labbe
Cc: Guo Ren, Samuel Holland, Heiko Stuebner, Palmer Dabbelt, Paul Walmsley,
	linux-riscv, Linux Kernel Mailing List, Wei Fu, Atish Patra,
	Anup Patel, Nick Kossifidis, Christoph Muellner, Herbert Xu,
	linux-crypto

On Sun, 17 Apr 2022 at 19:35, Corentin Labbe <clabbe.montjoie@gmail.com> wrote:
>
> On Sun, Apr 17, 2022 at 04:49:34PM +0800, Guo Ren wrote:
> > On Sun, Apr 17, 2022 at 4:45 PM Corentin Labbe
> > <clabbe.montjoie@gmail.com> wrote:
> > >
> > > On Sun, Apr 17, 2022 at 10:17:34AM +0800, Guo Ren wrote:
> > > > On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe
> > > > <clabbe.montjoie@gmail.com> wrote:
> > > > >
> > > > > On Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland wrote:
> > > > > > On 4/16/22 2:35 AM, Corentin Labbe wrote:
> > > > > > > On Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland wrote:
> > > > > > >> On 4/15/22 6:26 AM, Corentin Labbe wrote:
> > > > > > >>> On Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner wrote:
> > > > > > >>>> This series is based on the alternatives changes done in my svpbmt series
> > > > > > >>>> and thus also depends on Atish's isa-extension parsing series.
> > > > > > >>>>
> > > > > > >>>> It implements using the cache-management instructions from the Zicbom-
> > > > > > >>>> extension to handle cache flush, etc actions on platforms needing them.
> > > > > > >>>>
> > > > > > >>>> SoCs using cpu cores from T-Head like the Allwinner D1 implement a
> > > > > > >>>> different set of cache instructions. But while the instructions are
> > > > > > >>>> different, they provide the same functionality, so a variant can
> > > > > > >>>> easily hook into the existing alternatives mechanism on those.
> > > > > > >>>>
> > > > > > >>>>
> > > > > > >>>
> > > > > > >>> Hello
> > > > > > >>>
> > > > > > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contains this series.
> > > > > > >>>
> > > > > > >>> I am hitting a buffer corruption problem with DMA.
> > > > > > >>> The sun8i-ce crypto driver fails its self-tests due to "device overran destination buffer".
> > > > > > >>> In fact, the buffer is not overrun by the device but by the dma_map_single() operation.
> > > > > > >>>
> > > > > > >>> The following small snippet shows the problem:
> > > > > > >>>
> > > > > > >>> dma_addr_t dma;
> > > > > > >>> u8 *buf;
> > > > > > >>> #define BSIZE 2048
> > > > > > >>> #define DMASIZE 16
> > > > > > >>>
> > > > > > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA);
> > > > > > >>> for (i = 0; i < BSIZE; i++)
> > > > > > >>> 	buf[i] = 0xFE;
> > > > > > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false);
> > > > > > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE);
> > > > > > >>
> > > > > > >> This function (through dma_direct_map_page()) ends up calling
> > > > > > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's
> > > > > > >> cache. This is the same thing other architectures do (at least arm, arm64,
> > > > > > >> openrisc, and powerpc). So this appears to be working as intended.
> > > > > > >
> > > > > > > This behaviour is not present, at least on ARM and ARM64.
> > > > > > > The sample code I provided does not corrupt the buffer on them.
> > > > > >
> > > > > > That can be explained by the 0xFE bytes having been flushed to DRAM already in
> > > > > > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in
> > > > > > a dirty cache line. The cache topology and implementation are totally different
> > > > > > across the SoCs, so this is not too surprising.
> > > > > >
> > > > > > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a
> > > > > > unidirectional DMA transfer from the device into that buffer. So the contents of
> > > > > > the buffer are "undefined" until the DMA transfer completes. If you are also
> > > > > > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL.
> > > > > >
> > > > > > Regards,
> > > > > > Samuel
> > > > >
> > > > > +CC crypto mailing list + maintainer
> > > > >
> > > > > My problem is that the crypto self-test, for each buffer where I need to do a cipher operation,
> > > > > concatenates a poison buffer to check that the device does not write beyond the buffer.
> > > > >
> > > > > But the dma_map_sg(FROM_DEVICE) corrupts this poison buffer and the crypto self-tests fail, thinking my device did a buffer overrun.
> > > > >
> > > > > So do you mean that on the D1 SoC, this crypto API check strategy is impossible?
> > > >
> > > > I think you could try to replace all CLEAN & INVAL ops with FLUSH ops
> > > > for the testing. (All cache block-aligned data from the device for the
> > > > CPU should be invalidated.)
> > > >
> > >
> > > With:
> > > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c
> > > index 2c124bcc1932..608483522e05 100644
> > > --- a/arch/riscv/mm/dma-noncoherent.c
> > > +++ b/arch/riscv/mm/dma-noncoherent.c
> > > @@ -21,7 +21,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_dire
> > >  		ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size);
> > >  		break;
> > >  	case DMA_FROM_DEVICE:
> > > -		ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size);
> > > +		ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> > >  		break;
> > >  	case DMA_BIDIRECTIONAL:
> > >  		ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size);
> > >
> > >
> > > The crypto self test works and I got no more buffer corruption.
> > No, no... it's not a solution. That means your driver has a problem.
> > From the device, INVAL alone is enough.
> >
>
> For me, my driver works fine; the problem comes from dma_map_sg(). Probably I didn't explain it right, so let me start again.
>
> Example:
> The crypto self-test sends my driver an AES cipher operation of 16 bytes inside an SG, but the original buffer is larger (say 32 bytes for the example).
> So the first 16 bytes are used by the SG and the last 16 bytes are a poison buffer (with value 0xFE) to check that the driver does not write beyond the normal operation of 16 bytes (and beyond the SG length).
>
> Doing the dma_map_sg(FROM_DEVICE) on the SG corrupts the whole buffer.

Doesn't DMA_FROM_DEVICE indicate that no writes from the CPU to the buffer
are expected (and that any modifications to the underlying cache line can
be dropped via an invalidation)?

In other words: does the behavior change when mapping as DMA_BIDIRECTIONAL?
And should a map/unmap sequence be used, where the buffer is first mapped as
DMA_TO_DEVICE when poisoning it and later as DMA_FROM_DEVICE during normal
operation?

Philipp.

> My driver normally writes the first 16 bytes via DMA.
> The crypto API checks the last bytes, finds they are no longer 0xFE, and fails, believing my driver wrote beyond the first 16 bytes.
>
> But even if I disable my hardware operation, the buffer is still corrupted. (See my sample code, which just does dma_map/dma_unmap.)
>
> So the problem is the dma_map(FROM_DEVICE), which changes the buffer content.
>
> So if this behaviour is normal on the D1 SoC, how do we fix the crypto self-tests?
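Philipp's two questions can be checked against a similar toy model. In this user-space sketch (hypothetical names, an assumed 64-byte line, not kernel code), toy_sync_for_device() mirrors the direction switch in the quoted arch_sync_dma_for_device() hunk, assuming the CLEAN branch shown there as diff context is the DMA_TO_DEVICE case: TO_DEVICE cleans, FROM_DEVICE invalidates, and BIDIRECTIONAL flushes (clean, then invalidate).

```c
/* Toy user-space model of one cache line (hypothetical names, not kernel code). */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define LINE_SIZE 64	/* assumed cache-line size */

/* Hypothetical stand-ins for the kernel's DMA directions. */
enum toy_dir { TOY_TO_DEVICE, TOY_FROM_DEVICE, TOY_BIDIRECTIONAL };

/* One write-back cache line in front of its DRAM backing store. */
struct toy_line {
	unsigned char dram[LINE_SIZE];
	unsigned char cache[LINE_SIZE];
	bool valid;
	bool dirty;
};

/* CLEAN: write dirty data back to DRAM; the line stays valid. */
static void toy_clean(struct toy_line *l)
{
	if (l->valid && l->dirty)
		memcpy(l->dram, l->cache, LINE_SIZE);
	l->dirty = false;
}

/* INVAL: drop the line; any dirty data is lost. */
static void toy_inval(struct toy_line *l)
{
	l->valid = false;
	l->dirty = false;
}

/* Direction-to-operation mapping modelled on the switch in the quoted hunk. */
static void toy_sync_for_device(struct toy_line *l, enum toy_dir dir)
{
	switch (dir) {
	case TOY_TO_DEVICE:
		toy_clean(l);		/* CLEAN */
		break;
	case TOY_FROM_DEVICE:
		toy_inval(l);		/* INVAL */
		break;
	case TOY_BIDIRECTIONAL:
		toy_clean(l);		/* FLUSH = CLEAN + INVAL */
		toy_inval(l);
		break;
	}
}

/* The CPU writes 0xFE poison: it sits dirty in the cache, DRAM is still zero. */
static void toy_poison(struct toy_line *l)
{
	memset(l, 0, sizeof(*l));
	memset(l->cache, 0xFE, LINE_SIZE);
	l->valid = true;
	l->dirty = true;
}

/* What the CPU reads back (no device write is modelled). */
static unsigned char toy_read(const struct toy_line *l, size_t off)
{
	assert(off < LINE_SIZE);
	return l->valid ? l->cache[off] : l->dram[off];
}
```

In this model the poison is lost after a plain FROM_DEVICE sync, but it survives if the CPU-written line is first cleaned by a TO_DEVICE sync (the map-as-DMA_TO_DEVICE-after-poisoning idea) or if the buffer is synced as BIDIRECTIONAL. Whether the crypto self-tests should take either route is the question the thread leaves open.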
* Re: [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant 2022-04-18 15:29 ` Philipp Tomsich @ 2022-04-19 7:52 ` Corentin Labbe -1 siblings, 0 replies; 50+ messages in thread From: Corentin Labbe @ 2022-04-19 7:52 UTC (permalink / raw) To: Philipp Tomsich Cc: Guo Ren, Samuel Holland, Heiko Stuebner, Palmer Dabbelt, Paul Walmsley, linux-riscv, Linux Kernel Mailing List, Wei Fu, Atish Patra, Anup Patel, Nick Kossifidis, Christoph Muellner, Herbert Xu, linux-crypto On Mon, Apr 18, 2022 at 05:29:10PM +0200, Philipp Tomsich wrote: > On Sun, 17 Apr 2022 at 19:35, Corentin Labbe <clabbe.montjoie@gmail.com> wrote: > > > > On Sun, Apr 17, 2022 at 04:49:34PM +0800, Guo Ren wrote: > > > On Sun, Apr 17, 2022 at 4:45 PM Corentin Labbe > > > <clabbe.montjoie@gmail.com> wrote: > > > > > > > > On Sun, Apr 17, 2022 at 10:17:34AM +0800, Guo Ren wrote: > > > > > On Sun, Apr 17, 2022 at 3:32 AM Corentin Labbe > > > > > <clabbe.montjoie@gmail.com> wrote: > > > > > > > > > > > > On Sat, Apr 16, 2022 at 12:47:29PM -0500, Samuel Holland wrote: > > > > > > > On 4/16/22 2:35 AM, Corentin Labbe wrote: > > > > > > > > On Fri, Apr 15, 2022 at 09:19:23PM -0500, Samuel Holland wrote: > > > > > > > >> On 4/15/22 6:26 AM, Corentin Labbe wrote: > > > > > > > >>> On Mon, Mar 07, 2022 at 11:46:18PM +0100, Heiko Stuebner wrote: > > > > > > > >>>> This series is based on the alternatives changes done in my svpbmt series > > > > > > > >>>> and thus also depends on Atish's isa-extension parsing series. > > > > > > > >>>> > > > > > > > >>>> It implements using the cache-management instructions from the Zicbom- > > > > > > > >>>> extension to handle cache flush, etc actions on platforms needing them. > > > > > > > >>>> > > > > > > > >>>> SoCs using cpu cores from T-Head like the Allwinner D1 implement a > > > > > > > >>>> different set of cache instructions.
But while they are different > > > > > > > >>>> instructions, they provide the same functionality, so a variant can > > > > > > > >>>> easily hook into the existing alternatives mechanism on those. > > > > > > > >>>> > > > > > > > >>>> > > > > > > > >>> > > > > > > > >>> Hello > > > > > > > >>> > > > > > > > >>> I am testing https://github.com/smaeul/linux.git branch:origin/riscv/d1-wip which contains this series. > > > > > > > >>> > > > > > > > >>> I am hitting a buffer corruption problem with DMA. > > > > > > > >>> The sun8i-ce crypto driver fails self tests due to "device overran destination buffer". > > > > > > > >>> In fact the buffer is not overrun by the device but by the dma_map_single() operation. > > > > > > > >>> > > > > > > > >>> The following small code shows the problem: > > > > > > > >>> > > > > > > > >>> dma_addr_t dma; > > > > > > > >>> u8 *buf; > > > > > > > >>> #define BSIZE 2048 > > > > > > > >>> #define DMASIZE 16 > > > > > > > >>> > > > > > > > >>> buf = kmalloc(BSIZE, GFP_KERNEL | GFP_DMA); > > > > > > > >>> for (i = 0; i < BSIZE; i++) > > > > > > > >>> buf[i] = 0xFE; > > > > > > > >>> print_hex_dump(KERN_INFO, "DMATEST1:", DUMP_PREFIX_NONE, 16, 4, buf, 256, false); > > > > > > > >>> dma = dma_map_single(ce->dev, buf, DMASIZE, DMA_FROM_DEVICE); > > > > > > > >> > > > > > > > >> This function (through dma_direct_map_page()) ends up calling > > > > > > > >> arch_sync_dma_for_device(..., ..., DMA_FROM_DEVICE), which invalidates the CPU's > > > > > > > >> cache. This is the same thing other architectures do (at least arm, arm64, > > > > > > > >> openrisc, and powerpc). So this appears to be working as intended. > > > > > > > > > > > > > > > > This behaviour is not present, at least on ARM and ARM64. > > > > > > > > The sample code I provided does not corrupt the buffer on them.
> > > > > > > > > > > > > > > That can be explained by the 0xFE bytes having been flushed to DRAM already in > > > > > > > your ARM/ARM64 tests, whereas in your riscv64 case, the 0xFE bytes were still in > > > > > > > a dirty cache line. The cache topology and implementation are totally different > > > > > > > across the SoCs, so this is not too surprising. > > > > > > > > > > > > > > Semantically, dma_map_single(..., DMA_FROM_DEVICE) means you are doing a > > > > > > > unidirectional DMA transfer from the device into that buffer. So the contents of > > > > > > > the buffer are "undefined" until the DMA transfer completes. If you are also > > > > > > > writing data into the buffer from the CPU side, then you need DMA_BIDIRECTIONAL. > > > > > > > > > > > > > > Regards, > > > > > > > Samuel > > > > > > > > > > > > +CC crypto mailing list + maintainer > > > > > > > > > > > > My problem is that the crypto selftest, for each buffer where I need to do a cipher operation, > > > > > > appends a poison buffer to check that the device does not write beyond the buffer. > > > > > > > > > > > > But the dma_map_sg(FROM_DEVICE) corrupts this poison buffer, and the crypto selftests fail, thinking my device did a buffer overrun. > > > > > > > > > > > > So you mean that on the D1 SoC, this crypto API check strategy is impossible? > > > > > > > > > > I think you could try to replace all CLEAN & INVAL ops with FLUSH ops > > > > > for the testing. (All cache block-aligned data from the device for the > > > > > CPU should be invalidated.)
> > > > > > > > > > > > > With: > > > > diff --git a/arch/riscv/mm/dma-noncoherent.c b/arch/riscv/mm/dma-noncoherent.c > > > > index 2c124bcc1932..608483522e05 100644 > > > > --- a/arch/riscv/mm/dma-noncoherent.c > > > > +++ b/arch/riscv/mm/dma-noncoherent.c > > > > @@ -21,7 +21,7 @@ void arch_sync_dma_for_device(phys_addr_t paddr, size_t size, enum dma_data_dire > > > > ALT_CMO_OP(CLEAN, (unsigned long)phys_to_virt(paddr), size); > > > > break; > > > > case DMA_FROM_DEVICE: > > > > - ALT_CMO_OP(INVAL, (unsigned long)phys_to_virt(paddr), size); > > > > + ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size); > > > > break; > > > > case DMA_BIDIRECTIONAL: > > > > ALT_CMO_OP(FLUSH, (unsigned long)phys_to_virt(paddr), size); > > > > > > > > > > > > The crypto self test works and I get no more buffer corruption. > > > No, No ... it's not a solution. That means your driver has a problem. > > > From device, INVAL alone is enough. > > > > > > > For me, my driver works fine; the problem comes from dma_map_sg(). Probably I didn't explain it right, so let me restart. > > > > Example: > > The crypto self test sends my driver an AES cipher operation of 16 bytes inside an SG, but the original buffer is larger (say 32 bytes for the example). > > So the first 16 bytes are used by the SG and the last 16 bytes are a poison buffer (with value 0xFE) to check that the driver does not write beyond the normal operation of 16 bytes (and beyond the SG length). > > > > Doing the dma_map_sg(FROM_DEVICE) on the SG corrupts the whole buffer. > > Doesn't DMA_FROM_DEVICE indicate that there are no expected writes > from the CPU to the buffer (and that any modifications to the > underlying cache line can be dropped via an invalidation)? > In other words: does the behavior change when mapping as > DMA_BIDIRECTIONAL? And should a map/unmap sequence be used where the buffer > is first mapped as DMA_TO_DEVICE when poisoning it and later > as DMA_FROM_DEVICE during normal operation?
>

There are no CPU writes after the dma_map(FROM_DEVICE). The buffer is initialized by the crypto API before that.
Furthermore, the corrupted buffer is next to the buffer being mapped.

I verified the size passed to dma_map_sg() via some debug output:
sun8i-ce 3040000.crypto: sun8i_ce_cipher_prepare ecb(aes) cryptlen=16
dma_direct_map_sg:483 SG0 len=16 <- dma_map TO_DEVICE
dma_direct_map_sg:483 SG0 len=16 <- dma_map FROM_DEVICE
need:a47ca9dd e0df4c86 a070af6e 91710dec
have:a47ca9dd e0df4c86 a070af6e 91710dec
dump whole buffer:
over:a47ca9dd e0df4c86 a070af6e 91710dec
over:ec05e6f2 d542fb77 128b2059 5bf06986 < here we should have 0xFE
alg: skcipher: ecb-aes-sun8i-ce encryption overran dst buffer on test vector 1, cfg=\"random: use_finup src_divs=[<reimport>100.0%@+1604]\"

Note that I tried the following patch:

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 4948201065cc..c5b945974441 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -19,6 +19,7 @@
 #include <crypto/aead.h>
 #include <crypto/hash.h>
 #include <crypto/skcipher.h>
+#include <linux/cacheflush.h>
 #include <linux/err.h>
 #include <linux/fips.h>
 #include <linux/module.h>
@@ -205,6 +206,7 @@ static void testmgr_free_buf(char *buf[XBUFSIZE])
 static inline void testmgr_poison(void *addr, size_t len)
 {
 	memset(addr, TESTMGR_POISON_BYTE, len);
+	flush_icache_range(addr, addr + len);
 }
 
 /* Is the memory region still fully poisoned? */

This patch fixes the problem, but I am not sure this is the right way.
A DMA mapping operation corrupting the buffer around it does not seem good.
end of thread, other threads:[~2022-04-20 0:19 UTC | newest]

Thread overview: 50+ messages
2022-03-07 22:46 [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant Heiko Stuebner
2022-03-07 22:46 ` [PATCH 1/2] riscv: Implement Zicbom-based cache management operations Heiko Stuebner
2022-03-25 16:20 ` Anup Patel
2022-03-25 17:24 ` Philipp Tomsich
[not found] ` <CAAeLtUAi+61Hk7oBW979QEKYaume3vqdt_KkS_mXpRAs+CzHnA@mail.gmail.com>
2022-03-25 17:37 ` Anup Patel
2022-03-31 10:07 ` Christoph Hellwig
2022-03-07 22:46 ` [PATCH 2/2] riscv: implement cache-management errata for T-Head SoCs Heiko Stuebner
2022-03-31 2:30 ` Palmer Dabbelt
2022-03-31 8:22 ` Heiko Stübner
2022-03-31 8:29 ` Philipp Tomsich
2022-04-20 0:18 ` Palmer Dabbelt
2022-04-01 1:05 ` Samuel Holland
2022-04-15 11:26 ` [PATCH 0/2] riscv: implement Zicbom-based CMO instructions + the t-head variant Corentin Labbe
2022-04-16 2:19 ` Samuel Holland
2022-04-16 7:35 ` Corentin Labbe
2022-04-16 17:47 ` Samuel Holland
2022-04-16 19:32 ` Corentin Labbe
2022-04-17 2:17 ` Guo Ren
2022-04-17 8:45 ` Corentin Labbe
2022-04-17 8:49 ` Guo Ren
2022-04-17 17:35 ` Corentin Labbe
2022-04-17 22:50 ` Guo Ren
2022-04-19 7:44 ` Corentin Labbe
2022-04-18 15:29 ` Philipp Tomsich
2022-04-19 7:52 ` Corentin Labbe