linux-csky.vger.kernel.org archive mirror
* [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem
@ 2019-07-30 12:15 guoren
  2019-07-30 12:15 ` [PATCH 2/4] csky: Fixup dma_alloc_coherent with PAGE_SO attribute guoren
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: guoren @ 2019-07-30 12:15 UTC (permalink / raw)
  To: arnd
  Cc: linux-kernel, linux-arch, linux-csky, feng_shizhu, zhang_jian5,
	zheng_xingjian, zhu_peng, Guo Ren

From: Guo Ren <ren_guo@c-sky.com>

If the architecture doesn't define dma_rmb()/dma_wmb(), Linux falls back to
rmb()/wmb(). csky uses bar.xxx to implement mb(), and that causes problems
when synchronizing data with a DMA device, because bar.xxx cannot guarantee
that bus transactions have completed at the outer bus level. We must use
sync.s instead of bar.xxx for DMA data synchronization; it guarantees that
the instruction retires only after the bus response has been received.

Signed-off-by: Guo Ren <ren_guo@c-sky.com>
---
 arch/csky/include/asm/barrier.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/csky/include/asm/barrier.h b/arch/csky/include/asm/barrier.h
index 476eb78..061a633 100644
--- a/arch/csky/include/asm/barrier.h
+++ b/arch/csky/include/asm/barrier.h
@@ -9,7 +9,9 @@
 #define nop()	asm volatile ("nop\n":::"memory")
 
 /*
- * sync:        completion barrier
+ * sync:        completion barrier, all sync.xx instructions
+ *              guarantee that the last bus response has been received for
+ *              transactions made by ld/st instructions before the sync.xx
  * sync.s:      completion barrier and shareable to other cores
  * sync.i:      completion barrier with flush cpu pipeline
  * sync.is:     completion barrier with flush cpu pipeline and shareable to
@@ -31,6 +33,9 @@
 #define rmb()		asm volatile ("bar.brar\n":::"memory")
 #define wmb()		asm volatile ("bar.bwaw\n":::"memory")
 
+#define dma_rmb()	asm volatile ("sync.s\n":::"memory")
+#define dma_wmb()	asm volatile ("sync.s\n":::"memory")
+
 #ifdef CONFIG_SMP
 #define __smp_mb()	asm volatile ("bar.brwarws\n":::"memory")
 #define __smp_rmb()	asm volatile ("bar.brars\n":::"memory")
-- 
2.7.4
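
For context, dma_wmb()/dma_rmb() are what drivers rely on when they share DMA
descriptors with a device through normal memory. A minimal sketch of that
pattern follows; the descriptor layout, register offset, and struct my_dev
fields are hypothetical names invented for illustration, not taken from this
series:

#include <linux/bits.h>
#include <linux/io.h>
#include <linux/kernel.h>
#include <linux/types.h>

#define MY_DESC_OWN	BIT(31)		/* hypothetical "device owns it" flag */
#define MY_DOORBELL	0x10		/* hypothetical doorbell register offset */

struct my_desc {
	u32 addr;
	u32 len;
	u32 flags;
};

struct my_dev {
	struct my_desc *ring;		/* descriptor ring visible to the device */
	unsigned int head;
	void __iomem *regs;
};

static void my_queue_tx(struct my_dev *mdev, dma_addr_t buf, u32 len)
{
	struct my_desc *d = &mdev->ring[mdev->head];

	d->addr = lower_32_bits(buf);
	d->len  = len;

	/*
	 * The descriptor body must be observable by the device before
	 * ownership is handed over, which is why dma_wmb() has to order
	 * stores at the bus level, not only inside the CPU pipeline.
	 */
	dma_wmb();
	d->flags = MY_DESC_OWN;

	/* The MMIO doorbell write is ordered by writel() itself. */
	writel(mdev->head, mdev->regs + MY_DOORBELL);
}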



* [PATCH 2/4] csky: Fixup dma_alloc_coherent with PAGE_SO attribute
  2019-07-30 12:15 [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem guoren
@ 2019-07-30 12:15 ` guoren
  2019-07-30 12:15 ` [PATCH 3/4] csky/dma: Fixup cache_op failed when cross memory ZONEs guoren
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: guoren @ 2019-07-30 12:15 UTC (permalink / raw)
  To: arnd
  Cc: linux-kernel, linux-arch, linux-csky, feng_shizhu, zhang_jian5,
	zheng_xingjian, zhu_peng, Guo Ren

From: Guo Ren <ren_guo@c-sky.com>

This bug was introduced by commit 2b070ccdf8c0 ("fixup abiv2 mmap(... O_SYNC)
failed"). That patch removed _PAGE_SO from non-cached memory mappings, and
this causes problems when drivers use DMA descriptors to control transactions
without dma_wmb()/dma_rmb().

After referencing other architectures' implementations, pgprot_writecombine()
is introduced for mmap(... O_SYNC).

Signed-off-by: Guo Ren <ren_guo@c-sky.com>
---
 arch/csky/include/asm/pgtable.h | 10 ++++++++++
 arch/csky/mm/ioremap.c          |  6 ++----
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
index c429a6f..fc19ba4 100644
--- a/arch/csky/include/asm/pgtable.h
+++ b/arch/csky/include/asm/pgtable.h
@@ -258,6 +258,16 @@ static inline pgprot_t pgprot_noncached(pgprot_t _prot)
 {
 	unsigned long prot = pgprot_val(_prot);
 
+	prot = (prot & ~_CACHE_MASK) | _CACHE_UNCACHED | _PAGE_SO;
+
+	return __pgprot(prot);
+}
+
+#define pgprot_writecombine pgprot_writecombine
+static inline pgprot_t pgprot_writecombine(pgprot_t _prot)
+{
+	unsigned long prot = pgprot_val(_prot);
+
 	prot = (prot & ~_CACHE_MASK) | _CACHE_UNCACHED;
 
 	return __pgprot(prot);
diff --git a/arch/csky/mm/ioremap.c b/arch/csky/mm/ioremap.c
index 8473b6b..4853111 100644
--- a/arch/csky/mm/ioremap.c
+++ b/arch/csky/mm/ioremap.c
@@ -29,8 +29,7 @@ void __iomem *ioremap(phys_addr_t addr, size_t size)
 
 	vaddr = (unsigned long)area->addr;
 
-	prot = __pgprot(_PAGE_PRESENT | __READABLE | __WRITEABLE |
-			_PAGE_GLOBAL | _CACHE_UNCACHED | _PAGE_SO);
+	prot = pgprot_noncached(PAGE_KERNEL);
 
 	if (ioremap_page_range(vaddr, vaddr + size, addr, prot)) {
 		free_vm_area(area);
@@ -51,10 +50,9 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
 			      unsigned long size, pgprot_t vma_prot)
 {
 	if (!pfn_valid(pfn)) {
-		vma_prot.pgprot |= _PAGE_SO;
 		return pgprot_noncached(vma_prot);
 	} else if (file->f_flags & O_SYNC) {
-		return pgprot_noncached(vma_prot);
+		return pgprot_writecombine(vma_prot);
 	}
 
 	return vma_prot;
-- 
2.7.4
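
For reference, pgprot_writecombine() is normally consumed through a driver's
mmap() handler rather than called from user space. A minimal sketch of such a
handler; the region base/size variables and the handler name are placeholders
invented for this example:

#include <linux/fs.h>
#include <linux/mm.h>

/* Hypothetical device region exported by the driver. */
static phys_addr_t my_region_phys;
static size_t my_region_size;

static int my_mmap(struct file *file, struct vm_area_struct *vma)
{
	unsigned long size = vma->vm_end - vma->vm_start;

	if (size > my_region_size)
		return -EINVAL;

	/* Uncached mapping, but without _PAGE_SO after this patch. */
	vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);

	return remap_pfn_range(vma, vma->vm_start,
			       my_region_phys >> PAGE_SHIFT,
			       size, vma->vm_page_prot);
}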



* [PATCH 3/4] csky/dma: Fixup cache_op failed when cross memory ZONEs
  2019-07-30 12:15 [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem guoren
  2019-07-30 12:15 ` [PATCH 2/4] csky: Fixup dma_alloc_coherent with PAGE_SO attribute guoren
@ 2019-07-30 12:15 ` guoren
  2019-08-06  6:49   ` Christoph Hellwig
  2019-07-30 12:15 ` [PATCH 4/4] csky: Add dma_inv_range for DMA_FROM_DEVICE guoren
  2019-07-30 13:29 ` [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem Arnd Bergmann
  3 siblings, 1 reply; 15+ messages in thread
From: guoren @ 2019-07-30 12:15 UTC (permalink / raw)
  To: arnd
  Cc: linux-kernel, linux-arch, linux-csky, feng_shizhu, zhang_jian5,
	zheng_xingjian, zhu_peng, Guo Ren

From: Guo Ren <ren_guo@c-sky.com>

If the paddr and size range spans both ZONE_NORMAL and ZONE_HIGHMEM,
cache_op() will panic in do_page_fault() with bad_area.

Rework the code to support ranges that cross memory zones.

Signed-off-by: Guo Ren <ren_guo@c-sky.com>
---
 arch/csky/mm/dma-mapping.c | 73 +++++++++++++++++-----------------------------
 1 file changed, 27 insertions(+), 46 deletions(-)

diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
index 80783bb..3f1ff9d 100644
--- a/arch/csky/mm/dma-mapping.c
+++ b/arch/csky/mm/dma-mapping.c
@@ -18,71 +18,52 @@ static int __init atomic_pool_init(void)
 {
 	return dma_atomic_pool_init(GFP_KERNEL, pgprot_noncached(PAGE_KERNEL));
 }
-postcore_initcall(atomic_pool_init);
-
-void arch_dma_prep_coherent(struct page *page, size_t size)
-{
-	if (PageHighMem(page)) {
-		unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-
-		do {
-			void *ptr = kmap_atomic(page);
-			size_t _size = (size < PAGE_SIZE) ? size : PAGE_SIZE;
-
-			memset(ptr, 0, _size);
-			dma_wbinv_range((unsigned long)ptr,
-					(unsigned long)ptr + _size);
-
-			kunmap_atomic(ptr);
-
-			page++;
-			size -= PAGE_SIZE;
-			count--;
-		} while (count);
-	} else {
-		void *ptr = page_address(page);
-
-		memset(ptr, 0, size);
-		dma_wbinv_range((unsigned long)ptr, (unsigned long)ptr + size);
-	}
-}
+arch_initcall(atomic_pool_init);
 
 static inline void cache_op(phys_addr_t paddr, size_t size,
 			    void (*fn)(unsigned long start, unsigned long end))
 {
-	struct page *page = pfn_to_page(paddr >> PAGE_SHIFT);
-	unsigned int offset = paddr & ~PAGE_MASK;
-	size_t left = size;
-	unsigned long start;
+	struct page *page    = phys_to_page(paddr);
+	void *start          = __va(page_to_phys(page));
+	unsigned long offset = offset_in_page(paddr);
+	size_t left          = size;
 
 	do {
 		size_t len = left;
 
+		if (offset + len > PAGE_SIZE)
+			len = PAGE_SIZE - offset;
+
 		if (PageHighMem(page)) {
-			void *addr;
+			start = kmap_atomic(page);
 
-			if (offset + len > PAGE_SIZE) {
-				if (offset >= PAGE_SIZE) {
-					page += offset >> PAGE_SHIFT;
-					offset &= ~PAGE_MASK;
-				}
-				len = PAGE_SIZE - offset;
-			}
+			fn((unsigned long)start + offset,
+					(unsigned long)start + offset + len);
 
-			addr = kmap_atomic(page);
-			start = (unsigned long)(addr + offset);
-			fn(start, start + len);
-			kunmap_atomic(addr);
+			kunmap_atomic(start);
 		} else {
-			start = (unsigned long)phys_to_virt(paddr);
-			fn(start, start + size);
+			fn((unsigned long)start + offset,
+					(unsigned long)start + offset + len);
 		}
 		offset = 0;
+
 		page++;
+		start += PAGE_SIZE;
 		left -= len;
 	} while (left);
 }
 
+static void dma_wbinv_set_zero_range(unsigned long start, unsigned long end)
+{
+	memset((void *)start, 0, end - start);
+	dma_wbinv_range(start, end);
+}
+
+void arch_dma_prep_coherent(struct page *page, size_t size)
+{
+	cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range);
+}
+
 void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
 			      size_t size, enum dma_data_direction dir)
 {
-- 
2.7.4
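
The key change is that cache_op() now clamps every iteration to the current
page before applying the cache function, so a range that starts in lowmem and
ends in highmem is processed one page-sized chunk at a time. A standalone
sketch of just that splitting arithmetic, written as plain C with the kernel
constants replaced so it can be compiled and run anywhere:

#include <stdio.h>

#define PAGE_SIZE 4096UL

/* Walk [paddr, paddr + size) one page-bounded chunk at a time. */
static void for_each_page_chunk(unsigned long paddr, unsigned long size)
{
	unsigned long offset = paddr & (PAGE_SIZE - 1);
	unsigned long left = size;

	do {
		unsigned long len = left;

		if (offset + len > PAGE_SIZE)
			len = PAGE_SIZE - offset;

		printf("page %lu: offset %lu len %lu\n",
		       (paddr + (size - left)) / PAGE_SIZE, offset, len);

		offset = 0;	/* only the first chunk can start mid-page */
		left -= len;
	} while (left);
}

int main(void)
{
	/* A range that starts mid-page and crosses several page boundaries. */
	for_each_page_chunk(0x1f80, 0x2100);
	return 0;
}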



* [PATCH 4/4] csky: Add dma_inv_range for DMA_FROM_DEVICE
  2019-07-30 12:15 [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem guoren
  2019-07-30 12:15 ` [PATCH 2/4] csky: Fixup dma_alloc_coherent with PAGE_SO attribute guoren
  2019-07-30 12:15 ` [PATCH 3/4] csky/dma: Fixup cache_op failed when cross memory ZONEs guoren
@ 2019-07-30 12:15 ` guoren
  2019-07-30 13:43   ` Arnd Bergmann
  2019-07-30 13:29 ` [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem Arnd Bergmann
  3 siblings, 1 reply; 15+ messages in thread
From: guoren @ 2019-07-30 12:15 UTC (permalink / raw)
  To: arnd
  Cc: linux-kernel, linux-arch, linux-csky, feng_shizhu, zhang_jian5,
	zheng_xingjian, zhu_peng, Guo Ren

From: Guo Ren <ren_guo@c-sky.com>

DMA_FROM_DEVICE only needs the CPU to read the DMA data from memory into its
cache, so there is no need to clean (write back) the cache first. Clear +
invalidate for DMA_FROM_DEVICE would not cause a problem either, because
software does not touch the DMA memory range while the DMA transfer is in
progress.

Signed-off-by: Guo Ren <ren_guo@c-sky.com>
---
 arch/csky/include/asm/cache.h |  1 +
 arch/csky/mm/cachev1.c        |  7 ++++++-
 arch/csky/mm/cachev2.c        | 11 ++++++++++-
 arch/csky/mm/dma-mapping.c    |  4 ++++
 4 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/arch/csky/include/asm/cache.h b/arch/csky/include/asm/cache.h
index d683734..1d5fc2f 100644
--- a/arch/csky/include/asm/cache.h
+++ b/arch/csky/include/asm/cache.h
@@ -24,6 +24,7 @@ void cache_wbinv_range(unsigned long start, unsigned long end);
 void cache_wbinv_all(void);
 
 void dma_wbinv_range(unsigned long start, unsigned long end);
+void dma_inv_range(unsigned long start, unsigned long end);
 void dma_wb_range(unsigned long start, unsigned long end);
 
 #endif
diff --git a/arch/csky/mm/cachev1.c b/arch/csky/mm/cachev1.c
index b8a75cc..494ec91 100644
--- a/arch/csky/mm/cachev1.c
+++ b/arch/csky/mm/cachev1.c
@@ -120,7 +120,12 @@ void dma_wbinv_range(unsigned long start, unsigned long end)
 	cache_op_range(start, end, DATA_CACHE|CACHE_CLR|CACHE_INV, 1);
 }
 
+void dma_inv_range(unsigned long start, unsigned long end)
+{
+	cache_op_range(start, end, DATA_CACHE|CACHE_CLR|CACHE_INV, 1);
+}
+
 void dma_wb_range(unsigned long start, unsigned long end)
 {
-	cache_op_range(start, end, DATA_CACHE|CACHE_INV, 1);
+	cache_op_range(start, end, DATA_CACHE|CACHE_CLR|CACHE_INV, 1);
 }
diff --git a/arch/csky/mm/cachev2.c b/arch/csky/mm/cachev2.c
index baaf05d..b61be65 100644
--- a/arch/csky/mm/cachev2.c
+++ b/arch/csky/mm/cachev2.c
@@ -69,11 +69,20 @@ void dma_wbinv_range(unsigned long start, unsigned long end)
 	sync_is();
 }
 
+void dma_inv_range(unsigned long start, unsigned long end)
+{
+	unsigned long i = start & ~(L1_CACHE_BYTES - 1);
+
+	for (; i < end; i += L1_CACHE_BYTES)
+		asm volatile("dcache.iva %0\n"::"r"(i):"memory");
+	sync_is();
+}
+
 void dma_wb_range(unsigned long start, unsigned long end)
 {
 	unsigned long i = start & ~(L1_CACHE_BYTES - 1);
 
 	for (; i < end; i += L1_CACHE_BYTES)
-		asm volatile("dcache.civa %0\n"::"r"(i):"memory");
+		asm volatile("dcache.cva %0\n"::"r"(i):"memory");
 	sync_is();
 }
diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
index 3f1ff9d..d8f0f81 100644
--- a/arch/csky/mm/dma-mapping.c
+++ b/arch/csky/mm/dma-mapping.c
@@ -72,6 +72,8 @@ void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
 		cache_op(paddr, size, dma_wb_range);
 		break;
 	case DMA_FROM_DEVICE:
+		cache_op(paddr, size, dma_inv_range);
+		break;
 	case DMA_BIDIRECTIONAL:
 		cache_op(paddr, size, dma_wbinv_range);
 		break;
@@ -88,6 +90,8 @@ void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
 		cache_op(paddr, size, dma_wb_range);
 		break;
 	case DMA_FROM_DEVICE:
+		cache_op(paddr, size, dma_inv_range);
+		break;
 	case DMA_BIDIRECTIONAL:
 		cache_op(paddr, size, dma_wbinv_range);
 		break;
-- 
2.7.4
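
These arch hooks sit underneath the streaming DMA API and are never called by
drivers directly; dma_map_single()/dma_unmap_single() with DMA_FROM_DEVICE is
what ends up in the new cases above. A minimal sketch of such a receive path,
where the device-specific start/wait helpers are assumed to exist and are
named purely for illustration:

#include <linux/dma-mapping.h>

/* Receive one buffer from a hypothetical device into buf[0..len). */
static int my_rx(struct device *dev, void *buf, size_t len)
{
	dma_addr_t dma;

	/* Ends up in arch_sync_dma_for_device(..., DMA_FROM_DEVICE). */
	dma = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, dma))
		return -ENOMEM;

	my_hw_start_rx(dev, dma, len);	/* assumed device-specific helper */
	my_hw_wait_rx_done(dev);	/* assumed device-specific helper */

	/*
	 * Ends up in arch_sync_dma_for_cpu(..., DMA_FROM_DEVICE): invalidate
	 * again so that speculative loads issued while the device owned the
	 * buffer cannot leave stale lines shadowing the received data.
	 */
	dma_unmap_single(dev, dma, len, DMA_FROM_DEVICE);

	/* The CPU may now read buf safely. */
	return 0;
}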



* Re: [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem
  2019-07-30 12:15 [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem guoren
                   ` (2 preceding siblings ...)
  2019-07-30 12:15 ` [PATCH 4/4] csky: Add dma_inv_range for DMA_FROM_DEVICE guoren
@ 2019-07-30 13:29 ` Arnd Bergmann
  2019-07-30 15:15   ` Guo Ren
  2019-07-30 15:28   ` Guo Ren
  3 siblings, 2 replies; 15+ messages in thread
From: Arnd Bergmann @ 2019-07-30 13:29 UTC (permalink / raw)
  To: Guo Ren
  Cc: Linux Kernel Mailing List, linux-arch, linux-csky, feng_shizhu,
	zhang_jian5, zheng_xingjian, zhu_peng, Guo Ren

On Tue, Jul 30, 2019 at 2:15 PM <guoren@kernel.org> wrote:
>
> From: Guo Ren <ren_guo@c-sky.com>
>
> If the architecture doesn't define dma_rmb()/dma_wmb(), Linux falls back to
> rmb()/wmb(). csky uses bar.xxx to implement mb(), and that causes problems
> when synchronizing data with a DMA device, because bar.xxx cannot guarantee
> that bus transactions have completed at the outer bus level. We must use
> sync.s instead of bar.xxx for DMA data synchronization; it guarantees that
> the instruction retires only after the bus response has been received.
>
> Signed-off-by: Guo Ren <ren_guo@c-sky.com>

This change looks good to me, but I think your regular barriers
(mb, rmb, wmb) are still wrong: These are meant to be the superset
of dma_{r,w}mb and smp_{,r,w}mb, and synchronize
against both SMP and DMA interactions.

I suspect you can drop the '.s' for non-SMP builds. What I don't
see is whether you might need to add '.i' for dma_wmb() or
dma_rmb().

       Arnd


* Re: [PATCH 4/4] csky: Add dma_inv_range for DMA_FROM_DEVICE
  2019-07-30 12:15 ` [PATCH 4/4] csky: Add dma_inv_range for DMA_FROM_DEVICE guoren
@ 2019-07-30 13:43   ` Arnd Bergmann
  2019-07-30 15:11     ` Guo Ren
  2019-08-06  6:51     ` Christoph Hellwig
  0 siblings, 2 replies; 15+ messages in thread
From: Arnd Bergmann @ 2019-07-30 13:43 UTC (permalink / raw)
  To: Guo Ren
  Cc: Linux Kernel Mailing List, linux-arch, linux-csky, feng_shizhu,
	zhang_jian5, zheng_xingjian, zhu_peng, Guo Ren

On Tue, Jul 30, 2019 at 2:16 PM <guoren@kernel.org> wrote:
> From: Guo Ren <ren_guo@c-sky.com>

> diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> index 3f1ff9d..d8f0f81 100644
> --- a/arch/csky/mm/dma-mapping.c
> +++ b/arch/csky/mm/dma-mapping.c
> @@ -72,6 +72,8 @@ void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
>                 cache_op(paddr, size, dma_wb_range);
>                 break;
>         case DMA_FROM_DEVICE:
> +               cache_op(paddr, size, dma_inv_range);
> +               break;
>         case DMA_BIDIRECTIONAL:
>                 cache_op(paddr, size, dma_wbinv_range);
>                 break;
> @@ -88,6 +90,8 @@ void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
>                 cache_op(paddr, size, dma_wb_range);
>                 break;
>         case DMA_FROM_DEVICE:
> +               cache_op(paddr, size, dma_inv_range);
> +               break;
>         case DMA_BIDIRECTIONAL:
>                 cache_op(paddr, size, dma_wbinv_range);
>                 break;

When syncing 'for_cpu', you should not need to write back, because
there won't be any dirty cache lines.

If you have a CPU core that does not do speculative loads, you also don't
need to invalidate here, because you have already done that in the
_for_device() case, the only reason to invalidate the CPU cache
again is if a speculative load created a stale cache line that now
shadows the data received from the device.

        Arnd


* Re: [PATCH 4/4] csky: Add dma_inv_range for DMA_FROM_DEVICE
  2019-07-30 13:43   ` Arnd Bergmann
@ 2019-07-30 15:11     ` Guo Ren
  2019-07-30 15:22       ` Arnd Bergmann
  2019-08-06  6:51     ` Christoph Hellwig
  1 sibling, 1 reply; 15+ messages in thread
From: Guo Ren @ 2019-07-30 15:11 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linux Kernel Mailing List, linux-arch, linux-csky, feng_shizhu,
	zhang_jian5, zheng_xingjian, zhu_peng, Guo Ren

Thx Arnd,

On Tue, Jul 30, 2019 at 9:43 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Jul 30, 2019 at 2:16 PM <guoren@kernel.org> wrote:
> > From: Guo Ren <ren_guo@c-sky.com>
>
> > diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> > index 3f1ff9d..d8f0f81 100644
> > --- a/arch/csky/mm/dma-mapping.c
> > +++ b/arch/csky/mm/dma-mapping.c
> > @@ -72,6 +72,8 @@ void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
> >                 cache_op(paddr, size, dma_wb_range);
> >                 break;
> >         case DMA_FROM_DEVICE:
> > +               cache_op(paddr, size, dma_inv_range);
> > +               break;
> >         case DMA_BIDIRECTIONAL:
> >                 cache_op(paddr, size, dma_wbinv_range);
> >                 break;
> > @@ -88,6 +90,8 @@ void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
> >                 cache_op(paddr, size, dma_wb_range);
> >                 break;
> >         case DMA_FROM_DEVICE:
> > +               cache_op(paddr, size, dma_inv_range);
> > +               break;
> >         case DMA_BIDIRECTIONAL:
> >                 cache_op(paddr, size, dma_wbinv_range);
> >                 break;
>
> When syncing 'for_cpu', you should not need to write back, because
> there won't be any dirty cache lines.
I just followed the dma_data_direction param; the dir param and the
function seem a little bit duplicated. And our CPU won't write a clean
cache line back to memory, so dma_wb_page won't cause a problem.
arch_sync_dma_for_cpu with dir=DMA_TO_DEVICE seems self-contradictory.

Do you want me to modify it like this:
@@ -88,6 +90,8 @@ void arch_sync_dma_for_cpu(struct device *dev,
phys_addr_t paddr,
        case DMA_TO_DEVICE:
        case DMA_FROM_DEVICE:
        case DMA_BIDIRECTIONAL:
               cache_op(paddr, size, dma_inv_range);
               break;

@@ -72,6 +72,8 @@ void arch_sync_dma_for_device(struct device *dev,
phys_addr_t paddr,
        case DMA_TO_DEVICE:
                cache_op(paddr, size, dma_wb_range);
                break;
        case DMA_FROM_DEVICE:
        case DMA_BIDIRECTIONAL:
                cache_op(paddr, size, dma_wbinv_range);
                break;

>
> If you have a CPU core that does not do speculative loads, you also don't
> need to invalidate here, because you have already done that in the
> _for_device() case, the only reason to invalidate the CPU cache
> again is if a speculative load created a stale cache line that now
> shadows the data received from the device.
Our CPU supports speculative loads :)

-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/


* Re: [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem
  2019-07-30 13:29 ` [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem Arnd Bergmann
@ 2019-07-30 15:15   ` Guo Ren
  2019-07-30 15:28   ` Guo Ren
  1 sibling, 0 replies; 15+ messages in thread
From: Guo Ren @ 2019-07-30 15:15 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linux Kernel Mailing List, linux-arch, linux-csky, feng_shizhu,
	zhang_jian5, zheng_xingjian, zhu_peng, Guo Ren

On Tue, Jul 30, 2019 at 9:29 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Jul 30, 2019 at 2:15 PM <guoren@kernel.org> wrote:
> >
> > From: Guo Ren <ren_guo@c-sky.com>
> >
> > If the architecture doesn't define dma_rmb()/dma_wmb(), Linux falls back to
> > rmb()/wmb(). csky uses bar.xxx to implement mb(), and that causes problems
> > when synchronizing data with a DMA device, because bar.xxx cannot guarantee
> > that bus transactions have completed at the outer bus level. We must use
> > sync.s instead of bar.xxx for DMA data synchronization; it guarantees that
> > the instruction retires only after the bus response has been received.
> >
> > Signed-off-by: Guo Ren <ren_guo@c-sky.com>
>
> This change looks good to me, but I think your regular barriers
> (mb, rmb, wmb) are still wrong: These are meant to be the superset
> of dma_{r,w}mb and smp_{,r,w}mb, and synchronize
> against both SMP and DMA interactions.
>
> I suspect you can drop the '.s' for non-SMP builds. What I don't
> see is whether you might need to add '.i' for dma_wmb() or
> dma_rmb().
>
>        Arnd



-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/


* Re: [PATCH 4/4] csky: Add dma_inv_range for DMA_FROM_DEVICE
  2019-07-30 15:11     ` Guo Ren
@ 2019-07-30 15:22       ` Arnd Bergmann
  2019-07-30 15:48         ` Guo Ren
  0 siblings, 1 reply; 15+ messages in thread
From: Arnd Bergmann @ 2019-07-30 15:22 UTC (permalink / raw)
  To: Guo Ren
  Cc: Linux Kernel Mailing List, linux-arch, linux-csky, feng_shizhu,
	zhang_jian5, zheng_xingjian, zhu_peng, Guo Ren

On Tue, Jul 30, 2019 at 5:11 PM Guo Ren <guoren@kernel.org> wrote:
> > > diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> > >                 cache_op(paddr, size, dma_wb_range);
> > >                 break;
> > >         case DMA_FROM_DEVICE:
> > > +               cache_op(paddr, size, dma_inv_range);
> > > +               break;
> > >         case DMA_BIDIRECTIONAL:
> > >                 cache_op(paddr, size, dma_wbinv_range);
> > >                 break;
> >
> > When syncing 'for_cpu', you should not need to write back, because
> > there won't be any dirty cache lines.
>
> I just followed the dma_data_direction param; the dir param and the
> function seem a little bit duplicated. And our CPU won't write a clean
> cache line back to memory, so dma_wb_page won't cause a problem.
> arch_sync_dma_for_cpu with dir=DMA_TO_DEVICE seems
> self-contradictory.

Right, you generally don't need to do cache management for that
combination.

There might be other things to do here though, e.g. with a strict
iommu implementation one could tear down the i/o page table
entry to prevent the device from accessing a buffer while that is
owned by the cpu.

> Do you want me to modify it like this:
> @@ -88,6 +90,8 @@ void arch_sync_dma_for_cpu(struct device *dev,
> phys_addr_t paddr,
>         case DMA_TO_DEVICE:
>         case DMA_FROM_DEVICE:
>         case DMA_BIDIRECTIONAL:
>                cache_op(paddr, size, dma_inv_range);
>                break;
>
> @@ -72,6 +72,8 @@ void arch_sync_dma_for_device(struct device *dev,
> phys_addr_t paddr,
>         case DMA_TO_DEVICE:
>                 cache_op(paddr, size, dma_wb_range);
>                 break;
>         case DMA_FROM_DEVICE:
>         case DMA_BIDIRECTIONAL:
>                 cache_op(paddr, size, dma_wbinv_range);
>                 break;
>
> >
> > If you have a CPU core that does not do speculative loads, you also don't
> > need to invalidate here, because you have already done that in the
> > _for_device() case, the only reason to invalidate the CPU cache
> > again is if a speculative load created a stale cache line that now
> > shadows the data received from the device.
> Our CPU supports speculative loads :)

Ok, then both invalidations are indeed needed.
I was guessing that CK610 had no speculation.

       Arnd


* Re: [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem
  2019-07-30 13:29 ` [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem Arnd Bergmann
  2019-07-30 15:15   ` Guo Ren
@ 2019-07-30 15:28   ` Guo Ren
  1 sibling, 0 replies; 15+ messages in thread
From: Guo Ren @ 2019-07-30 15:28 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linux Kernel Mailing List, linux-arch, linux-csky, feng_shizhu,
	zhang_jian5, zheng_xingjian, zhu_peng, Guo Ren

Thx Arnd,

On Tue, Jul 30, 2019 at 9:29 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Jul 30, 2019 at 2:15 PM <guoren@kernel.org> wrote:
> >
> > From: Guo Ren <ren_guo@c-sky.com>
> >
> > If the architecture doesn't define dma_rmb()/dma_wmb(), Linux falls back to
> > rmb()/wmb(). csky uses bar.xxx to implement mb(), and that causes problems
> > when synchronizing data with a DMA device, because bar.xxx cannot guarantee
> > that bus transactions have completed at the outer bus level. We must use
> > sync.s instead of bar.xxx for DMA data synchronization; it guarantees that
> > the instruction retires only after the bus response has been received.
> >
> > Signed-off-by: Guo Ren <ren_guo@c-sky.com>
>
> This change looks good to me, but I think your regular barriers
> (mb, rmb, wmb) are still wrong: These are meant to be the superset
> of dma_{r,w}mb and smp_{,r,w}mb, and synchronize
> against both SMP and DMA interactions.
Wow, thanks for the correction, yes! mb, rmb, wmb are the superset of the
dma and smp variants, and I should use sync.s for mb.

I checked arm64; it uses dsb for mb and dmb for smp_mb and dma_mb.

I also checked the drivers, and you are right: a lot of them use mb() to
sync memory data with the device.

>
> I suspect you can drop the '.s' for non-SMP builds. What I don't
> see is whether you might need to add '.i' for dma_wmb() or
> dma_rmb().
It's not needed for csky: in a non-SMP build the CPU stays in its reset
mode, and sync.s is effectively equivalent to sync.

-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/


* Re: [PATCH 4/4] csky: Add dma_inv_range for DMA_FROM_DEVICE
  2019-07-30 15:22       ` Arnd Bergmann
@ 2019-07-30 15:48         ` Guo Ren
  0 siblings, 0 replies; 15+ messages in thread
From: Guo Ren @ 2019-07-30 15:48 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Linux Kernel Mailing List, linux-arch, linux-csky, feng_shizhu,
	zhang_jian5, zheng_xingjian, zhu_peng, Guo Ren

Thx Arnd,

On Tue, Jul 30, 2019 at 11:22 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Jul 30, 2019 at 5:11 PM Guo Ren <guoren@kernel.org> wrote:
> > > > diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> > > >                 cache_op(paddr, size, dma_wb_range);
> > > >                 break;
> > > >         case DMA_FROM_DEVICE:
> > > > +               cache_op(paddr, size, dma_inv_range);
> > > > +               break;
> > > >         case DMA_BIDIRECTIONAL:
> > > >                 cache_op(paddr, size, dma_wbinv_range);
> > > >                 break;
> > >
> > > When syncing 'for_cpu', you should not need to write back, because
> > > there won't be any dirty cache lines.
> >
> > I just followed the dma_data_direction param; the dir param and the
> > function seem a little bit duplicated. And our CPU won't write a clean
> > cache line back to memory, so dma_wb_page won't cause a problem.
> > arch_sync_dma_for_cpu with dir=DMA_TO_DEVICE seems
> > self-contradictory.
>
> Right, you generally don't need to do cache management for that
> combination.
>
> There might be other things to do here though, e.g. with a strict
> iommu implementation one could tear down the i/o page table
> entry to prevent the device from accessing a buffer while that is
> owned by the cpu.
Tearing down the I/O page table shouldn't be done here; that belongs in
map/unmap(). And I think arch_sync_dma_for_cpu/device should only deal
with cache issues.

--
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/


* Re: [PATCH 3/4] csky/dma: Fixup cache_op failed when cross memory ZONEs
  2019-07-30 12:15 ` [PATCH 3/4] csky/dma: Fixup cache_op failed when cross memory ZONEs guoren
@ 2019-08-06  6:49   ` Christoph Hellwig
  2019-08-06  7:11     ` Guo Ren
  0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2019-08-06  6:49 UTC (permalink / raw)
  To: guoren
  Cc: arnd, linux-kernel, linux-arch, linux-csky, feng_shizhu,
	zhang_jian5, zheng_xingjian, zhu_peng, Guo Ren

On Tue, Jul 30, 2019 at 08:15:44PM +0800, guoren@kernel.org wrote:
> diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> index 80783bb..3f1ff9d 100644
> --- a/arch/csky/mm/dma-mapping.c
> +++ b/arch/csky/mm/dma-mapping.c
> @@ -18,71 +18,52 @@ static int __init atomic_pool_init(void)
>  {
>  	return dma_atomic_pool_init(GFP_KERNEL, pgprot_noncached(PAGE_KERNEL));
>  }
> -postcore_initcall(atomic_pool_init);

Please keep the postcore_initcall next to the function it calls.

In this particular case I also plan to remove the function for 5.4 by
moving it to the common dma code, so it is more important than just style.


* Re: [PATCH 4/4] csky: Add dma_inv_range for DMA_FROM_DEVICE
  2019-07-30 13:43   ` Arnd Bergmann
  2019-07-30 15:11     ` Guo Ren
@ 2019-08-06  6:51     ` Christoph Hellwig
  1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2019-08-06  6:51 UTC (permalink / raw)
  To: Arnd Bergmann
  Cc: Guo Ren, Linux Kernel Mailing List, linux-arch, linux-csky,
	feng_shizhu, zhang_jian5, zheng_xingjian, zhu_peng, Guo Ren

On Tue, Jul 30, 2019 at 03:43:20PM +0200, Arnd Bergmann wrote:
> When syncing 'for_cpu', you should not need to write back, because
> there won't be any dirty cache lines.
> 
> If you have a CPU core that does not do speculative loads, you also don't
> need to invalidate here, because you have already done that in the
> _for_device() case, the only reason to invalidate the CPU cache
> again is if a speculative load created a stale cache line that now
> shadows the data received from the device.

Yes.  And that is one reason why I want to lift a set of common helpers
for both the speculating and non-speculating case to the common code
that just calls arch specific writeback/invalidate/writeback+invalidate
helpers.  It hasn't been a priority so far, but maybe it becomes one
now.  Especially if I could draft someone else to help with it :)
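
One possible shape of what is being described here, purely as an illustration
(the helper names and the cpu_speculates parameter below are invented for this
sketch, not an existing kernel interface): the architecture would provide
three primitive range operations, and common code would pick among them based
on the direction and the ownership transition:

#include <linux/dma-direction.h>
#include <linux/types.h>

/* Hypothetical arch-provided primitives. */
void arch_dma_cache_wback(phys_addr_t paddr, size_t size);
void arch_dma_cache_inv(phys_addr_t paddr, size_t size);
void arch_dma_cache_wback_inv(phys_addr_t paddr, size_t size);

/* Common code: hand the buffer to the device. */
static void common_sync_for_device(phys_addr_t paddr, size_t size,
				   enum dma_data_direction dir)
{
	if (dir == DMA_TO_DEVICE)
		arch_dma_cache_wback(paddr, size);
	else if (dir == DMA_FROM_DEVICE)
		arch_dma_cache_inv(paddr, size);
	else
		arch_dma_cache_wback_inv(paddr, size);
}

/* Common code: hand the buffer back to the CPU. */
static void common_sync_for_cpu(phys_addr_t paddr, size_t size,
				enum dma_data_direction dir,
				bool cpu_speculates)
{
	/* Nothing was dirtied while the device owned the buffer. */
	if (dir == DMA_TO_DEVICE)
		return;

	/* Speculative loads may have refilled stale lines in the meantime. */
	if (cpu_speculates)
		arch_dma_cache_inv(paddr, size);
}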


* Re: [PATCH 3/4] csky/dma: Fixup cache_op failed when cross memory ZONEs
  2019-08-06  6:49   ` Christoph Hellwig
@ 2019-08-06  7:11     ` Guo Ren
  2019-08-06  7:18       ` Christoph Hellwig
  0 siblings, 1 reply; 15+ messages in thread
From: Guo Ren @ 2019-08-06  7:11 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Arnd Bergmann, Linux Kernel Mailing List, linux-arch, linux-csky,
	feng_shizhu, zhang_jian5, zheng_xingjian, zhu_peng, Guo Ren

On Tue, Aug 6, 2019 at 2:49 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Tue, Jul 30, 2019 at 08:15:44PM +0800, guoren@kernel.org wrote:
> > diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> > index 80783bb..3f1ff9d 100644
> > --- a/arch/csky/mm/dma-mapping.c
> > +++ b/arch/csky/mm/dma-mapping.c
> > @@ -18,71 +18,52 @@ static int __init atomic_pool_init(void)
> >  {
> >       return dma_atomic_pool_init(GFP_KERNEL, pgprot_noncached(PAGE_KERNEL));
> >  }
> > -postcore_initcall(atomic_pool_init);
>
> Please keep the postcore_initcall next to the function it calls.
Ok. Change arch_initcall back to postcore_initcall. :)

-- 
Best Regards
 Guo Ren

ML: https://lore.kernel.org/linux-csky/


* Re: [PATCH 3/4] csky/dma: Fixup cache_op failed when cross memory ZONEs
  2019-08-06  7:11     ` Guo Ren
@ 2019-08-06  7:18       ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2019-08-06  7:18 UTC (permalink / raw)
  To: Guo Ren
  Cc: Christoph Hellwig, Arnd Bergmann, Linux Kernel Mailing List,
	linux-arch, linux-csky, feng_shizhu, zhang_jian5, zheng_xingjian,
	zhu_peng, Guo Ren

On Tue, Aug 06, 2019 at 03:11:13PM +0800, Guo Ren wrote:
> On Tue, Aug 6, 2019 at 2:49 PM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Tue, Jul 30, 2019 at 08:15:44PM +0800, guoren@kernel.org wrote:
> > > diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> > > index 80783bb..3f1ff9d 100644
> > > --- a/arch/csky/mm/dma-mapping.c
> > > +++ b/arch/csky/mm/dma-mapping.c
> > > @@ -18,71 +18,52 @@ static int __init atomic_pool_init(void)
> > >  {
> > >       return dma_atomic_pool_init(GFP_KERNEL, pgprot_noncached(PAGE_KERNEL));
> > >  }
> > > -postcore_initcall(atomic_pool_init);
> >
> > Please keep the postcore_initcall next to the function it calls.
> Ok. Change arch_initcall back to postcore_initcall. :)

Well, if you have a good reason to change it, please keep the initcall
level change, but put it in a separate patch.  But most importantly,
don't move the place where it is called around.

