* [PATCH 2/4] csky: Fixup dma_alloc_coherent with PAGE_SO attribute
2019-07-30 12:15 [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem guoren
@ 2019-07-30 12:15 ` guoren
2019-07-30 12:15 ` [PATCH 3/4] csky/dma: Fixup cache_op failed when cross memory ZONEs guoren
` (2 subsequent siblings)
3 siblings, 0 replies; 15+ messages in thread
From: guoren @ 2019-07-30 12:15 UTC (permalink / raw)
To: arnd
Cc: linux-kernel, linux-arch, linux-csky, feng_shizhu, zhang_jian5,
zheng_xingjian, zhu_peng, Guo Ren
From: Guo Ren <ren_guo@c-sky.com>
This bug was introduced by commit 2b070ccdf8c0 ("fixup abiv2 mmap(...
O_SYNC) failed"). That patch removed _PAGE_SO from non-cached memory
mappings, which causes problems when drivers use DMA descriptors to
control transactions without dma_w/rmb().
Following other architectures' implementations, pgprot_writecombine is
introduced for mmap(... O_SYNC).
Signed-off-by: Guo Ren <ren_guo@c-sky.com>
---
arch/csky/include/asm/pgtable.h | 10 ++++++++++
arch/csky/mm/ioremap.c | 6 ++----
2 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/arch/csky/include/asm/pgtable.h b/arch/csky/include/asm/pgtable.h
index c429a6f..fc19ba4 100644
--- a/arch/csky/include/asm/pgtable.h
+++ b/arch/csky/include/asm/pgtable.h
@@ -258,6 +258,16 @@ static inline pgprot_t pgprot_noncached(pgprot_t _prot)
{
unsigned long prot = pgprot_val(_prot);
+ prot = (prot & ~_CACHE_MASK) | _CACHE_UNCACHED | _PAGE_SO;
+
+ return __pgprot(prot);
+}
+
+#define pgprot_writecombine pgprot_writecombine
+static inline pgprot_t pgprot_writecombine(pgprot_t _prot)
+{
+ unsigned long prot = pgprot_val(_prot);
+
prot = (prot & ~_CACHE_MASK) | _CACHE_UNCACHED;
return __pgprot(prot);
diff --git a/arch/csky/mm/ioremap.c b/arch/csky/mm/ioremap.c
index 8473b6b..4853111 100644
--- a/arch/csky/mm/ioremap.c
+++ b/arch/csky/mm/ioremap.c
@@ -29,8 +29,7 @@ void __iomem *ioremap(phys_addr_t addr, size_t size)
vaddr = (unsigned long)area->addr;
- prot = __pgprot(_PAGE_PRESENT | __READABLE | __WRITEABLE |
- _PAGE_GLOBAL | _CACHE_UNCACHED | _PAGE_SO);
+ prot = pgprot_noncached(PAGE_KERNEL);
if (ioremap_page_range(vaddr, vaddr + size, addr, prot)) {
free_vm_area(area);
@@ -51,10 +50,9 @@ pgprot_t phys_mem_access_prot(struct file *file, unsigned long pfn,
unsigned long size, pgprot_t vma_prot)
{
if (!pfn_valid(pfn)) {
- vma_prot.pgprot |= _PAGE_SO;
return pgprot_noncached(vma_prot);
} else if (file->f_flags & O_SYNC) {
- return pgprot_noncached(vma_prot);
+ return pgprot_writecombine(vma_prot);
}
return vma_prot;
--
2.7.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 3/4] csky/dma: Fixup cache_op failed when cross memory ZONEs
2019-07-30 12:15 [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem guoren
2019-07-30 12:15 ` [PATCH 2/4] csky: Fixup dma_alloc_coherent with PAGE_SO attribute guoren
@ 2019-07-30 12:15 ` guoren
2019-08-06 6:49 ` Christoph Hellwig
2019-07-30 12:15 ` [PATCH 4/4] csky: Add dma_inv_range for DMA_FROM_DEVICE guoren
2019-07-30 13:29 ` [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem Arnd Bergmann
3 siblings, 1 reply; 15+ messages in thread
From: guoren @ 2019-07-30 12:15 UTC (permalink / raw)
To: arnd
Cc: linux-kernel, linux-arch, linux-csky, feng_shizhu, zhang_jian5,
zheng_xingjian, zhu_peng, Guo Ren
From: Guo Ren <ren_guo@c-sky.com>
If a (paddr, size) range crosses from the NORMAL zone into the HIGHMEM
zone, cache_op will panic in do_page_fault with bad_area.
Rework the code to support ranges that cross memory zones.
Signed-off-by: Guo Ren <ren_guo@c-sky.com>
---
arch/csky/mm/dma-mapping.c | 73 +++++++++++++++++-----------------------------
1 file changed, 27 insertions(+), 46 deletions(-)
diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
index 80783bb..3f1ff9d 100644
--- a/arch/csky/mm/dma-mapping.c
+++ b/arch/csky/mm/dma-mapping.c
@@ -18,71 +18,52 @@ static int __init atomic_pool_init(void)
{
return dma_atomic_pool_init(GFP_KERNEL, pgprot_noncached(PAGE_KERNEL));
}
-postcore_initcall(atomic_pool_init);
-
-void arch_dma_prep_coherent(struct page *page, size_t size)
-{
- if (PageHighMem(page)) {
- unsigned int count = PAGE_ALIGN(size) >> PAGE_SHIFT;
-
- do {
- void *ptr = kmap_atomic(page);
- size_t _size = (size < PAGE_SIZE) ? size : PAGE_SIZE;
-
- memset(ptr, 0, _size);
- dma_wbinv_range((unsigned long)ptr,
- (unsigned long)ptr + _size);
-
- kunmap_atomic(ptr);
-
- page++;
- size -= PAGE_SIZE;
- count--;
- } while (count);
- } else {
- void *ptr = page_address(page);
-
- memset(ptr, 0, size);
- dma_wbinv_range((unsigned long)ptr, (unsigned long)ptr + size);
- }
-}
+arch_initcall(atomic_pool_init);
static inline void cache_op(phys_addr_t paddr, size_t size,
void (*fn)(unsigned long start, unsigned long end))
{
- struct page *page = pfn_to_page(paddr >> PAGE_SHIFT);
- unsigned int offset = paddr & ~PAGE_MASK;
- size_t left = size;
- unsigned long start;
+ struct page *page = phys_to_page(paddr);
+ void *start = __va(page_to_phys(page));
+ unsigned long offset = offset_in_page(paddr);
+ size_t left = size;
do {
size_t len = left;
+ if (offset + len > PAGE_SIZE)
+ len = PAGE_SIZE - offset;
+
if (PageHighMem(page)) {
- void *addr;
+ start = kmap_atomic(page);
- if (offset + len > PAGE_SIZE) {
- if (offset >= PAGE_SIZE) {
- page += offset >> PAGE_SHIFT;
- offset &= ~PAGE_MASK;
- }
- len = PAGE_SIZE - offset;
- }
+ fn((unsigned long)start + offset,
+ (unsigned long)start + offset + len);
- addr = kmap_atomic(page);
- start = (unsigned long)(addr + offset);
- fn(start, start + len);
- kunmap_atomic(addr);
+ kunmap_atomic(start);
} else {
- start = (unsigned long)phys_to_virt(paddr);
- fn(start, start + size);
+ fn((unsigned long)start + offset,
+ (unsigned long)start + offset + len);
}
offset = 0;
+
page++;
+ start += PAGE_SIZE;
left -= len;
} while (left);
}
+static void dma_wbinv_set_zero_range(unsigned long start, unsigned long end)
+{
+ memset((void *)start, 0, end - start);
+ dma_wbinv_range(start, end);
+}
+
+void arch_dma_prep_coherent(struct page *page, size_t size)
+{
+ cache_op(page_to_phys(page), size, dma_wbinv_set_zero_range);
+}
+
void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
size_t size, enum dma_data_direction dir)
{
--
2.7.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 3/4] csky/dma: Fixup cache_op failed when cross memory ZONEs
2019-07-30 12:15 ` [PATCH 3/4] csky/dma: Fixup cache_op failed when cross memory ZONEs guoren
@ 2019-08-06 6:49 ` Christoph Hellwig
2019-08-06 7:11 ` Guo Ren
0 siblings, 1 reply; 15+ messages in thread
From: Christoph Hellwig @ 2019-08-06 6:49 UTC (permalink / raw)
To: guoren
Cc: arnd, linux-kernel, linux-arch, linux-csky, feng_shizhu,
zhang_jian5, zheng_xingjian, zhu_peng, Guo Ren
On Tue, Jul 30, 2019 at 08:15:44PM +0800, guoren@kernel.org wrote:
> diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> index 80783bb..3f1ff9d 100644
> --- a/arch/csky/mm/dma-mapping.c
> +++ b/arch/csky/mm/dma-mapping.c
> @@ -18,71 +18,52 @@ static int __init atomic_pool_init(void)
> {
> return dma_atomic_pool_init(GFP_KERNEL, pgprot_noncached(PAGE_KERNEL));
> }
> -postcore_initcall(atomic_pool_init);
Please keep the postcore_initcall next to the function it calls.
In this particular case I also plan to remove the function for 5.4 by
moving it to the common dma code, so it is more important than just style.
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/4] csky/dma: Fixup cache_op failed when cross memory ZONEs
2019-08-06 6:49 ` Christoph Hellwig
@ 2019-08-06 7:11 ` Guo Ren
2019-08-06 7:18 ` Christoph Hellwig
0 siblings, 1 reply; 15+ messages in thread
From: Guo Ren @ 2019-08-06 7:11 UTC (permalink / raw)
To: Christoph Hellwig
Cc: Arnd Bergmann, Linux Kernel Mailing List, linux-arch, linux-csky,
feng_shizhu, zhang_jian5, zheng_xingjian, zhu_peng, Guo Ren
On Tue, Aug 6, 2019 at 2:49 PM Christoph Hellwig <hch@infradead.org> wrote:
>
> On Tue, Jul 30, 2019 at 08:15:44PM +0800, guoren@kernel.org wrote:
> > diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> > index 80783bb..3f1ff9d 100644
> > --- a/arch/csky/mm/dma-mapping.c
> > +++ b/arch/csky/mm/dma-mapping.c
> > @@ -18,71 +18,52 @@ static int __init atomic_pool_init(void)
> > {
> > return dma_atomic_pool_init(GFP_KERNEL, pgprot_noncached(PAGE_KERNEL));
> > }
> > -postcore_initcall(atomic_pool_init);
>
> Please keep the postcore_initcall next to the function it calls.
Ok. Change arch_initcall back to postcore_initcall. :)
--
Best Regards
Guo Ren
ML: https://lore.kernel.org/linux-csky/
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/4] csky/dma: Fixup cache_op failed when cross memory ZONEs
2019-08-06 7:11 ` Guo Ren
@ 2019-08-06 7:18 ` Christoph Hellwig
0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2019-08-06 7:18 UTC (permalink / raw)
To: Guo Ren
Cc: Christoph Hellwig, Arnd Bergmann, Linux Kernel Mailing List,
linux-arch, linux-csky, feng_shizhu, zhang_jian5, zheng_xingjian,
zhu_peng, Guo Ren
On Tue, Aug 06, 2019 at 03:11:13PM +0800, Guo Ren wrote:
> On Tue, Aug 6, 2019 at 2:49 PM Christoph Hellwig <hch@infradead.org> wrote:
> >
> > On Tue, Jul 30, 2019 at 08:15:44PM +0800, guoren@kernel.org wrote:
> > > diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> > > index 80783bb..3f1ff9d 100644
> > > --- a/arch/csky/mm/dma-mapping.c
> > > +++ b/arch/csky/mm/dma-mapping.c
> > > @@ -18,71 +18,52 @@ static int __init atomic_pool_init(void)
> > > {
> > > return dma_atomic_pool_init(GFP_KERNEL, pgprot_noncached(PAGE_KERNEL));
> > > }
> > > -postcore_initcall(atomic_pool_init);
> >
> > Please keep the postcore_initcall next to the function it calls.
> Ok. Change arch_initcall back to postcore_initcall. :)
Well, if you have a good reason to change it, please keep the initcall
level change, but put it in a separate patch. Most importantly, don't
move the place where it is called.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 4/4] csky: Add dma_inv_range for DMA_FROM_DEVICE
2019-07-30 12:15 [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem guoren
2019-07-30 12:15 ` [PATCH 2/4] csky: Fixup dma_alloc_coherent with PAGE_SO attribute guoren
2019-07-30 12:15 ` [PATCH 3/4] csky/dma: Fixup cache_op failed when cross memory ZONEs guoren
@ 2019-07-30 12:15 ` guoren
2019-07-30 13:43 ` Arnd Bergmann
2019-07-30 13:29 ` [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem Arnd Bergmann
3 siblings, 1 reply; 15+ messages in thread
From: guoren @ 2019-07-30 12:15 UTC (permalink / raw)
To: arnd
Cc: linux-kernel, linux-arch, linux-csky, feng_shizhu, zhang_jian5,
zheng_xingjian, zhu_peng, Guo Ren
From: Guo Ren <ren_guo@c-sky.com>
DMA_FROM_DEVICE only needs to read DMA data from memory into the CPU
cache, so there is no need to clean (write back) the cache beforehand.
Clean + invalidate for DMA_FROM_DEVICE would not cause problems either,
because software does not touch the DMA memory range while the DMA is in
progress.
Signed-off-by: Guo Ren <ren_guo@c-sky.com>
---
arch/csky/include/asm/cache.h | 1 +
arch/csky/mm/cachev1.c | 7 ++++++-
arch/csky/mm/cachev2.c | 11 ++++++++++-
arch/csky/mm/dma-mapping.c | 4 ++++
4 files changed, 21 insertions(+), 2 deletions(-)
diff --git a/arch/csky/include/asm/cache.h b/arch/csky/include/asm/cache.h
index d683734..1d5fc2f 100644
--- a/arch/csky/include/asm/cache.h
+++ b/arch/csky/include/asm/cache.h
@@ -24,6 +24,7 @@ void cache_wbinv_range(unsigned long start, unsigned long end);
void cache_wbinv_all(void);
void dma_wbinv_range(unsigned long start, unsigned long end);
+void dma_inv_range(unsigned long start, unsigned long end);
void dma_wb_range(unsigned long start, unsigned long end);
#endif
diff --git a/arch/csky/mm/cachev1.c b/arch/csky/mm/cachev1.c
index b8a75cc..494ec91 100644
--- a/arch/csky/mm/cachev1.c
+++ b/arch/csky/mm/cachev1.c
@@ -120,7 +120,12 @@ void dma_wbinv_range(unsigned long start, unsigned long end)
cache_op_range(start, end, DATA_CACHE|CACHE_CLR|CACHE_INV, 1);
}
+void dma_inv_range(unsigned long start, unsigned long end)
+{
+ cache_op_range(start, end, DATA_CACHE|CACHE_CLR|CACHE_INV, 1);
+}
+
void dma_wb_range(unsigned long start, unsigned long end)
{
- cache_op_range(start, end, DATA_CACHE|CACHE_INV, 1);
+ cache_op_range(start, end, DATA_CACHE|CACHE_CLR|CACHE_INV, 1);
}
diff --git a/arch/csky/mm/cachev2.c b/arch/csky/mm/cachev2.c
index baaf05d..b61be65 100644
--- a/arch/csky/mm/cachev2.c
+++ b/arch/csky/mm/cachev2.c
@@ -69,11 +69,20 @@ void dma_wbinv_range(unsigned long start, unsigned long end)
sync_is();
}
+void dma_inv_range(unsigned long start, unsigned long end)
+{
+ unsigned long i = start & ~(L1_CACHE_BYTES - 1);
+
+ for (; i < end; i += L1_CACHE_BYTES)
+ asm volatile("dcache.iva %0\n"::"r"(i):"memory");
+ sync_is();
+}
+
void dma_wb_range(unsigned long start, unsigned long end)
{
unsigned long i = start & ~(L1_CACHE_BYTES - 1);
for (; i < end; i += L1_CACHE_BYTES)
- asm volatile("dcache.civa %0\n"::"r"(i):"memory");
+ asm volatile("dcache.cva %0\n"::"r"(i):"memory");
sync_is();
}
diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
index 3f1ff9d..d8f0f81 100644
--- a/arch/csky/mm/dma-mapping.c
+++ b/arch/csky/mm/dma-mapping.c
@@ -72,6 +72,8 @@ void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
cache_op(paddr, size, dma_wb_range);
break;
case DMA_FROM_DEVICE:
+ cache_op(paddr, size, dma_inv_range);
+ break;
case DMA_BIDIRECTIONAL:
cache_op(paddr, size, dma_wbinv_range);
break;
@@ -88,6 +90,8 @@ void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
cache_op(paddr, size, dma_wb_range);
break;
case DMA_FROM_DEVICE:
+ cache_op(paddr, size, dma_inv_range);
+ break;
case DMA_BIDIRECTIONAL:
cache_op(paddr, size, dma_wbinv_range);
break;
--
2.7.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] csky: Add dma_inv_range for DMA_FROM_DEVICE
2019-07-30 12:15 ` [PATCH 4/4] csky: Add dma_inv_range for DMA_FROM_DEVICE guoren
@ 2019-07-30 13:43 ` Arnd Bergmann
2019-07-30 15:11 ` Guo Ren
2019-08-06 6:51 ` Christoph Hellwig
0 siblings, 2 replies; 15+ messages in thread
From: Arnd Bergmann @ 2019-07-30 13:43 UTC (permalink / raw)
To: Guo Ren
Cc: Linux Kernel Mailing List, linux-arch, linux-csky, feng_shizhu,
zhang_jian5, zheng_xingjian, zhu_peng, Guo Ren
On Tue, Jul 30, 2019 at 2:16 PM <guoren@kernel.org> wrote:
> From: Guo Ren <ren_guo@c-sky.com>
> diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> index 3f1ff9d..d8f0f81 100644
> --- a/arch/csky/mm/dma-mapping.c
> +++ b/arch/csky/mm/dma-mapping.c
> @@ -72,6 +72,8 @@ void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
> cache_op(paddr, size, dma_wb_range);
> break;
> case DMA_FROM_DEVICE:
> + cache_op(paddr, size, dma_inv_range);
> + break;
> case DMA_BIDIRECTIONAL:
> cache_op(paddr, size, dma_wbinv_range);
> break;
> @@ -88,6 +90,8 @@ void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
> cache_op(paddr, size, dma_wb_range);
> break;
> case DMA_FROM_DEVICE:
> + cache_op(paddr, size, dma_inv_range);
> + break;
> case DMA_BIDIRECTIONAL:
> cache_op(paddr, size, dma_wbinv_range);
> break;
When syncing 'for_cpu', you should not need to write back, because
there won't be any dirty cache lines.
If you have a CPU core that does not do speculative loads, you also don't
need to invalidate here, because you have already done that in the
_for_device() case, the only reason to invalidate the CPU cache
again is if a speculative load created a stale cache line that now
shadows the data received from the device.
Arnd
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] csky: Add dma_inv_range for DMA_FROM_DEVICE
2019-07-30 13:43 ` Arnd Bergmann
@ 2019-07-30 15:11 ` Guo Ren
2019-07-30 15:22 ` Arnd Bergmann
2019-08-06 6:51 ` Christoph Hellwig
1 sibling, 1 reply; 15+ messages in thread
From: Guo Ren @ 2019-07-30 15:11 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Linux Kernel Mailing List, linux-arch, linux-csky, feng_shizhu,
zhang_jian5, zheng_xingjian, zhu_peng, Guo Ren
Thx Arnd,
On Tue, Jul 30, 2019 at 9:43 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Jul 30, 2019 at 2:16 PM <guoren@kernel.org> wrote:
> > From: Guo Ren <ren_guo@c-sky.com>
>
> > diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> > index 3f1ff9d..d8f0f81 100644
> > --- a/arch/csky/mm/dma-mapping.c
> > +++ b/arch/csky/mm/dma-mapping.c
> > @@ -72,6 +72,8 @@ void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
> > cache_op(paddr, size, dma_wb_range);
> > break;
> > case DMA_FROM_DEVICE:
> > + cache_op(paddr, size, dma_inv_range);
> > + break;
> > case DMA_BIDIRECTIONAL:
> > cache_op(paddr, size, dma_wbinv_range);
> > break;
> > @@ -88,6 +90,8 @@ void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
> > cache_op(paddr, size, dma_wb_range);
> > break;
> > case DMA_FROM_DEVICE:
> > + cache_op(paddr, size, dma_inv_range);
> > + break;
> > case DMA_BIDIRECTIONAL:
> > cache_op(paddr, size, dma_wbinv_range);
> > break;
>
> When syncing 'for_cpu', you should not need to write back, because
> there won't be any dirty cache lines.
I just followed the dma_data_direction parameter; the dir parameter and
the callback function seem a bit redundant. And our CPU won't write a
clean cache line back to memory, so dma_wb_range won't cause a problem.
arch_sync_dma_for_cpu with dir=DMA_TO_DEVICE seems self-contradictory.
Do you want me to modify it like this:
@@ -88,6 +90,8 @@ void arch_sync_dma_for_cpu(struct device *dev, phys_addr_t paddr,
	case DMA_TO_DEVICE:
	case DMA_FROM_DEVICE:
	case DMA_BIDIRECTIONAL:
		cache_op(paddr, size, dma_inv_range);
		break;

@@ -72,6 +72,8 @@ void arch_sync_dma_for_device(struct device *dev, phys_addr_t paddr,
	case DMA_TO_DEVICE:
		cache_op(paddr, size, dma_wb_range);
		break;
	case DMA_FROM_DEVICE:
	case DMA_BIDIRECTIONAL:
		cache_op(paddr, size, dma_wbinv_range);
		break;
>
> If you have a CPU core that does not do speculative loads, you also don't
> need to invalidate here, because you have already done that in the
> _for_device() case, the only reason to invalidate the CPU cache
> again is if a speculative load created a stale cache line that now
> shadows the data received from the device.
Our CPU supports speculative loads :)
--
Best Regards
Guo Ren
ML: https://lore.kernel.org/linux-csky/
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] csky: Add dma_inv_range for DMA_FROM_DEVICE
2019-07-30 15:11 ` Guo Ren
@ 2019-07-30 15:22 ` Arnd Bergmann
2019-07-30 15:48 ` Guo Ren
0 siblings, 1 reply; 15+ messages in thread
From: Arnd Bergmann @ 2019-07-30 15:22 UTC (permalink / raw)
To: Guo Ren
Cc: Linux Kernel Mailing List, linux-arch, linux-csky, feng_shizhu,
zhang_jian5, zheng_xingjian, zhu_peng, Guo Ren
On Tue, Jul 30, 2019 at 5:11 PM Guo Ren <guoren@kernel.org> wrote:
> > > diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> > > cache_op(paddr, size, dma_wb_range);
> > > break;
> > > case DMA_FROM_DEVICE:
> > > + cache_op(paddr, size, dma_inv_range);
> > > + break;
> > > case DMA_BIDIRECTIONAL:
> > > cache_op(paddr, size, dma_wbinv_range);
> > > break;
> >
> > When syncing 'for_cpu', you should not need to write back, because
> > there won't be any dirty cache lines.
>
> I just followed the dma_data_direction parameter; the dir parameter and
> the callback function seem a bit redundant. And our CPU won't write a
> clean cache line back to memory, so dma_wb_range won't cause a problem.
> arch_sync_dma_for_cpu with dir=DMA_TO_DEVICE seems
> self-contradictory.
Right, you generally don't need to do cache management for that
combination.
There might be other things to do here though, e.g. with a strict
iommu implementation one could tear down the i/o page table
entry to prevent the device from accessing a buffer while that is
owned by the cpu.
> Do you want me to modify it like this:
> @@ -88,6 +90,8 @@ void arch_sync_dma_for_cpu(struct device *dev,
> phys_addr_t paddr,
> case DMA_TO_DEVICE:
> case DMA_FROM_DEVICE:
> case DMA_BIDIRECTIONAL:
> cache_op(paddr, size, dma_inv_range);
> break;
>
> @@ -72,6 +72,8 @@ void arch_sync_dma_for_device(struct device *dev,
> phys_addr_t paddr,
> case DMA_TO_DEVICE:
> cache_op(paddr, size, dma_wb_range);
> break;
> case DMA_FROM_DEVICE:
> case DMA_BIDIRECTIONAL:
> cache_op(paddr, size, dma_wbinv_range);
> break;
>
> >
> > If you have a CPU core that does not do speculative loads, you also don't
> > need to invalidate here, because you have already done that in the
> > _for_device() case, the only reason to invalidate the CPU cache
> > again is if a speculative load created a stale cache line that now
> > shadows the data received from the device.
> Our CPU supports speculative loads :)
Ok, then both invalidations are indeed needed.
I was guessing that the CK610 had no speculation.
Arnd
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] csky: Add dma_inv_range for DMA_FROM_DEVICE
2019-07-30 15:22 ` Arnd Bergmann
@ 2019-07-30 15:48 ` Guo Ren
0 siblings, 0 replies; 15+ messages in thread
From: Guo Ren @ 2019-07-30 15:48 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Linux Kernel Mailing List, linux-arch, linux-csky, feng_shizhu,
zhang_jian5, zheng_xingjian, zhu_peng, Guo Ren
Thx Arnd,
On Tue, Jul 30, 2019 at 11:22 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Jul 30, 2019 at 5:11 PM Guo Ren <guoren@kernel.org> wrote:
> > > > diff --git a/arch/csky/mm/dma-mapping.c b/arch/csky/mm/dma-mapping.c
> > > > cache_op(paddr, size, dma_wb_range);
> > > > break;
> > > > case DMA_FROM_DEVICE:
> > > > + cache_op(paddr, size, dma_inv_range);
> > > > + break;
> > > > case DMA_BIDIRECTIONAL:
> > > > cache_op(paddr, size, dma_wbinv_range);
> > > > break;
> > >
> > > When syncing 'for_cpu', you should not need to write back, because
> > > there won't be any dirty cache lines.
> >
> > I just followed the dma_data_direction parameter; the dir parameter and
> > the callback function seem a bit redundant. And our CPU won't write a
> > clean cache line back to memory, so dma_wb_range won't cause a problem.
> > arch_sync_dma_for_cpu with dir=DMA_TO_DEVICE seems
> > self-contradictory.
>
> Right, you generally don't need to do cache management for that
> combination.
>
> There might be other things to do here though, e.g. with a strict
> iommu implementation one could tear down the i/o page table
> entry to prevent the device from accessing a buffer while that is
> owned by the cpu.
Tearing down the I/O page table shouldn't be done here; that belongs in
map/unmap(). And I think arch_sync_dma_for_cpu/device should only handle
cache issues.
--
Best Regards
Guo Ren
ML: https://lore.kernel.org/linux-csky/
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 4/4] csky: Add dma_inv_range for DMA_FROM_DEVICE
2019-07-30 13:43 ` Arnd Bergmann
2019-07-30 15:11 ` Guo Ren
@ 2019-08-06 6:51 ` Christoph Hellwig
1 sibling, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2019-08-06 6:51 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Guo Ren, Linux Kernel Mailing List, linux-arch, linux-csky,
feng_shizhu, zhang_jian5, zheng_xingjian, zhu_peng, Guo Ren
On Tue, Jul 30, 2019 at 03:43:20PM +0200, Arnd Bergmann wrote:
> When syncing 'for_cpu', you should not need to write back, because
> there won't be any dirty cache lines.
>
> If you have a CPU core that does not do speculative loads, you also don't
> need to invalidate here, because you have already done that in the
> _for_device() case, the only reason to invalidate the CPU cache
> again is if a speculative load created a stale cache line that now
> shadows the data received from the device.
Yes. And that is one reason why I want to lift a set of common helpers
for both the speculating and non-speculating case to the common code
that just calls arch specific writeback/invalidate/writeback+invalidate
helpers. It hasn't been a priority so far, but maybe it becomes one
now. Especially if I could draft someone else to help with it :)
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem
2019-07-30 12:15 [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem guoren
` (2 preceding siblings ...)
2019-07-30 12:15 ` [PATCH 4/4] csky: Add dma_inv_range for DMA_FROM_DEVICE guoren
@ 2019-07-30 13:29 ` Arnd Bergmann
2019-07-30 15:15 ` Guo Ren
2019-07-30 15:28 ` Guo Ren
3 siblings, 2 replies; 15+ messages in thread
From: Arnd Bergmann @ 2019-07-30 13:29 UTC (permalink / raw)
To: Guo Ren
Cc: Linux Kernel Mailing List, linux-arch, linux-csky, feng_shizhu,
zhang_jian5, zheng_xingjian, zhu_peng, Guo Ren
On Tue, Jul 30, 2019 at 2:15 PM <guoren@kernel.org> wrote:
>
> From: Guo Ren <ren_guo@c-sky.com>
>
> If an arch doesn't define dma_r/wmb(), Linux falls back to r/wmb. Csky
> uses bar.xxx to implement mb(), and that causes problems when syncing
> data with a DMA device, because bar.xxx can't guarantee that bus
> transactions have finished at the outer bus level. We must use sync.s
> instead of bar.xxx for DMA data synchronization; it guarantees
> retirement only after receiving the bus response.
>
> Signed-off-by: Guo Ren <ren_guo@c-sky.com>
This change looks good to me, but I think your regular barriers
(mb, rmb, wmb) are still wrong: These are meant to be the superset
of dma_{r,w}mb and smp_{,r,w}mb, and synchronize
against both SMP and DMA interactions.
I suspect you can drop the '.s' for non-SMP builds. What I don't
see is whether you might need to add '.i' for dma_wmb() or
dma_rmb().
Arnd
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem
2019-07-30 13:29 ` [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem Arnd Bergmann
@ 2019-07-30 15:15 ` Guo Ren
2019-07-30 15:28 ` Guo Ren
1 sibling, 0 replies; 15+ messages in thread
From: Guo Ren @ 2019-07-30 15:15 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Linux Kernel Mailing List, linux-arch, linux-csky, feng_shizhu,
zhang_jian5, zheng_xingjian, zhu_peng, Guo Ren
On Tue, Jul 30, 2019 at 9:29 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Jul 30, 2019 at 2:15 PM <guoren@kernel.org> wrote:
> >
> > From: Guo Ren <ren_guo@c-sky.com>
> >
> > If an arch doesn't define dma_r/wmb(), Linux falls back to r/wmb. Csky
> > uses bar.xxx to implement mb(), and that causes problems when syncing
> > data with a DMA device, because bar.xxx can't guarantee that bus
> > transactions have finished at the outer bus level. We must use sync.s
> > instead of bar.xxx for DMA data synchronization; it guarantees
> > retirement only after receiving the bus response.
> >
> > Signed-off-by: Guo Ren <ren_guo@c-sky.com>
>
> This change looks good to me, but I think your regular barriers
> (mb, rmb, wmb) are still wrong: These are meant to be the superset
> of dma_{r,w}mb and smp_{,r,w}mb, and synchronize
> against both SMP and DMA interactions.
>
> I suspect you can drop the '.s' for non-SMP builds. What I don't
> see is whether you might need to add '.i' for dma_wmb() or
> dma_rmb().
>
> Arnd
--
Best Regards
Guo Ren
ML: https://lore.kernel.org/linux-csky/
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem
2019-07-30 13:29 ` [PATCH 1/4] csky: Fixup dma_rmb/wmb synchronization problem Arnd Bergmann
2019-07-30 15:15 ` Guo Ren
@ 2019-07-30 15:28 ` Guo Ren
1 sibling, 0 replies; 15+ messages in thread
From: Guo Ren @ 2019-07-30 15:28 UTC (permalink / raw)
To: Arnd Bergmann
Cc: Linux Kernel Mailing List, linux-arch, linux-csky, feng_shizhu,
zhang_jian5, zheng_xingjian, zhu_peng, Guo Ren
Thx Arnd,
On Tue, Jul 30, 2019 at 9:29 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Tue, Jul 30, 2019 at 2:15 PM <guoren@kernel.org> wrote:
> >
> > From: Guo Ren <ren_guo@c-sky.com>
> >
> > If an arch doesn't define dma_r/wmb(), Linux falls back to r/wmb. Csky
> > uses bar.xxx to implement mb(), and that causes problems when syncing
> > data with a DMA device, because bar.xxx can't guarantee that bus
> > transactions have finished at the outer bus level. We must use sync.s
> > instead of bar.xxx for DMA data synchronization; it guarantees
> > retirement only after receiving the bus response.
> >
> > Signed-off-by: Guo Ren <ren_guo@c-sky.com>
>
> This change looks good to me, but I think your regular barriers
> (mb, rmb, wmb) are still wrong: These are meant to be the superset
> of dma_{r,w}mb and smp_{,r,w}mb, and synchronize
> against both SMP and DMA interactions.
Wow, thanks for the correction, yes! mb, rmb, wmb are the superset of the
dma and smp barriers, and I should use sync.s for mb.
I checked arm64: it uses dsb for mb and dmb for smp_mb and dma_mb.
I also checked the drivers, and you are right: a lot of them use mb() to
sync memory data with a device.
>
> I suspect you can drop the '.s' for non-SMP builds. What I don't
> see is whether you might need to add '.i' for dma_wmb() or
> dma_rmb().
It's not needed for csky: in a non-SMP configuration the CPU is in its
reset mode, and sync.s is effectively equal to sync.
--
Best Regards
Guo Ren
ML: https://lore.kernel.org/linux-csky/
^ permalink raw reply [flat|nested] 15+ messages in thread