* xen/arm and swiotlb-xen: possible data corruption
From: Stefano Stabellini @ 2017-03-02  1:05 UTC
  To: edgar.iglesias; +Cc: xen-devel, julien.grall, sstabellini

Hi all,

Edgar reported data corruption of network packets in dom0 when
swiotlb-xen is in use. He also reported that the following patch "fixes"
the problem for him:

 static void __xen_dma_page_cpu_to_dev(struct device *hwdev, dma_addr_t handle,
                size_t size, enum dma_data_direction dir)
 {
-       dma_cache_maint(handle & PAGE_MASK, handle & ~PAGE_MASK, size, dir, DMA_MAP);
+       printk("%s: addr=%lx size=%zd\n", __func__, handle, size);
+       dma_cache_maint(handle & PAGE_MASK, handle & ~PAGE_MASK, size + 64, dir, DMA_MAP);

I suspect the problem has to do with cacheline alignment on the Xen
side (xen/common/grant_table.c:__gnttab_cache_flush).

If op == GNTTAB_CACHE_INVAL, we call invalidate_dcache_va_range; if op
== GNTTAB_CACHE_CLEAN, we call clean_dcache_va_range instead. The
parameter, v, may not be cacheline-aligned.

invalidate_dcache_va_range is capable of handling an unaligned address,
while clean_dcache_va_range is not.
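
To make the failure mode concrete, here is a minimal standalone sketch
(illustration only, not from the thread; it assumes a hypothetical
64-byte line and addresses, whereas the real code determines
cacheline_bytes at boot). It counts the lines the current loop touches
versus the lines the byte range actually spans:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    const uintptr_t cacheline_bytes = 64;   /* assumed line size */
    uintptr_t p = 0x1030;                   /* unaligned start */
    unsigned long size = 0x40;              /* one line's worth of data */

    /* Current loop: one clean per iteration, starting at p. */
    unsigned cleaned = 0;
    for (uintptr_t q = p; q < p + size; q += cacheline_bytes)
        cleaned++;                          /* cleans the line holding q */

    /* Lines actually covered by [p, p + size). */
    uintptr_t first = p & ~(cacheline_bytes - 1);
    uintptr_t last  = (p + size - 1) & ~(cacheline_bytes - 1);
    unsigned spanned = (last - first) / cacheline_bytes + 1;

    /* Prints "cleaned 1 line(s), range spans 2": the line holding the
     * tail of the buffer is never cleaned, matching the corruption. */
    printf("cleaned %u line(s), range spans %u\n", cleaned, spanned);
    return 0;
}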

Edgar, does the appended patch fix the problem for you?

---

diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index 86de0b6..9cdf2fb 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -322,10 +322,30 @@ static inline int invalidate_dcache_va_range(const void *p, unsigned long size)
 
 static inline int clean_dcache_va_range(const void *p, unsigned long size)
 {
-    const void *end;
+    size_t off;
+    const void *end = p + size;
+
     dsb(sy);           /* So the CPU issues all writes to the range */
-    for ( end = p + size; p < end; p += cacheline_bytes )
+
+    off = (unsigned long)p % cacheline_bytes;
+    if ( off )
+    {
+        p -= off;
         asm volatile (__clean_dcache_one(0) : : "r" (p));
+        p += cacheline_bytes;
+        size -= cacheline_bytes - off;
+    }
+    off = (unsigned long)end % cacheline_bytes;
+    if ( off )
+    {
+        end -= off;
+        size -= off;
+        asm volatile (__clean_dcache_one(0) : : "r" (end));
+    }
+
+    for ( ; p < end; p += cacheline_bytes )
+        asm volatile (__clean_dcache_one(0) : : "r" (p));
+
     dsb(sy);           /* So we know the flushes happen before continuing */
     /* ARM callers assume that dcache_* functions cannot fail. */
     return 0;


* Re: xen/arm and swiotlb-xen: possible data corruption
From: Edgar E. Iglesias @ 2017-03-02  8:38 UTC
  To: Stefano Stabellini; +Cc: xen-devel, julien.grall

On Wed, Mar 01, 2017 at 05:05:21PM -0800, Stefano Stabellini wrote:
> [...]
> Edgar, does the appended patch fix the problem for you?


Thanks Stefano,

This does indeed fix the issue for me.

Cheers,
Edgar



* Re: xen/arm and swiotlb-xen: possible data corruption
From: Edgar E. Iglesias @ 2017-03-02  8:53 UTC
  To: Edgar E. Iglesias; +Cc: xen-devel, julien.grall, Stefano Stabellini

On Thu, Mar 02, 2017 at 09:38:37AM +0100, Edgar E. Iglesias wrote:
> On Wed, Mar 01, 2017 at 05:05:21PM -0800, Stefano Stabellini wrote:
> > [...]
> > Edgar, does the appended patch fix the problem for you?
> 
> This does indeed fix the issue for me.


Hi again,

Looking at the code, the problem here is that we may flush one cache
line fewer than expected.

This smaller patch fixes it for me too:
diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
index c492d6d..fa1b4dd 100644
--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -325,7 +325,9 @@ static inline int clean_dcache_va_range(const void *p, unsigned long size)
 {
     const void *end;
     dsb(sy);           /* So the CPU issues all writes to the range */
-    for ( end = p + size; p < end; p += cacheline_bytes )
+
+    end = (void *)ROUNDUP((uintptr_t)p + size, cacheline_bytes);
+    for ( ; p < end; p += cacheline_bytes )
         asm volatile (__clean_dcache_one(0) : : "r" (p));
     dsb(sy);           /* So we know the flushes happen before continuing */
     /* ARM callers assume that dcache_* functions cannot fail. */
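
For reference, the effect of the ROUNDUP can be checked in isolation
with a small standalone sketch (hypothetical values; the macro below
assumes the usual round-up-to-a-power-of-two definition, which may
differ in detail from Xen's):

#include <stdint.h>
#include <stdio.h>

#define ROUNDUP(x, a) (((x) + (a) - 1) & ~((uintptr_t)(a) - 1))

int main(void)
{
    const uintptr_t cacheline_bytes = 64;
    uintptr_t p = 0x1030;                 /* unaligned start */
    unsigned long size = 0x40;

    /* Fixed loop bound: round the end up to the next line boundary. */
    uintptr_t end = ROUNDUP(p + size, cacheline_bytes);
    unsigned cleaned = 0;
    for (uintptr_t q = p; q < end; q += cacheline_bytes)
        cleaned++;

    /* end = 0x1080, so the loop runs twice and the tail line holding
     * bytes 0x1040..0x106f is now cleaned as well. */
    printf("end=%#lx cleaned=%u\n", (unsigned long)end, cleaned);
    return 0;
}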


Anyway, I'm OK with either fix.

Cheers,
Edgar



> 
> Cheers,
> Edgar
> 
> 
> > 
> > ---
> > 
> > diff --git a/xen/include/asm-arm/page.h b/xen/include/asm-arm/page.h
> > index 86de0b6..9cdf2fb 100644
> > --- a/xen/include/asm-arm/page.h
> > +++ b/xen/include/asm-arm/page.h
> > @@ -322,10 +322,30 @@ static inline int invalidate_dcache_va_range(const void *p, unsigned long size)
> >  
> >  static inline int clean_dcache_va_range(const void *p, unsigned long size)
> >  {
> > -    const void *end;
> > +    size_t off;
> > +    const void *end = p + size;
> > +
> >      dsb(sy);           /* So the CPU issues all writes to the range */
> > -    for ( end = p + size; p < end; p += cacheline_bytes )
> > +
> > +    off = (unsigned long)p % cacheline_bytes;
> > +    if ( off )
> > +    {
> > +        p -= off;
> >          asm volatile (__clean_dcache_one(0) : : "r" (p));
> > +        p += cacheline_bytes;
> > +        size -= cacheline_bytes - off;
> > +    }
> > +    off = (unsigned long)end % cacheline_bytes;
> > +    if ( off )
> > +    {
> > +        end -= off;
> > +        size -= off;
> > +        asm volatile (__clean_dcache_one(0) : : "r" (end));
> > +    }
> > +
> > +    for ( ; p < end; p += cacheline_bytes )
> > +        asm volatile (__clean_dcache_one(0) : : "r" (p));
> > +
> >      dsb(sy);           /* So we know the flushes happen before continuing */
> >      /* ARM callers assume that dcache_* functions cannot fail. */
> >      return 0;

* Re: xen/arm and swiotlb-xen: possible data corruption
From: Julien Grall @ 2017-03-02 17:56 UTC
  To: Edgar E. Iglesias, Edgar E. Iglesias; +Cc: xen-devel, nd, Stefano Stabellini

Hi Edgar,

On 02/03/17 08:53, Edgar E. Iglesias wrote:
> [...]
> Looking at the code, the problem here is that we may flush one cache
> line fewer than expected.
>
> This smaller patch fixes it for me too:
> [...]
> Anyway, I'm OK with either fix.

I would prefer your version compared to Stefano's.

Cheers,

-- 
Julien Grall


* Re: xen/arm and swiotlb-xen: possible data corruption
From: Stefano Stabellini @ 2017-03-02 19:12 UTC
  To: Julien Grall
  Cc: Edgar E. Iglesias, Edgar E. Iglesias, nd, Stefano Stabellini, xen-devel

On Thu, 2 Mar 2017, Julien Grall wrote:
> On 02/03/17 08:53, Edgar E. Iglesias wrote:
> > On Thu, Mar 02, 2017 at 09:38:37AM +0100, Edgar E. Iglesias wrote:
> > > [...]
> > > This does indeed fix the issue for me.

Thanks for reporting and testing!


> > [...]
> > Looking at the code, the problem here is that we may flush one cache
> > line fewer than expected.
> > 
> > This smaller patch fixes it for me too:
> > [...]
> > Anyway, I'm OK with either fix.
> 
> I would prefer your version compared to Stefano's.

Julien, from looking at the two diffs, this is simpler and nicer, but if
you look at xen/include/asm-arm/page.h, my patch made
clean_dcache_va_range consistent with invalidate_dcache_va_range. For
consistency, I would prefer to deal with the two functions the same way.
Although it is not a spec requirement, I also think that it is a good
idea to issue cache flushes from cacheline aligned addresses, like
invalidate_dcache_va_range does and Linux does, to make more obvious
what is going on.


* Re: xen/arm and swiotlb-xen: possible data corruption
From: Julien Grall @ 2017-03-02 19:32 UTC
  To: Stefano Stabellini; +Cc: Edgar E. Iglesias, Edgar E. Iglesias, nd, xen-devel

Hi Stefano,

On 02/03/17 19:12, Stefano Stabellini wrote:
> Julien, from looking at the two diffs, this is simpler and nicer, but if
> you look at xen/include/asm-arm/page.h, my patch made
> clean_dcache_va_range consistent with invalidate_dcache_va_range. For
> consistency, I would prefer to deal with the two functions the same way.
> Although it is not a spec requirement, I also think that it is a good
> idea to issue cache flushes from cacheline aligned addresses, like
> invalidate_dcache_va_range does and Linux does, to make more obvious
> what is going on.

invalidate_dcache_va_range is split because the cache instruction
differs for the start and end if they are unaligned. For those you want
to use clean & invalidate rather than a plain invalidate.

If you look at the implementation of other cache helpers in Linux (see 
dcache_by_line_op in arch/arm64/include/asm/assembler.h), they will only 
align start & end.

Also, invalidate_dcache_va_range uses a modulo operation, which I would
rather avoid. The modulo in this case will not be optimized away by the
compiler because cacheline_bytes is not a compile-time constant.

So I still prefer to keep this function really simple.

BTW, you would also need to fix clean_and_invalidate_dcache_va_range.

-- 
Julien Grall


* Re: xen/arm and swiotlb-xen: possible data corruption
From: Stefano Stabellini @ 2017-03-02 22:39 UTC
  To: Julien Grall
  Cc: Edgar E. Iglesias, Edgar E. Iglesias, nd, Stefano Stabellini, xen-devel

On Thu, 2 Mar 2017, Julien Grall wrote:
> [...]
> If you look at the implementation of other cache helpers in Linux (see
> dcache_by_line_op in arch/arm64/include/asm/assembler.h), they will only
> align start & end.

I don't think so, unless I am reading dcache_by_line_op wrong.


> Also, invalidate_dcache_va_range uses a modulo operation, which I would
> rather avoid. The modulo in this case will not be optimized away by the
> compiler because cacheline_bytes is not a compile-time constant.

That is a good point. What if I replace the modulo op with

  p & (cacheline_bytes - 1)

in invalidate_dcache_va_range, then add similar code to
clean_dcache_va_range and clean_and_invalidate_dcache_va_range?
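
For what it's worth, a quick standalone check (illustration only) that
the mask form agrees with the modulo form whenever cacheline_bytes is a
power of two, as cache line sizes always are:

#include <assert.h>
#include <stdint.h>

int main(void)
{
    uintptr_t cacheline_bytes = 64;     /* stand-in; any power of two */

    /* x % n == x & (n - 1) holds for any power-of-two n. In Xen,
     * cacheline_bytes is only known at run time (probed at boot), so
     * the compiler emits a real division for the modulo; writing the
     * mask explicitly avoids that. */
    for (uintptr_t p = 0; p < 4096; p++)
        assert((p % cacheline_bytes) == (p & (cacheline_bytes - 1)));
    return 0;
}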


> BTW, you would also need to fix clean_and_invalidate_dcache_va_range.

I'll do that, thanks for the reminder.


* Re: xen/arm and swiotlb-xen: possible data corruption
From: Edgar E. Iglesias @ 2017-03-02 22:55 UTC
  To: Stefano Stabellini; +Cc: Edgar E. Iglesias, Julien Grall, nd, xen-devel

On Thu, Mar 02, 2017 at 02:39:55PM -0800, Stefano Stabellini wrote:
> [...]
> That is a good point. What if I replace the modulo op with
> 
>   p & (cacheline_bytes - 1)
> 
> in invalidate_dcache_va_range, then add similar code to
> clean_dcache_va_range and clean_and_invalidate_dcache_va_range?


Yeah, if there were some kind of generic ALIGN or ROUND_DOWN macro, we could do:

--- a/xen/include/asm-arm/page.h
+++ b/xen/include/asm-arm/page.h
@@ -325,7 +325,9 @@ static inline int clean_dcache_va_range(const void *p, unsigned long size)
 {
     const void *end;
     dsb(sy);           /* So the CPU issues all writes to the range */
-    for ( end = p + size; p < end; p += cacheline_bytes )
+
+    p = (void *)ALIGN((uintptr_t)p, cacheline_bytes);
+    end = (void *)ROUNDUP((uintptr_t)p + size, cacheline_bytes);
+    for ( ; p < end; p += cacheline_bytes )
         asm volatile (__clean_dcache_one(0) : : "r" (p));
     dsb(sy);           /* So we know the flushes happen before continuing */
     /* ARM callers assume that dcache_* functions cannot fail. */

I think that would achieve the same result as your patch, Stefano?

Cheers,
Edgar



* Re: xen/arm and swiotlb-xen: possible data corruption
From: Stefano Stabellini @ 2017-03-02 23:07 UTC
  To: Edgar E. Iglesias
  Cc: nd, Julien Grall, Stefano Stabellini, xen-devel, Edgar E. Iglesias

On Thu, 2 Mar 2017, Edgar E. Iglesias wrote:
> [...]
> Yeah, if there were some kind of generic ALIGN or ROUND_DOWN macro, we could do:
> --- a/xen/include/asm-arm/page.h
> +++ b/xen/include/asm-arm/page.h
> @@ -325,7 +325,9 @@ static inline int clean_dcache_va_range(const void *p, unsigned long size)
>  {
>      const void *end;
>      dsb(sy);           /* So the CPU issues all writes to the range */
> -    for ( end = p + size; p < end; p += cacheline_bytes )
> +
> +    p = (void *)ALIGN((uintptr_t)p, cacheline_bytes);
> +    end = (void *)ROUNDUP((uintptr_t)p + size, cacheline_bytes);

Even simpler:

   end = p + size;
   p = (void *)ALIGN((uintptr_t)p, cacheline_bytes);


> +    for ( ; p < end; p += cacheline_bytes )
>          asm volatile (__clean_dcache_one(0) : : "r" (p));
>      dsb(sy);           /* So we know the flushes happen before continuing */
>      /* ARM callers assume that dcache_* functions cannot fail. */
> 
> I think that would achieve the same result as your patch, Stefano?

Yes, indeed, that's better.


* Re: xen/arm and swiotlb-xen: possible data corruption
From: Julien Grall @ 2017-03-02 23:19 UTC
  To: Stefano Stabellini; +Cc: Edgar E. Iglesias, Edgar E. Iglesias, nd, xen-devel



On 02/03/2017 22:39, Stefano Stabellini wrote:
> On Thu, 2 Mar 2017, Julien Grall wrote:
>> [...]
>> If you look at the implementation of other cache helpers in Linux (see
>> dcache_by_line_op in arch/arm64/include/asm/assembler.h), they will only
>> align start & end.
>
> I don't think so, unless I am reading dcache_by_line_op wrong.

        .macro dcache_by_line_op op, domain, kaddr, size, tmp1, tmp2
        dcache_line_size \tmp1, \tmp2
        add     \size, \kaddr, \size
        sub     \tmp2, \tmp1, #1
        bic     \kaddr, \kaddr, \tmp2
9998:
        .if     (\op == cvau || \op == cvac)
alternative_if_not ARM64_WORKAROUND_CLEAN_CACHE
        dc      \op, \kaddr
alternative_else
        dc      civac, \kaddr
alternative_endif
        .else
        dc      \op, \kaddr
        .endif
        add     \kaddr, \kaddr, \tmp1
        cmp     \kaddr, \size
        b.lo    9998b
        dsb     \domain
        .endm

It has only one cache instruction in the resulting assembly because it 
has .if/.else assembly directives.

Cheers,

-- 
Julien Grall


* Re: xen/arm and swiotlb-xen: possible data corruption
From: Julien Grall @ 2017-03-02 23:24 UTC
  To: Stefano Stabellini, Edgar E. Iglesias; +Cc: Edgar E. Iglesias, nd, xen-devel



On 02/03/2017 23:07, Stefano Stabellini wrote:
> On Thu, 2 Mar 2017, Edgar E. Iglesias wrote:
>> [...]
>> Yeah, if there were some kind of generic ALIGN or ROUND_DOWN macro, we could do:
>> [...]
>> +    p = (void *)ALIGN((uintptr_t)p, cacheline_bytes);
>> +    end = (void *)ROUNDUP((uintptr_t)p + size, cacheline_bytes);
>
> Even simpler:
>
>    end = p + size;
>    p = (void *)ALIGN((uintptr_t)p, cacheline_bytes);

We don't have an ALIGN macro in Xen, and the way we use the term "align"
in Xen is very similar to ROUNDUP.

However, a simple p = (void *)((uintptr_t)p & ~(cacheline_bytes - 1))
should work here.
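
Putting the pieces together, the function would then look roughly like
this (a sketch of the direction discussed here, not the committed
patch):

static inline int clean_dcache_va_range(const void *p, unsigned long size)
{
    const void *end = p + size;

    dsb(sy);           /* So the CPU issues all writes to the range */

    /* Round the start down to a cache line boundary; since each step
     * advances a full line, the loop also covers an unaligned tail. */
    p = (const void *)((uintptr_t)p & ~((uintptr_t)cacheline_bytes - 1));
    for ( ; p < end; p += cacheline_bytes )
        asm volatile (__clean_dcache_one(0) : : "r" (p));

    dsb(sy);           /* So we know the flushes happen before continuing */
    /* ARM callers assume that dcache_* functions cannot fail. */
    return 0;
}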

Cheers,

-- 
Julien Grall


* Re: xen/arm and swiotlb-xen: possible data corruption
From: Stefano Stabellini @ 2017-03-03  0:53 UTC
  To: Julien Grall
  Cc: Edgar E. Iglesias, Edgar E. Iglesias, nd, Stefano Stabellini, xen-devel

On Thu, 2 Mar 2017, Julien Grall wrote:
> [...]
> It has only one cache instruction in the resulting assembly because it
> has .if/.else assembly directives.

Yes, but it does not just align start and end: all the cache
instructions end up operating on aligned addresses, right?


* Re: xen/arm and swiotlb-xen: possible data corruption
From: Julien Grall @ 2017-03-03 16:20 UTC
  To: Stefano Stabellini; +Cc: Edgar E. Iglesias, Edgar E. Iglesias, nd, xen-devel

Hi Stefano,

On 03/03/17 00:53, Stefano Stabellini wrote:
> [...]
> Yes, but it does not just align start and end: all the cache
> instructions end up operating on aligned addresses, right?

I don't think so. The instruction "bic \kaddr, \kaddr, \tmp2" will
align the start address down to a cache line boundary.

The C version of the assembly code is exactly what you wrote in the
previous e-mail:

    end = p + size;
    p = (void *)ALIGN((uintptr_t)p, cacheline_bytes);
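
For completeness, a rough C transliteration of the whole macro (a
hypothetical sketch, with the "dc" operation elided into a comment and
ALIGN taken as the round-down mask above) might read:

#include <stddef.h>
#include <stdint.h>

static void dcache_by_line_op_c(uintptr_t kaddr, size_t size, size_t line)
{
    uintptr_t end = kaddr + size;     /* add \size, \kaddr, \size */

    kaddr &= ~(uintptr_t)(line - 1);  /* bic \kaddr, \kaddr, \tmp2 */
    do {
        /* dc <op>, kaddr: maintains the whole line containing kaddr */
        kaddr += line;                /* add \kaddr, \kaddr, \tmp1 */
    } while (kaddr < end);            /* cmp \kaddr, \size; b.lo 9998b */
}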

Cheers,

-- 
Julien Grall

