* Re: [PATCH] ARM: cache coherence problem in guestcopy.c
From: Jaeyong Yoo @ 2013-06-18 12:05 UTC
  To: Ian Campbell; +Cc: Tim Deegan, xen-devel

> 
> But all of the caches on this platform are PIPT (right?) so isn't it
> actually:
> 
>        (consumer)             (producer)
>             xen                   DomU
>               \                 /   (writing path)
>                \               /
>                 \             /
> (reading path)  \           /
>                   \         /
>                     (cache)
>                        ||
>                        ||
>                        \/
> _______________________________________
>                      |   mfn   |        (physical memory)
> ---------------------------------------
> 
> 
> Or are you saying that the writing path is uncached?

Oops, my mistake. As far as I know, it is PIPT, and the writing path
should also be cached.

> 
> I was chatting with Tim and he suggested that the issue might be the
> ReOrder Buffer, which is virtually tagged. In that case a DMB ought to
> be sufficient and not a full cache flush, we think.
> 
> We were also speculating that we probably want some DMBs in
> context_switch_{from,to} as well as at return_to_guest.

Actually, I just learned about the ReOrder Buffer, and it looks like it could be so.

Best,
Jaeyong


* Re: [PATCH] ARM: cache coherence problem in guestcopy.c
From: Ian Campbell @ 2013-06-18 12:18 UTC
  To: jaeyong.yoo; +Cc: Tim Deegan, xen-devel

On Tue, 2013-06-18 at 12:05 +0000, Jaeyong Yoo wrote:
> > We were also speculating that we probably want some DMBs in
> > context_switch_{from,to} as well as at return_to_guest.
> 
> Actually, I just learned about the ReOrder Buffer, and it looks like it could be so.

Right. Tim and I were speculating about where and why the barriers were
needed:

In unmap_domain_page: This ensures any writes to domain memory via the
Xen mapping are seen by subsequent accesses via the guest mapping.
Likewise map_domain_page needs one to ensure that previous accesses via
the guest mapping will be observed (this is the issue you observed).

In context_switch_from: To ensure any writes done on a VCPU are seen
before it is rescheduled, potentially on another PCPU. It seems likely
that you'd want one in context_switch_to for the complementary reasons.

In return_to_guest: This is similar to the unmap_domain_page one but
handles writes to pages which are permanently shared by Xen and the
guest, e.g. the shared info page or anything mapped with
share_xen_page_with_guest etc. This one is debatable because all users
of this are, I think, expected to get their own barriers correct since
these pages are typically part of some lock-free data structure
arrangement.
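
To make the context_switch case concrete, here is a rough sketch of
what we have in mind (placement is illustrative only, not a tested
patch; the function names are the arch/arm ones from memory):

    static void ctxt_switch_from(struct vcpu *p)
    {
        /* ... save p's state ... */

        /* Make p's writes visible before it can be rescheduled,
         * potentially on another PCPU. */
        dmb();
    }

    static void ctxt_switch_to(struct vcpu *n)
    {
        /* Don't let n's accesses be reordered ahead of the switch. */
        dmb();

        /* ... restore n's state ... */
    }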

Ian.


* Re: [PATCH] ARM: cache coherence problem in guestcopy.c
From: Ian Campbell @ 2013-06-19 15:12 UTC
  To: jaeyong.yoo; +Cc: Tim Deegan, xen-devel

On Tue, 2013-06-18 at 13:18 +0100, Ian Campbell wrote:
> On Tue, 2013-06-18 at 12:05 +0000, Jaeyong Yoo wrote:
> > > We were also speculating that we probably want some DMBs in
> > > context_switch_{from,to} as well as at return_to_guest.
> > 
> > Actually, I just learned about the ReOrder Buffer, and it looks like it could be so.

Does this patch help with the issue you are seeing?

I think in theory it should *BUT* (and it's a big But)...

... it doesn't work for me on an issue I've been seeing booting dom0 on
the Foundation model. However doing the dmb in map_domain_page() *twice*
does work for me. In fact doing a dmb+nop works too. This is
inexplicable to me. It may be a model bug or it may (more likely) be
indicative of some other underlying issue. I also tried dsb() instead of
dmb() and that doesn't make a difference.

Anyway, I'm interested in its behaviour on real hardware.

These are doing full system barriers. I think in reality only a "dmb
nsh" should be needed for this use case, since we only care about the
local processor. I didn't want to go there when the supposedly more
obvious case wasn't behaving as I expected!
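
For reference, the difference is only the barrier's domain option
(a sketch; Xen's actual macro spellings may differ):

    /* Full system: orders accesses for every observer. */
    #define dmb()     asm volatile ("dmb" : : : "memory")

    /* Non-shareable domain: orders accesses for this PCPU only,
     * which is all the (un)map_domain_page case should need. */
    #define dmb_nsh() asm volatile ("dmb nsh" : : : "memory")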

Ian.

diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
index beae61e..ef67953 100644
--- a/xen/arch/arm/mm.c
+++ b/xen/arch/arm/mm.c
@@ -218,6 +218,13 @@ void *map_domain_page(unsigned long mfn)
 
     local_irq_save(flags);
 
+    /*
+     * Ensure that any previous writes which occurred via another
+     * mapping, specifically the guest's own mapping have been
+     * completed such that they are visible via this mapping.
+     */
+    dmb();
+
     /* The map is laid out as an open-addressed hash table where each
      * entry is a 2MB superpage pte.  We use the available bits of each
      * PTE as a reference count; when the refcount is zero the slot can
@@ -286,6 +297,13 @@ void unmap_domain_page(const void *va)
 
     map[slot].pt.avail--;
 
+    /*
+     * Ensure that any writes which occurred via this mapping have been
+     * completed such that they are visible via other virtual
+     * mappings, specifically the guest's own mapping.
+     */
+    dmb();
+
     local_irq_restore(flags);
 }


* Re: [PATCH] ARM: cache coherence problem in guestcopy.c
From: Stefano Stabellini @ 2013-06-20 11:55 UTC
  To: Ian Campbell; +Cc: Tim Deegan, jaeyong.yoo, xen-devel

On Tue, 18 Jun 2013, Ian Campbell wrote:
> On Tue, 2013-06-18 at 12:05 +0000, Jaeyong Yoo wrote:
> > > We were also speculating that we probably want some DMBs in
> > > context_switch_{from,to} as well as at return_to_guest.
> > 
> > Actually, I just learned about the ReOrder Buffer, and it looks like it could be so.
> 
> Right. Tim and I were speculating about where and why the barriers were
> needed:
> 
> In unmap_domain_page: This ensures any writes to domain memory via the
> Xen mapping are seen by subsequent accesses via the guest mapping.
> Likewise map_domain_page needs one to ensure that previous accesses via
> the guest mapping will be observed (this is the issue you observed).

What if the guest doesn't have caching enabled?
It seems to me that the only way to make sure that the guest is going
to be able to see the changes made by Xen is to flush the dcache,
unless we mandate that all Xen guests are required to have caching
enabled.


> In context_switch_from: To ensure any writes done on a VCPU are seen
> before it is rescheduled, potentially on another PCPU. It seems likely
> that you'd want one in context_switch_to for the complementary reasons.

Is the cache shared across multiple pcpus? Because if it isn't then we
need to flush the dcache in this case.


> In return_to_guest: This is similar to the unmap_domain_page one but
> handles writes to pages which are permanently shared by Xen and the
> guest, e.g. the shared info page or anything mapped with
> share_xen_page_with_guest etc. This one is debatable because all users
> of this are, I think, expected to get their own barriers correct since
> these pages are typically part of some lock-free data structure
> arrangement.

same as above


* Re: [PATCH] ARM: cache coherence problem in guestcopy.c
From: Tim Deegan @ 2013-06-20 12:19 UTC
  To: Stefano Stabellini; +Cc: Ian Campbell, jaeyong.yoo, xen-devel

At 12:55 +0100 on 20 Jun (1371732938), Stefano Stabellini wrote:
> On Tue, 18 Jun 2013, Ian Campbell wrote:
> > On Tue, 2013-06-18 at 12:05 +0000, Jaeyong Yoo wrote:
> > > > We were also speculating that we probably want some DMBs in
> > > > context_switch_{from,to} as well as at return_to_guest.
> > > 
> > > Actually, I just learned about the ReOrder Buffer, and it looks like it could be so.
> > 
> > Right. Tim and I were speculating about where and why the barriers were
> > needed:
> > 
> > In unmap_domain_page: This ensures any writes to domain memory via the
> > Xen mapping are seen by subsequent accesses via the guest mapping.
> > Likewise map_domain_page needs one to ensure that previous accesses via
> > the guest mapping will be observed (this is the issue you observed).
> 
> What if the guest doesn't have caching enabled?

Erk!  I think we can mandate that the guest use caching for memory it
shares with the hypervisor (i.e. hypercall args & responses, shared
pages, granted pages).  Otherwise we end up flushing the dcache before
and after every copy.
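
For illustration, a sketch of the per-copy cost we would be signing up
for (the invalidate helper name here is hypothetical):

    /* With an uncached guest buffer, every read of guest data would
     * need a dcache invalidate first, so we don't read stale lines
     * over the top of what the guest wrote to memory. */
    invalidate_dcache_va_range(p, size);
    memcpy(to, p, size);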

In any case, this is something that should get documented. :)

> > In context_switch_from: To ensure any writes done on a VCPU are seen
> > before it is rescheduled, potentially on another PCPU. It seems likely
> > that you'd want one in context_switch_to for the complementary reasons.
> 
> Is the cache shared across multiple pcpus? Because if it isn't then we
> need to flush the dcache in this case.

AIUI this is what marking accesses 'shareable' is for: to tell the caches
to keep them coherent.  If not, we're equally in trouble for all Xen's
internal data structures.
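
For reference, the attribute in question is the two-bit SH field of
the LPAE block/page descriptor, bits [9:8] (the macro names below are
illustrative; the encodings are architectural):

    #define LPAE_SH_SHIFT   8
    #define LPAE_SH_NON     (0x0UL << LPAE_SH_SHIFT) /* non-shareable   */
    #define LPAE_SH_OUTER   (0x2UL << LPAE_SH_SHIFT) /* outer shareable */
    #define LPAE_SH_INNER   (0x3UL << LPAE_SH_SHIFT) /* inner shareable */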

Tim.


* Re: [PATCH] ARM: cache coherence problem in guestcopy.c
From: Ian Campbell @ 2013-06-25  9:43 UTC
  To: Tim Deegan; +Cc: xen-devel, jaeyong.yoo, Stefano Stabellini

On Thu, 2013-06-20 at 13:19 +0100, Tim Deegan wrote:
> At 12:55 +0100 on 20 Jun (1371732938), Stefano Stabellini wrote:
> > On Tue, 18 Jun 2013, Ian Campbell wrote:
> > > On Tue, 2013-06-18 at 12:05 +0000, Jaeyong Yoo wrote:
> > > > > We were also speculating that we probably want some DMBs in
> > > > > context_switch_{from,to} as well as at return_to_guest.
> > > > 
> > > > Actually, I just learned about the ReOrder Buffer, and it looks like it could be so.
> > > 
> > > Right. Tim and I were speculating about where and why the barriers were
> > > needed:
> > > 
> > > In unmap_domain_page: This ensures any writes to domain memory via the
> > > Xen mapping are seen by subsequent accesses via the guest mapping.
> > > Likewise map_domain_page needs one to ensure that previous accesses via
> > > the guest mapping will be observed (this is the issue you observed).
> > 
> > What if the guest doesn't have caching enabled?
> 
> Erk!  I think we can mandate that the guest use caching for memory it
> shares with the hypervisor (i.e. hypercall args & responses, shared
> pages, granted pages).  Otherwise we end up flushing the dcache before
> and after every copy.

Yes. Worse, because of the reuse/reference-counting scheme used in
map_domain_page, I'm not sure we can tell when we need to flush the
cache vs. discard it (because the guest has written something useful
behind the cache).
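
For reference, the two maintenance operations we would have to choose
between, by MVA on v7 (wrapper names are illustrative; the CP15
encodings are architectural):

    /* "Flush" (clean): write a dirty line back to memory (DCCMVAC). */
    static inline void clean_dcache_line(const void *va)
    {
        asm volatile ("mcr p15, 0, %0, c7, c10, 1" : : "r" (va) : "memory");
    }

    /* "Discard" (invalidate): drop the line without writing it back,
     * keeping whatever the guest put in memory (DCIMVAC). */
    static inline void invalidate_dcache_line(const void *va)
    {
        asm volatile ("mcr p15, 0, %0, c7, c6, 1" : : "r" (va) : "memory");
    }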

> In any case, this is something that should get documented. :)

Yes.

> > > In context_switch_from: To ensure any writes done on a VCPU are seen
> > > before it is rescheduled, potentially on another PCPU. It seems likely
> > > that you'd want one in context_switch_to for the complementary reasons.
> > 
> > Is the cache shared across multiple pcpus? Because if it isn't then we
> > need to flush the dcache in this case.
> 
> AIUI this is what marking accesses 'shareable' is for: to tell the caches
> to keep them coherent.  If not, we're equally in trouble for all Xen's
> internal data structures.

Right, they are either snooped or shared, depending on the level and the
implementation. Either way it is fine for us, I think.

Ian.


* Re: [PATCH] ARM: cache coherence problem in guestcopy.c
From: Sengul Thomas @ 2013-07-02 12:39 UTC
  To: Ian Campbell; +Cc: Tim Deegan, jaeyong.yoo, xen-devel

> Oh well, I think the switch to Inner-shareable mappings in Xen is likely
> to remove the issue and I think the justification for the dmb patch in
> (un)map_domain_page is pretty good. So hopefully this issue will just go
> away for good once we apply those. (and if not we can always
> reinvestigate).

Of course, yes. If I run into a similar bug, I will try the patches.

Jaeyong


* Re: [PATCH] ARM: cache coherence problem in guestcopy.c
From: Ian Campbell @ 2013-07-02 12:33 UTC
  To: Sengul Thomas; +Cc: Tim Deegan, jaeyong.yoo, xen-devel

On Tue, 2013-07-02 at 21:14 +0900, Sengul Thomas wrote:
> On Tue, Jul 2, 2013 at 6:38 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> > On Thu, 2013-06-20 at 08:34 +0000, Jaeyong Yoo wrote:
> >> > On Tue, 2013-06-18 at 13:18 +0100, Ian Campbell wrote:
> >> > > On Tue, 2013-06-18 at 12:05 +0000, Jaeyong Yoo wrote:
> >> > > > > We were also speculating that we probably want some DMBs in
> >> > > > > context_switch_{from,to} as well as at return_to_guest.
> >> > > >
> >> > > > Actually, I just learned about the ReOrder Buffer, and it looks like it could be so.
> >> >
> >> > Does this patch help with the issue you are seeing?
> >>
> >> I tried the combinations and it does not work.
> >
> > As an experiment can you try the same patch with dsb instead of dmb
> > throughout?
> 
> I tried several ways to recreate the problem I was having, but I
> failed. I was seeing that problem with some printks placed here and
> there, and, sadly, I don't remember the printk positions.
> I'm sorry.

Oh yes, I'd forgotten that you weren't able to reproduce any more.

Oh well, I think the switch to Inner-shareable mappings in Xen is likely
to remove the issue and I think the justification for the dmb patch in
(un)map_domain_page is pretty good. So hopefully this issue will just go
away for good once we apply those (and if not we can always
reinvestigate).

> 
> Jaeyong
> 
> >
> > Could you also try the patch from my series "xen: arm: map normal memory
> > as inner shareable, reduce scope of various barriers" in
> > <1372435809.8976.169.camel@zakaz.uk.xensource.com>. Speaking with ARM,
> > they reminded me that Xen currently uses Outer-shareable mappings while
> > Linux uses Inner-, which is more than likely the root cause of the issue
> > you are seeing. You probably want the dmb() patch as well to avoid
> > speculative fetches.
> >
> > Ian.
> >


* Re: [PATCH] ARM: cache coherence problem in guestcopy.c
From: Sengul Thomas @ 2013-07-02 12:24 UTC
  To: Ian Campbell; +Cc: Tim Deegan, jaeyong.yoo, xen-devel

> I tried several ways to recreate the problem I was having, but I
> failed. I was seeing that problem with some printks placed here and
> there, and, sadly, I don't remember the printk positions.
> I'm sorry.
>
> Jaeyong

BTW, thomas.sengul@gmail.com is my personal email and
jaeyong.yoo@samsung.com is my work email.
For posting patches, I use my work email.
It is quite confusing. Sorry :)

Jaeyong/Thomas


* Re: [PATCH] ARM: cache coherence problem in guestcopy.c
From: Sengul Thomas @ 2013-07-02 12:14 UTC
  To: Ian Campbell; +Cc: Tim Deegan, jaeyong.yoo, xen-devel

On Tue, Jul 2, 2013 at 6:38 PM, Ian Campbell <Ian.Campbell@citrix.com> wrote:
> On Thu, 2013-06-20 at 08:34 +0000, Jaeyong Yoo wrote:
>> > On Tue, 2013-06-18 at 13:18 +0100, Ian Campbell wrote:
>> > > On Tue, 2013-06-18 at 12:05 +0000, Jaeyong Yoo wrote:
>> > > > > We were also speculating that we probably want some DMBs in
>> > > > > context_switch_{from,to} as well as at return_to_guest.
>> > > >
>> > > > Actually, I just learned about the ReOrder Buffer, and it looks like it could be so.
>> >
>> > Does this patch help with the issue you are seeing?
>>
>> I tried the combinations and it does not work.
>
> As an experiment can you try the same patch with dsb instead of dmb
> throughout?

I tried several ways to recreate the problem I was having, but I
failed. I was seeing that problem with some printks placed here and
there, and, sadly, I don't remember the printk positions.
I'm sorry.

Jaeyong

>
> Could you also try the patch from my series "xen: arm: map normal memory
> as inner shareable, reduce scope of various barriers" in
> <1372435809.8976.169.camel@zakaz.uk.xensource.com>. Speaking with ARM,
> they reminded me that Xen currently uses Outer-shareable mappings while
> Linux uses Inner-, which is more than likely the root cause of the issue
> you are seeing. You probably want the dmb() patch as well to avoid
> speculative fetches.
>
> Ian.
>


* Re: [PATCH] ARM: cache coherence problem in guestcopy.c
From: Ian Campbell @ 2013-07-02  9:38 UTC
  To: jaeyong.yoo; +Cc: Tim Deegan, xen-devel

On Thu, 2013-06-20 at 08:34 +0000, Jaeyong Yoo wrote:
> > On Tue, 2013-06-18 at 13:18 +0100, Ian Campbell wrote:
> > > On Tue, 2013-06-18 at 12:05 +0000, Jaeyong Yoo wrote:
> > > > > We were also speculating that we probably want some DMBs in
> > > > > context_switch_{from,to} as well as at return_to_guest.
> > > > 
> > > > Actually, I just learned about the ReOrder Buffer, and it looks like it could be so.
> > 
> > Does this patch help with the issue you are seeing?
> 
> I tried the combinations and it does not work.

As an experiment can you try the same patch with dsb instead of dmb
throughout?

Could you also try the patch from my series "xen: arm: map normal memory
as inner shareable, reduce scope of various barriers" in
<1372435809.8976.169.camel@zakaz.uk.xensource.com>. Speaking with ARM,
they reminded me that Xen currently uses Outer-shareable mappings while
Linux uses Inner-, which is more than likely the root cause of the issue
you are seeing. You probably want the dmb() patch as well to avoid
speculative fetches.

Ian.


* Re: [PATCH] ARM: cache coherence problem in guestcopy.c
From: Ian Campbell @ 2013-06-25  9:22 UTC
  To: jaeyong.yoo; +Cc: Tim Deegan, xen-devel

On Thu, 2013-06-20 at 08:34 +0000, Jaeyong Yoo wrote:
> > On Tue, 2013-06-18 at 13:18 +0100, Ian Campbell wrote:
> > > On Tue, 2013-06-18 at 12:05 +0000, Jaeyong Yoo wrote:
> > > > > We were also speculating that we probably want some DMBs in
> > > > > context_switch_{from,to} as well as at return_to_guest.
> > > > 
> > > > Actually, I just learned about the ReOrder Buffer, and it looks like it could be so.
> > 
> > Does this patch help with the issue you are seeing?
> 
> I tried the combinations and it does not work.

Thanks, this is useful to know. It may be that our analysis is simply
flawed and something else is at work in both cases.

>  I think my problem may stem
> from a different cause. Since this problem happens while
> we try to migrate domU, something really weird may be happening.
> 
> Actually, one of my colleagues told me that this problem I'm having
> magically disappeared while he tried copying more vcpu registers and
> putting lots of printks here and there.

printks change all sorts of things, like timing and their own use of
barriers (e.g. in the serial driver), and they simply add more code,
which increases the gap in the instruction stream between memory
accesses that might require a barrier, such that the unwanted ordering
no longer happens.

>  At this moment, I'm not sure whether this is a
> general problem in xen or a problem due to my poor migration code, but
> I'm more inclined to believe it is the migration. Since I keep
> investigating this issue, I will tell you if anything turns up.

Thanks.

> 
> > I think in theory it should *BUT* (and it's a big But)...
> > 
> > ... it doesn't work for me on an issue I've been seeing booting dom0 on
> > the Foundation model. However doing the dmb in map_domain_page() *twice*
> 
> BTW, did you put barriers in map_domain_page? The patch below looks
> like it is in unmap_domain_page.

The first hunk is map_domain_page.

> > does work for me. In fact doing a dmb+nop works too. This is
> > inexplicable to me. It may be a model bug or it may (more likely) be
> > indicative of some other underlying issue. I also tried dsb() instead of
> > dmb() and that doesn't make a difference.
> > 
> > Anyway, I'm interested in its behaviour on real hardware.
> > 
> > These are doing full system barriers. I think in reality only a "dmb
> > nsh" should be needed for this use case, since we only care about the
> > local processor. I didn't want to go there when the supposedly more
> > obvious case wasn't behaving as I expected!


* Re: [PATCH] ARM: cache coherence problem in guestcopy.c
From: Jaeyong Yoo @ 2013-06-20  8:34 UTC
  To: Ian Campbell; +Cc: Tim Deegan, xen-devel

> On Tue, 2013-06-18 at 13:18 +0100, Ian Campbell wrote:
> > On Tue, 2013-06-18 at 12:05 +0000, Jaeyong Yoo wrote:
> > > > We were also speculating that we probably want some DMBs in
> > > > context_switch_{from,to} as well as at return_to_guest.
> > > 
> > > Actually, I just learned about the ReOrder Buffer, and it looks like it could be so.
> 
> Does this patch help with the issue you are seeing?

I tried the combinations and it does not work. I think my problem may
stem from a different cause. Since this problem happens while we try to
migrate domU, something really weird may be happening.

Actually, one of my colleagues told me that this problem I'm having
magically disappeared while he tried copying more vcpu registers and
putting lots of printks here and there. At this moment, I'm not sure
whether this is a general problem in xen or a problem due to my poor
migration code, but I'm more inclined to believe it is the migration.
Since I keep investigating this issue, I will tell you if anything
turns up.

> I think in theory it should *BUT* (and it's a big But)...
> 
> ... it doesn't work for me on an issue I've been seeing booting dom0 on
> the Foundation model. However doing the dmb in map_domain_page() *twice*

BTW, did you put barriers in map_domain_page? The patch below looks
like it is in unmap_domain_page.

> does work for me. In fact doing a dmb+nop works too. This is
> inexplicable to me. It may be a model bug or it may (more likely) be
> indicative of some other underlying issue. I also tried dsb() instead of
> dmb() and that doesn't make a difference.
> 
> Anyway, I'm interested in its behaviour on real hardware.
> 
> These are doing full system barriers. I think in reality only a "dmb
> nsh" should be needed for this use case, since we only care about the
> local processor. I didn't want to go there when the supposedly more
> obvious case wasn't behaving as I expected!
> 
> Ian.
> 
> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c
> index beae61e..ef67953 100644
> --- a/xen/arch/arm/mm.c
> +++ b/xen/arch/arm/mm.c
> @@ -218,6 +218,13 @@ void *map_domain_page(unsigned long mfn)
> 
>      local_irq_save(flags);
> 
> +    /*
> +     * Ensure that any previous writes which occurred via another
> +     * mapping, specifically the guest's own mapping have been
> +     * completed such that they are visible via this mapping.
> +     */
> +    dmb();
> +
>      /* The map is laid out as an open-addressed hash table where each
>       * entry is a 2MB superpage pte.  We use the available bits of each
>       * PTE as a reference count; when the refcount is zero the slot can
> @@ -286,6 +297,13 @@ void unmap_domain_page(const void *va)
> 
>      map[slot].pt.avail--;
> 
> +    /*
> +     * Ensure that any writes which occurred via this mapping have been
> +     * completed such that they are visible via other virtual
> +     * mappings, specifically the guest's own mapping.
> +     */
> +    dmb();
> +
>      local_irq_restore(flags);
> }
> 
> 
> 


* Re: [PATCH] ARM: cache coherence problem in guestcopy.c
From: Ian Campbell @ 2013-06-18 11:45 UTC
  To: jaeyong.yoo; +Cc: Tim Deegan, xen-devel

On Tue, 2013-06-18 at 11:22 +0000, Jaeyong Yoo wrote:
> > 
> > So I think we probably actually need the dcache flush in map_domain_page
> > at the "/* Commandeer this 2MB slot */" point. In that context I don't
> > think we can flush anything less than the complete 2MB mapping. Does
> > this work for you too?
> 
> I am not sure that this would work. If we map_domain_page and
> unmap_domain_page the same mfn over and over again while the ref count
> is not zero (say 5), then the flush is not called. And I think we
> should call the flush, for the reason below:
> 
> > 
> > The laziness of the remappings makes me wonder though. Do you know if
> > the slot is reused between step #2 and #3?  Otherwise I'd expect us to
> > reuse the existing mapping with the cache intact. The caches are PIPT so
> > I wouldn't expect the address aliasing to be an issue. Unless the
> > mapping is reused for something else I'm not too sure where the cache
> > pollution is coming from.
> 
> Let me try to explain in more detail.
> We can consider the DomU as a producer (writing values to the mfn) and
> the hypervisor as a consumer (reading values from the mfn). While DomU
> is invoking multiple hypercalls, it reuses the same mfn and the same
> mapping in the xen page table.
> 
>        (consumer)             (producer)
>            xen                   DomU
>              \                    /   (writing path)
>             (cache)              /
>                  \              /
> (reading path)    \           /
> _______________________________________
>                     |   mfn   |        (physical memory)
> ---------------------------------------
> 
> As we can see in the above figure, xen "may" keep reading the cached
> value while DomU is writing different values to the mfn.

But all of the caches on this platform are PIPT (right?) so isn't it
actually:

       (consumer)             (producer)
            xen                   DomU
              \                 /   (writing path)
               \               /
                \             /
 (reading path)  \           /
                  \         /
                    (cache)
                       ||
                       ||
                       \/
 _______________________________________
                     |   mfn   |        (physical memory)
 ---------------------------------------
 

Or are you saying that the writing path is uncached?

I was chatting with Tim and he suggested that the issue might be the
ReOrder Buffer, which is virtually tagged. In that case a DMB ought to
be sufficient and not a full cache flush, we think.

We were also speculating that we probably want some DMBs in
context_switch_{from,to} as well as at return_to_guest.

>  Here is my observation of where the
> cache pollution happens:
> 
> The pollution actually happens in the second cache line.
> The DomU-side hypercall param's local address is 0xc785fe38 (cache line
> size = 0x40 bytes), and the size of the hypercall param is 24 bytes, so
> the hypercall param spans two cache lines. When the hypervisor reads
> the hypercall param, it reads the first 8 bytes correctly (meaning the
> first cache line was flushed) and the other 16 bytes are polluted
> (meaning the second cache line was not flushed).
> Honestly, I'm not sure why the first cache line is flushed and the
> second is not. I think we could also cache-line-align the hypercall
> param struct, but that only works when the sizes of all hypercall
> params are smaller than the cache line size.
> 
> I hope the alignment of the figure is not broken :)
> 
> Best,
> Jaeyong


* Re: [PATCH] ARM: cache coherence problem in guestcopy.c
From: Jaeyong Yoo @ 2013-06-18 11:22 UTC
  To: Ian Campbell; +Cc: Tim Deegan, xen-devel

> 
> So I think we probably actually need the dcache flush in map_domain_page
> at the "/* Commandeer this 2MB slot */" point. In that context I don't
> think we can flush anything less than the complete 2MB mapping. Does
> this work for you too?

I am not sure that this would work. If we map_domain_page and
unmap_domain_page the same mfn over and over again while the ref count
is not zero (say 5), then the flush is not called. And I think we should
call the flush, for the reason below:

> 
> The laziness of the remappings makes me wonder though. Do you know if
> the slot is reused between step #2 and #3?  Otherwise I'd expect us to
> reuse the existing mapping with the cache intact. The caches are PIPT so
> I wouldn't expect the address aliasing to be an issue. Unless the
> mapping is reused for something else I'm not too sure where the cache
> pollution is coming from.

Let me try to explain in more detail.
We can consider the DomU as a producer (writing values to the mfn) and
the hypervisor as a consumer (reading values from the mfn). While DomU
is invoking multiple hypercalls, it reuses the same mfn and the same
mapping in the xen page table.

       (consumer)             (producer)
           xen                   DomU
             \                    /   (writing path)
            (cache)              /
                 \              /
(reading path)    \           /
_______________________________________
                    |   mfn   |        (physical memory)
---------------------------------------

As we can see in the above figure, xen "may" keep reading the cached
value while DomU is writing different values to the mfn. Here is my
observation of where the cache pollution happens:

The pollution actually happens in the second cache line.
The DomU-side hypercall param's local address is 0xc785fe38 (cache line
size = 0x40 bytes), and the size of the hypercall param is 24 bytes, so
the hypercall param spans two cache lines. When the hypervisor reads the
hypercall param, it reads the first 8 bytes correctly (meaning the first
cache line was flushed) and the other 16 bytes are polluted (meaning the
second cache line was not flushed).
Honestly, I'm not sure why the first cache line is flushed and the
second is not. I think we could also cache-line-align the hypercall
param struct, but that only works when the sizes of all hypercall params
are smaller than the cache line size.
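
For concreteness, the arithmetic as a sketch (assuming the 0x40-byte
line above):

    unsigned long va    = 0xc785fe38UL;     /* param's GVA            */
    unsigned int  line  = 0x40;             /* cache line size        */
    unsigned int  off   = va & (line - 1);  /* 0x38 = 56              */
    unsigned int  first = line - off;       /* 8 bytes in line one    */
    unsigned int  rest  = 24 - first;       /* 16 bytes in line two   */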

I hope the alignment of the figure is not broken :)

Best,
Jaeyong


* Re: [PATCH] ARM: cache coherence problem in guestcopy.c
From: Ian Campbell @ 2013-06-18  9:20 UTC
  To: Jaeyong Yoo; +Cc: Tim Deegan, xen-devel

On Tue, 2013-06-18 at 14:03 +0900, Jaeyong Yoo wrote:
> I've encountered a rather unusual bug while implementing live
> migration on the Arndale board.
> After resume, the domU kernel starts invoking hypercalls, and at some
> point the hypercall parameters delivered to xen are corrupted.
> 
> After some debugging (with the help of a HW debugger), I found that
> cache pollution happens; here is the detailed sequence.
> 
> 1) The DomU kernel allocates a local variable struct xen_add_to_physmap
>    xatp, and the GVA of xatp is 0xc785fe38 (note that it is not
>    cache-line aligned)
>    (see the gnttab_map function in linux/drivers/xen/grant-table.c)
> 
> 2) The GVA of xatp is mapped in the xen page table in the
>    raw_copy_from_guest function; the xen VA is 0xae48ee38 and its
>    contents are cached.
> 
> 3) The DomU kernel reuses xatp to invoke a second hypercall with
>    different parameters.
> 
> 4) The GVA of xatp is mapped again at the same xen VA, and the cached
>    data from step 2) (the first hypercall's parameters) is loaded.
> 
> The patch below prevents the above problem.

Ouch! I wonder what other random unexplained events could be attributed
to this!

> I'm wondering, as a better solution, whether unmap_domain_page should
> invalidate the cache before unmapping the page?

I think that would be better, since the domain_page region mappings are
reference counted, so you could avoid a flush every time you unmapped
something and just issue one when the count hits zero.

But actually the domain_page stuff is lazy -- it won't unmap a
refcount==0 slot until it comes to reuse it. This means that if you
unmap then remap the same address you should end up reusing the same
mapping, and that we only need to remap on reuse and not when the count
hits zero. That's why there are TLB flushes in map_domain_page but not
unmap_domain_page. Obviously, as you have discovered, we need some data
cache flushes in there as well.

So I think we probably actually need the dcache flush in map_domain_page
at the "/* Commandeer this 2MB slot */" point. In that context I don't
think we can flush anything less than the complete 2MB mapping. Does
this work for you too?
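
Roughly like this (a sketch, untested; slot_va stands for whatever VA
the hash lookup picked):

    /* Commandeer this 2MB slot */
    ...
    /* Flush the whole superpage that previously occupied the slot
     * before the new pte goes in, so no stale lines survive the
     * reuse; flush_xen_dcache_va_range() as in your guestcopy patch. */
    flush_xen_dcache_va_range(slot_va, 2UL << 20);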

The laziness of the remappings makes me wonder though. Do you know if
the slot is reused between step #2 and #3? Otherwise I'd expect us to
reuse the existing mapping with the cache intact. The caches are PIPT so
I wouldn't expect the address aliasing to be an issue. Unless the
mapping is reused for something else I'm not too sure where the cache
pollution is coming from.

Ian.

> 
> 
> Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
> ---
>  xen/arch/arm/guestcopy.c |    3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/xen/arch/arm/guestcopy.c b/xen/arch/arm/guestcopy.c
> index d146cd6..9c4a7e4 100644
> --- a/xen/arch/arm/guestcopy.c
> +++ b/xen/arch/arm/guestcopy.c
> @@ -25,6 +25,7 @@ unsigned long raw_copy_to_guest(void *to, const void *from, unsigned len)
>          p += offset;
>          memcpy(p, from, size);
>  
> +        flush_xen_dcache_va_range(p, size);
>          unmap_domain_page(p - offset);
>          len -= size;
>          from += size;
> @@ -55,6 +56,7 @@ unsigned long raw_clear_guest(void *to, unsigned len)
>          p += offset;
>          memset(p, 0x00, size);
>  
> +        flush_xen_dcache_va_range(p, size);
>          unmap_domain_page(p - offset);
>          len -= size;
>          to += size;
> @@ -84,6 +86,7 @@ unsigned long raw_copy_from_guest(void *to, const void __user *from, unsigned le
>  
>          memcpy(to, p, size);
>  
> +        flush_xen_dcache_va_range(p, size);
>          unmap_domain_page(p);
>          len -= size;
>          from += size;


* [PATCH] ARM: cache coherence problem in guestcopy.c
From: Jaeyong Yoo @ 2013-06-18  5:03 UTC
  To: xen-devel; +Cc: Jaeyong Yoo

I've encountered a rather unusual bug while implementing live
migration on the Arndale board.
After resume, the domU kernel starts invoking hypercalls, and at some
point the hypercall parameters delivered to xen are corrupted.

After some debugging (with the help of a HW debugger), I found that
cache pollution happens; here is the detailed sequence.

1) The DomU kernel allocates a local variable struct xen_add_to_physmap
   xatp, and the GVA of xatp is 0xc785fe38 (note that it is not
   cache-line aligned)
   (see the gnttab_map function in linux/drivers/xen/grant-table.c)

2) The GVA of xatp is mapped in the xen page table in the
   raw_copy_from_guest function; the xen VA is 0xae48ee38 and its
   contents are cached.

3) The DomU kernel reuses xatp to invoke a second hypercall with
   different parameters.

4) The GVA of xatp is mapped again at the same xen VA, and the cached
   data from step 2) (the first hypercall's parameters) is loaded.

The patch below prevents the above problem.

I'm wondering, as a better solution, whether unmap_domain_page should
invalidate the cache before unmapping the page?


Signed-off-by: Jaeyong Yoo <jaeyong.yoo@samsung.com>
---
 xen/arch/arm/guestcopy.c |    3 +++
 1 file changed, 3 insertions(+)

diff --git a/xen/arch/arm/guestcopy.c b/xen/arch/arm/guestcopy.c
index d146cd6..9c4a7e4 100644
--- a/xen/arch/arm/guestcopy.c
+++ b/xen/arch/arm/guestcopy.c
@@ -25,6 +25,7 @@ unsigned long raw_copy_to_guest(void *to, const void *from, unsigned len)
         p += offset;
         memcpy(p, from, size);
 
+        flush_xen_dcache_va_range(p, size);
         unmap_domain_page(p - offset);
         len -= size;
         from += size;
@@ -55,6 +56,7 @@ unsigned long raw_clear_guest(void *to, unsigned len)
         p += offset;
         memset(p, 0x00, size);
 
+        flush_xen_dcache_va_range(p, size);
         unmap_domain_page(p - offset);
         len -= size;
         to += size;
@@ -84,6 +86,7 @@ unsigned long raw_copy_from_guest(void *to, const void __user *from, unsigned le
 
         memcpy(to, p, size);
 
+        flush_xen_dcache_va_range(p, size);
         unmap_domain_page(p);
         len -= size;
         from += size;
-- 
1.7.9.5

