* Possible bug/question in xen-hptool?
From: Meng Xu @ 2015-04-23  4:03 UTC
  To: xen-devel; +Cc: Andrew Cooper, Keir Fraser



Hi,

I was looking at using xen-hptool (tools/misc/xen-hptool.c) to take one page
of a guest domain offline.

I created a guest domain on Xen unstable and dumped its p2m:
# xen-mfndump dump-p2m 1
This gives the mfn backing dom1's pfn 0x1d:
pfn=0x1d ==> mfn=0x14ee17 (type 0x0)

Running `lookup-pte` then finds the PTE that maps mfn 0x14ee17:
# xen-mfndump lookup-pte 1 0x14ee17
 --- Lookig for PTEs mapping mfn 0x14ee17 for domain 1 ---
 Guest Width: 8, PT Levels: 4 P2M size: = 262144
  0x14ee17 <-- [0xd948e][29]: 0x1000014ee17027

Now I use xen-hptool to take mfn 0x14ee17 offline:
# xen-hptool mem-offline 0x14ee17
Prepare to offline MEMORY mfn 14ee17
DOM1: No suspend port, try live migration
Failed to suspend guest 1 for mfn 14ee17
(Comment: I modified the code to bypass the suspension of dom1. I
should use libxl to suspend dom1, or use the event channel to notify dom1 to
suspend as the original code does. But that is not the issue I'm asking
about here, and I don't think it affects the following
discussion/conclusion.)
xc: error: Failure when submitting mmu updates: Internal error
xc: error: clear pte failed: Internal error
Memory mfn 14ee17 offlined successfully , this page is DOM1 page yet failed
to be exchanged. current state is [PG_OFFLINE_PENDING, PG_OFFLINE_OWNED]
(XEN) mm.c:2004:d0v0 Error pfn d948e: rd=ffff83015d446000, od=ffff83017d8d0000, caf=8000000000000004, taf=1400000000000002
(XEN) mm.c:3544:d0v0 Could not get page for normal update

I looked into do_mmu_update() @ xen/arch/x86/mm.c. This mmu_update fails
because do_mmu_update() takes the owner of the page table that maps mfn
0x14ee17 (selected via pt_dom) to be domain 0, while the page at mfn
0x14ee17 is owned by domain 1.

After digging into it, I found the following code confusing/suspicious:

Inside do_mmu_update() @ xen/arch/x86/mm.c,
pt_dom is assigned by this line: if ( (pt_dom = foreigndom >> 16) != 0 ).
However, in flush_mmu_updates() @ tools/libxc/xc_private.c, foreigndom
is set by the following line: hypercall.arg[3] = mmu->subject; where
mmu->subject is the guest domain id of the page table.

The first question is:
Why should we use "foreigndom >> 16" instead of "foreigndom" to get
pt_dom?
(When a page is marked offline, we can get the domid of the page's owner
from status, using status >> PG_OFFLINE_OWNER_SHIFT. But why should we
shift right by 16 bits again in do_mmu_update()?)
(I think this explains why pt_owner is treated as domain 0: pt_owner was
just left at its default value, which is the domain of the current vcpu
that issued the hypercall.)

pt_owner is retrieved by the following line:
if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL )
My second question is:
Why should we use "pt_dom - 1" instead of "pt_dom" here?

If I change the old foreigndom (1) to (foreigndom << 16 | foreigndom), pass
the new foreigndom as the last parameter of do_mmu_update(), and change
"pt_dom - 1" to "pt_dom", xen-hptool successfully takes the mfn
offline. Here is the output after issuing the command:
Memory mfn 0x14ee17 offlined successfully, this page is DOM1 page and being
swapped successfully, current state is [PG_OFFLINE_OFFLINED,
PG_OFFLINE_OWNED]

I'm wondering if this is a bug in do_mmu_update(), or at least an
inconsistency in the do_mmu_update() code?
Of course, this could also be because I misunderstood something. If so,
could you please let me know what I misunderstood and how I should correct
it?

Thank you very much for your time!

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania



* Re: Possible bug/question in xen-hptool?
From: Jan Beulich @ 2015-04-23  8:14 UTC
  To: Meng Xu, xen-devel; +Cc: Andrew Cooper, Keir Fraser

>>> On 23.04.15 at 06:03, <xumengpanda@gmail.com> wrote:
> The first question is:
> Why should we use "foreigndom >> 16" instead of "foreigndom" to get the
> pt_dom?

Because there are possibly three domains involved here: The current
one (issuing the hypercall), the one owning the page, and the one
owning the page table.

> pt_owner is retrieved by the following line :
> if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL )
> My second question is:
> Why should we use "pt_dom - 1" instead of  "pt_dom" here?

Because this consideration of three involved domains was made only
after the interface was there, and hence the adjustment needed to
be made such that zeros in the upper 16 bits would mean "no
override", not "domain 0". See the description of this mechanism in
public/xen.h.
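
A minimal sketch of the decoding described above. The struct and helper
names are hypothetical and are not Xen's; only the bit layout (low 16 bits
for the page owner, upper 16 bits holding the page-table owner's domid plus
one, zero meaning "no override") is taken from this thread:

```c
#include <stdint.h>

/* Hypothetical container for the two fields packed into foreigndom. */
struct foreigndom_fields {
    uint16_t pg_owner_id;     /* low 16 bits: owner of the data page */
    int      has_pt_override; /* non-zero if the upper 16 bits are set */
    uint16_t pt_owner_id;     /* valid only when has_pt_override != 0 */
};

/* Hypothetical helper mirroring the checks quoted from do_mmu_update(). */
static inline struct foreigndom_fields decode_foreigndom(uint32_t foreigndom)
{
    struct foreigndom_fields f;
    uint32_t pt_dom = foreigndom >> 16;

    f.pg_owner_id = (uint16_t)foreigndom;
    f.has_pt_override = (pt_dom != 0);
    /* The stored value is domid + 1, so that 0 can mean "no override". */
    f.pt_owner_id = f.has_pt_override ? (uint16_t)(pt_dom - 1) : 0;
    return f;
}
```

With foreigndom = 1, as the unmodified tool passes it, the upper 16 bits
are zero, so no page-table owner override is requested and the default
(the calling domain) applies.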

> If I set the old foreigndom (1) as (foreigndom << 16 | foreigndom) and pass
> the new foreigndom as the last parameter of do_mmu_update(), and change
> "pt_dom - 1" to "pt_dom", the xen-hptool will successfully make the mfn
> offline. Here is the output after issuing the command:Memory
> mfn 0x14ee17 offlined successfully, this page is DOM1 page and being
> swapped successfully, current state is [PG_OFFLINE_OFFLINED,
> PG_OFFLINE_OWNED]
> 
> I'm wondering if this is a bug in do_mmu_update() or  at least some
> inconsistence is in the do_mmu_update() code?

No, the above finding rather appears to indicate that the tool didn't
get updated when the hypercall extension was done. I.e. if the
page tables modified belong to the domain owning the page (and
not the domain doing the hypercall) the invocation would need to be
changed.

Jan


* Re: Possible bug/question in xen-hptool?
From: Meng Xu @ 2015-04-23 15:54 UTC
  To: Jan Beulich; +Cc: Andrew Cooper, Keir Fraser, xen-devel

Hi Jan,

2015-04-23 4:14 GMT-04:00 Jan Beulich <JBeulich@suse.com>:
>
> >>> On 23.04.15 at 06:03, <xumengpanda@gmail.com> wrote:
> > The first question is:
> > Why should we use "foreigndom >> 16" instead of "foreigndom" to get the
> > pt_dom?
>
> Because there are possibly three domains involved here: The current
> one (issuing the hypercall), the one owning the page, and the one
> owning the page table.
>
> > pt_owner is retrieved by the following line :
> > if ( (pt_owner = rcu_lock_domain_by_id(pt_dom - 1)) == NULL )
> > My second question is:
> > Why should we use "pt_dom - 1" instead of  "pt_dom" here?
>
> Because this consideration of three involved domains was made only
> after the interface was there, and hence the adjustment needed to
> be made such that zeros in the upper 16 bits would mean "no
> override", not "domain 0". See the description of this mechanism in
> public/xen.h.

I see. :-)

>
> > If I set the old foreigndom (1) as (foreigndom << 16 | foreigndom) and pass
> > the new foreigndom as the last parameter of do_mmu_update(), and change
> > "pt_dom - 1" to "pt_dom", the xen-hptool will successfully make the mfn
> > offline. Here is the output after issuing the command:Memory
> > mfn 0x14ee17 offlined successfully, this page is DOM1 page and being
> > swapped successfully, current state is [PG_OFFLINE_OFFLINED,
> > PG_OFFLINE_OWNED]
> >
> > I'm wondering if this is a bug in do_mmu_update() or  at least some
> > inconsistence is in the do_mmu_update() code?
>
> No, the above finding rather appears to indicate that the tool didn't
> get updated when the hypercall extension was done. I.e. if the
> page tables modified belong to the domain owning the page (and
> not the domain doing the hypercall) the invocation would need to be
> changed.

Right! Now I understand the whole story. :-)

Right now, `xen-hptool mem-offline <mfn>` does not take the <domid>
to which the <mfn> belongs. I think the tool should take the <domid>
as an extra parameter, just as `xen-mfndump lookup-pte <domid> <mfn>`
does. When the tool constructs the foreigndom for the mmu update, it
should set foreigndom to (((pt_owner_id + 1) << 16) | pg_owner_id).

Am I correct?
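
For illustration, the proposed encoding could be sketched as below.
make_foreigndom() is a hypothetical helper, not an existing libxc or
xen-hptool function, and whether the tool should build the value this way
is exactly the open question here:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack the page-table owner and page owner into one
 * foreigndom value. The page-table owner is stored as domid + 1 so that a
 * zero upper half keeps meaning "no override".
 */
static inline uint32_t make_foreigndom(uint16_t pt_owner_id,
                                       uint16_t pg_owner_id)
{
    /* domid + 1 must still fit in the upper 16 bits. */
    assert(pt_owner_id < 0xFFFF);
    return (((uint32_t)pt_owner_id + 1) << 16) | pg_owner_id;
}
```

With pt_owner_id = 1 and pg_owner_id = 1 this yields 0x00020001, from which
the unmodified do_mmu_update() would compute pt_dom - 1 = 1.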

Thank you so much for your insightful explanation! It really helps!

Best regards,

Meng


-----------
Meng Xu
PhD Student in Computer and Information Science
University of Pennsylvania


* Re: Possible bug/question in xen-hptool?
From: Jan Beulich @ 2015-04-23 16:14 UTC
  To: Meng Xu; +Cc: Andrew Cooper, Keir Fraser, xen-devel

>>> On 23.04.15 at 17:54, <xumengpanda@gmail.com> wrote:
> Right now, `xen-hptool mem-offline <mfn>` does not take the <domid>,
> to which the <mfn> belongs. I think this tool should take the <domid>
> as an extra parameter just as `xen-mfndump lookup-pte <domid> <mfn>`
> does. When the tool construct the foreigndom for mmu, it should make
> the foreigndom as ((pt_owner_id + 1 << 16) | pg_owner_id).
> 
> Am I correct?

I don't know; I'm not sure what the intentions of the tool are.

Jan

