xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed
* 2nd opinion on backportability of c35eefded2
@ 2016-05-13  9:32 Jan Beulich
  2016-05-18 14:41 ` George Dunlap
  0 siblings, 1 reply; 3+ messages in thread
From: Jan Beulich @ 2016-05-13  9:32 UTC (permalink / raw)
  To: George Dunlap; +Cc: xen-devel

Hi George,

after quite a bit of debugging on 4.6.1 I learned that said commit
("x86/P2M: consolidate handling of types not requiring a valid MFN")
is more than just cleanup: Since p2m_set_entry() happily performs
arithmetic on the passed in MFN, shadow mode guests (verified) as
well as HAP ones when 1Gb / 2Mb mappings are unavailable (not
verified), if any of the MFNs below 1Gb are invalid (reported with
e.g. "crashkernel=512M@16M"), p2m_pt_set_entry() would (in
the context of guest_physmap_mark_populate_on_demand())
produce invalid instead of PoD entries, resulting in subsequent
attempts by the guest to use (e.g. balloon out) these pages to
fail (the balloon failure results in a crash during boot).

Now, while the backport to 4.6 is straightforward, I'm having the
vague feeling that this change might depend on some earlier one,
but I can't pinpoint anything. Hence I would appreciate if you
could take a look and provide your judgment.

Thanks, Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: 2nd opinion on backportability of c35eefded2
  2016-05-13  9:32 2nd opinion on backportability of c35eefded2 Jan Beulich
@ 2016-05-18 14:41 ` George Dunlap
  2016-05-18 15:23   ` Jan Beulich
  0 siblings, 1 reply; 3+ messages in thread
From: George Dunlap @ 2016-05-18 14:41 UTC (permalink / raw)
  To: Jan Beulich, George Dunlap; +Cc: xen-devel

On 13/05/16 10:32, Jan Beulich wrote:
> Hi George,
> 
> after quite a bit of debugging on 4.6.1 I learned that said commit
> ("x86/P2M: consolidate handling of types not requiring a valid MFN")
> is more than just cleanup: Since p2m_set_entry() happily performs
> arithmetic on the passed in MFN, shadow mode guests (verified) as
> well as HAP ones when 1Gb / 2Mb mappings are unavailable (not
> verified), if any of the MFNs below 1Gb are invalid (reported with
> e.g. "crashkernel=512M@16M"), p2m_pt_set_entry() would (in
> the context of guest_physmap_mark_populate_on_demand())
> produce invalid instead of PoD entries, resulting in subsequent
> attempts by the guest to use (e.g. balloon out) these pages to
> fail (the balloon failure results in a crash during boot).

I'm sorry, I'm having some trouble parsing this paragraph. :-)

But this is what I gather:

Before c35eefded2, the 4k-entry path didn't check whether the p2m entry
was PoD, it only checked that the mfn was valid.  The pod code would
always set this mfn to 0, which is (usually) valid; but p2m_set_entry()
might iterate over this, ending up in p2m_pt_set_entry() with non-zero
low values for mfn.  In certain circumstances these low-value mfns might
actually be invalid (for example, if they overlap a crash kernel).

In such a case, the resulting p2m entries would be (erroneously) marked
as invalid rather than PoD, resulting in a crash when the guest tried to
access it or populate it.

You observed this happening in the shadow case, and believe it would
happen if p2m 2MiB or 1GiB pages were disabled but haven't checked those
cases.

c35eefded2 fixes this by implicitly checking for the PoD type on the 4k
entry path.

> Now, while the backport to 4.6 is straightforward, I'm having the
> vague feeling that this change might depend on some earlier one,
> but I can't pinpoint anything. Hence I would appreciate if you
> could take a look and provide your judgment.

We talked about the PoD type thing in the context of a discussion about
660fd65 ("x86/p2m-pt: tighten conditions of IOMMU mapping updates") --
discussion here [1].  But it doesn't look like that's a prerequisite,
and nothing else really comes to mind.

 -George

[1] marc.info/?i=<560D261D02000078000A760A@prv-mh.provo.novell.com>



_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: 2nd opinion on backportability of c35eefded2
  2016-05-18 14:41 ` George Dunlap
@ 2016-05-18 15:23   ` Jan Beulich
  0 siblings, 0 replies; 3+ messages in thread
From: Jan Beulich @ 2016-05-18 15:23 UTC (permalink / raw)
  To: George Dunlap, George Dunlap; +Cc: xen-devel

>>> On 18.05.16 at 16:41, <george.dunlap@citrix.com> wrote:
> On 13/05/16 10:32, Jan Beulich wrote:
>> Hi George,
>> 
>> after quite a bit of debugging on 4.6.1 I learned that said commit
>> ("x86/P2M: consolidate handling of types not requiring a valid MFN")
>> is more than just cleanup: Since p2m_set_entry() happily performs
>> arithmetic on the passed in MFN, shadow mode guests (verified) as
>> well as HAP ones when 1Gb / 2Mb mappings are unavailable (not
>> verified), if any of the MFNs below 1Gb are invalid (reported with
>> e.g. "crashkernel=512M@16M"), p2m_pt_set_entry() would (in
>> the context of guest_physmap_mark_populate_on_demand())
>> produce invalid instead of PoD entries, resulting in subsequent
>> attempts by the guest to use (e.g. balloon out) these pages to
>> fail (the balloon failure results in a crash during boot).
> 
> I'm sorry, I'm having some trouble parsing this paragraph. :-)
> 
> But this is what I gather:
> 
> Before c35eefded2, the 4k-entry path didn't check whether the p2m entry
> was PoD, it only checked that the mfn was valid.  The pod code would
> always set this mfn to 0, which is (usually) valid; but p2m_set_entry()
> might iterate over this, ending up in p2m_pt_set_entry() with non-zero
> low values for mfn.  In certain circumstances these low-value mfns might
> actually be invalid (for example, if they overlap a crash kernel).
> 
> In such a case, the resulting p2m entries would be (erroneously) marked
> as invalid rather than PoD, resulting in a crash when the guest tried to
> access it or populate it.
> 
> You observed this happening in the shadow case, and believe it would
> happen if p2m 2MiB or 1GiB pages were disabled but haven't checked those
> cases.
> 
> c35eefded2 fixes this by implicitly checking for the PoD type on the 4k
> entry path.

Yes, that's a neater re-write of what I had tried to state.

>> Now, while the backport to 4.6 is straightforward, I'm having the
>> vague feeling that this change might depend on some earlier one,
>> but I can't pinpoint anything. Hence I would appreciate if you
>> could take a look and provide your judgment.
> 
> We talked about the PoD type thing in the context of a discussion about
> 660fd65 ("x86/p2m-pt: tighten conditions of IOMMU mapping updates") --
> discussion here [1].  But it doesn't look like that's a prerequisite,
> and nothing else really comes to mind.

Right, the PoD thing was just an observation while putting together
that other patch, not something that would create a dependency.

Plus - that commit went in during 4.6-rc, and got backported to 4.5.x,
so even if it was a prereq this would not be a problem.

Thanks for double checking!

Jan


_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2016-05-18 15:23 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-05-13  9:32 2nd opinion on backportability of c35eefded2 Jan Beulich
2016-05-18 14:41 ` George Dunlap
2016-05-18 15:23   ` Jan Beulich

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).