All of lore.kernel.org
 help / color / mirror / Atom feed
* kexec design issues
@ 2009-03-12 14:49 Jan Beulich
  2009-03-16  5:54 ` Simon Horman
  0 siblings, 1 reply; 6+ messages in thread
From: Jan Beulich @ 2009-03-12 14:49 UTC (permalink / raw)
  To: xen-devel

As I became aware of only now (by way of a bug report), Linux side kexec
changes on i386 silently broke assumptions in the kexec hypercall interface.
Therefore I'd like to get an understanding on what the general perspective
on the following problems is:

1) machine_kexec_load() silently assumes that page_list[] has alternating
(and paired) physical/virtual entries. This is no longer valid with 2.6.27, and
while the immediate fix is to re-arrange the list entries so that the newly
added entry would end up after all pairs (in 2.6.29 an extra gap will need
to be added), this points out that the interface definition itself is really
flawed.

2) KEXEC_XEN_NO_PAGES is set to the apparently arbitrary value of 17.
While a build error would clearly show any incompatible Linux side change
in this case, it still seems bogus to hard-code this Linux defined value into
the hypercall interface.

3) machine_kexec_load() blindly iterates over all page_list[] entries,
regardless of whether any of them is zero, and hence establishes
(currently on 32-bit only) numerous bogus page table entries mapping
mfn 0.

Thanks for any comments,
Jan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: kexec design issues
  2009-03-12 14:49 kexec design issues Jan Beulich
@ 2009-03-16  5:54 ` Simon Horman
       [not found]   ` <aec7e5c30903160419g2d6a6386o7f17ec876078133c@mail.gmail.com>
  0 siblings, 1 reply; 6+ messages in thread
From: Simon Horman @ 2009-03-16  5:54 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Magnus Damm

On Thu, Mar 12, 2009 at 02:49:50PM +0000, Jan Beulich wrote:
> As I became aware of only now (by way of a bug report), Linux side kexec
> changes on i386 silently broke assumptions in the kexec hypercall interface.
> Therefore I'd like to get an understanding on what the general perspective
> on the following problems is:
> 
> 1) machine_kexec_load() silently assumes that page_list[] has alternating
> (and paired) physical/virtual entries. This is no longer valid with 2.6.27, and
> while the immediate fix is to re-arrange the list entries so that the newly
> added entry would end up after all pairs (in 2.6.29 an extra gap will need
> to be added), this points out that the interface definition itself is really
> flawed.
> 
> 2) KEXEC_XEN_NO_PAGES is set to the apparently arbitrary value of 17.
> While a build error would clearly show any incompatible Linux side change
> in this case, it still seems bogus to hard-code this Linux defined value into
> the hypercall interface.
> 
> 3) machine_kexec_load() blindly iterates over all page_list[] entries,
> regardless of whether any of them is zero, and hence establishes
> (currently on 32-bit only) numerous bogus page table entries mapping
> mfn 0.

Hi Jan,

sorry for not replying earlier. I have CCed Magnus Damm who
along with myself was involved in the original port of kexec to Xen.

I must confess that I am bit rusty on the details, but as I recall the
premise of the original design of our port was to allow the existing kexec
functionality to work inside Xen. Not great deal of thought was given to
to anticipating ways that kexec might change in the future.

In reference to questions 1) and 3), all I can really respond with is that
is how kexec worked at that time (IIRC), so it seemed to be logical for xen
to read the data in that way. If its not now, then I am more than happy to
help work towards a better solution.

In reference to question 2), the number 17 is derived from
include/asm-x86_64/kexec.h in the linux-xen 2.6.18 tree as x86_64 has the
highest value for PAGES_NR.  I agree that having this number included in
the hypercall interface isn't ideal.

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: kexec design issues
       [not found]   ` <aec7e5c30903160419g2d6a6386o7f17ec876078133c@mail.gmail.com>
@ 2009-03-16 13:54     ` Jan Beulich
  2009-03-16 20:51       ` Simon Horman
  2009-03-19  9:27       ` Ian Campbell
  0 siblings, 2 replies; 6+ messages in thread
From: Jan Beulich @ 2009-03-16 13:54 UTC (permalink / raw)
  To: Magnus Damm; +Cc: xen-devel, Simon Horman

>>> Magnus Damm <magnus.damm@gmail.com> 16.03.09 12:19 >>>
>So the code assumes that PA_.. are even and VA_.. are odd. The defines
>could have been copied over to make more readable code, yes, or we
>could have shared header files somehow. In the end the hypervisor
>assumes the interface remains unchanged. The addresses are modified so
>the assembly snipped can run from hypervisor address space.
>
>What is the real problem? Just that the interface has changed?

The issue is that the Linux side changed without having any way to
recognize resulting problems apart from running into them. If a lower
software layer gets designed based on implementation details of a higher
layer, there should at least be build time checks that make sure the upper
layer doesn't change in incompatible ways (and yes, a few checks are
in that code, but their coverage is rather limited).

But I view the issue as broader: Any other OS wanting to make use of
the kexec hypercall interface would be required to match the Linux
implementation in  various respects. This is what I consider a design flaw,
which makes me think that the current kexec sub-hypercalls should be
urgently deprecated in favor of a clean interface.

>>> 3) machine_kexec_load() blindly iterates over all page_list[] entries,
>>> regardless of whether any of them is zero, and hence establishes
>>> (currently on 32-bit only) numerous bogus page table entries mapping
>>> mfn 0.
>
>I don't remember if this was how things were in our original
>implementation, or if it is a side effect of upstream linux moving.
>Are the page table mappings bad? They may be a side effect of a shared
>implementation between 32-bit and 64-bit x86..

I don't think they're bad - they're just superfluous and as such latently
dangerous.

>> On Thu, Mar 12, 2009 at 02:49:50PM +0000, Jan Beulich wrote:
>>> 2) KEXEC_XEN_NO_PAGES is set to the apparently arbitrary value of 17.
>>> While a build error would clearly show any incompatible Linux side change
>>> in this case, it still seems bogus to hard-code this Linux defined value into
>>> the hypercall interface.
>
>> In reference to question 2), the number 17 is derived from
>> include/asm-x86_64/kexec.h in the linux-xen 2.6.18 tree as x86_64 has the
>> highest value for PAGES_NR.  I agree that having this number included in
>> the hypercall interface isn't ideal.
>
>I remember that the KEXEC_XEN_NO_PAGES value is related to fixmap().
>We use virtual addresses from the fixmap to map in the pages coming
>from dom0. The fixmap is static by definition. So i think a static
>number comes from there.

But what if x86-64 decides to add PA_SWAP_PAGE (as i386 did in 2.6.27)?
While at least we'd get a build-time error on the kernel side, there will be no
way to get this to work with the current interface, unless they also drop
the middle level page table pointers (as i386 did in 2.6.29). That number
should have been chosen at least to leave some head room, but really
it shouldn't be a fixed number at all (after all, at the time the respective
mappings are needed, almost the entire Xen address space is available,
so there seems little reason to constrain them to the fixmap space).

Jan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: kexec design issues
  2009-03-16 13:54     ` Jan Beulich
@ 2009-03-16 20:51       ` Simon Horman
  2009-03-17  8:09         ` Jan Beulich
  2009-03-19  9:27       ` Ian Campbell
  1 sibling, 1 reply; 6+ messages in thread
From: Simon Horman @ 2009-03-16 20:51 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Magnus Damm

On Mon, Mar 16, 2009 at 01:54:34PM +0000, Jan Beulich wrote:
> >>> Magnus Damm <magnus.damm@gmail.com> 16.03.09 12:19 >>>
> >So the code assumes that PA_.. are even and VA_.. are odd. The defines
> >could have been copied over to make more readable code, yes, or we
> >could have shared header files somehow. In the end the hypervisor
> >assumes the interface remains unchanged. The addresses are modified so
> >the assembly snipped can run from hypervisor address space.
> >
> >What is the real problem? Just that the interface has changed?
> 
> The issue is that the Linux side changed without having any way to
> recognize resulting problems apart from running into them. If a lower
> software layer gets designed based on implementation details of a higher
> layer, there should at least be build time checks that make sure the upper
> layer doesn't change in incompatible ways (and yes, a few checks are
> in that code, but their coverage is rather limited).
> 
> But I view the issue as broader: Any other OS wanting to make use of
> the kexec hypercall interface would be required to match the Linux
> implementation in  various respects. This is what I consider a design flaw,
> which makes me think that the current kexec sub-hypercalls should be
> urgently deprecated in favor of a clean interface.

Skipping to the chase, what do you envisage such an interface looking like?

-- 
Simon Horman
  VA Linux Systems Japan K.K., Sydney, Australia Satellite Office
  H: www.vergenet.net/~horms/             W: www.valinux.co.jp/en

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: kexec design issues
  2009-03-16 20:51       ` Simon Horman
@ 2009-03-17  8:09         ` Jan Beulich
  0 siblings, 0 replies; 6+ messages in thread
From: Jan Beulich @ 2009-03-17  8:09 UTC (permalink / raw)
  To: Simon Horman; +Cc: xen-devel, Magnus Damm

>>> Simon Horman <horms@verge.net.au> 16.03.09 21:51 >>>
>Skipping to the chase, what do you envisage such an interface looking like?

Something like changing page_list to a pointer (and add a count), and
having an extra attribute (or operation code if you will) field for each page
that tells what to do with it (i.e. as a replacement for the pa/va pairs:
indicating that a certain entry is to represent the virtual address for a
machine address specified by another entry).

Jan

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: kexec design issues
  2009-03-16 13:54     ` Jan Beulich
  2009-03-16 20:51       ` Simon Horman
@ 2009-03-19  9:27       ` Ian Campbell
  1 sibling, 0 replies; 6+ messages in thread
From: Ian Campbell @ 2009-03-19  9:27 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel, Magnus Damm, Simon Horman

On Mon, 2009-03-16 at 09:54 -0400, Jan Beulich wrote:
> 
> But I view the issue as broader: Any other OS wanting to make use of
> the kexec hypercall interface would be required to match the Linux
> implementation in  various respects. This is what I consider a design
> flaw, which makes me think that the current kexec sub-hypercalls
> should be urgently deprecated in favor of a clean interface.

I agree with this, the existing interface has always irked me a little
due it its very close ties to the specifics of the Linux 2.6.18 kexec
interface. I've never been entirely sure what a better interface would
look like, but something like you suggest later seems to make sense:

On Tue, 2009-03-17 at 04:09 -0400, Jan Beulich wrote:
> Something like changing page_list to a pointer (and add a count), and
> having an extra attribute (or operation code if you will) field for
> each page that tells what to do with it (i.e. as a replacement for the
> pa/va pairs: indicating that a certain entry is to represent the
> virtual address for a machine address specified by another entry).

Ian.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2009-03-19  9:27 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-12 14:49 kexec design issues Jan Beulich
2009-03-16  5:54 ` Simon Horman
     [not found]   ` <aec7e5c30903160419g2d6a6386o7f17ec876078133c@mail.gmail.com>
2009-03-16 13:54     ` Jan Beulich
2009-03-16 20:51       ` Simon Horman
2009-03-17  8:09         ` Jan Beulich
2009-03-19  9:27       ` Ian Campbell

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.